The digital support of the Hungarian language in support of Hungarian science
DOI:
https://doi.org/10.18349/MagyarNyelv.2023.4.478
Keywords:
repositories, text corpora, automated annotation
Abstract
The Repository of the Library and Information Centre of the Hungarian Academy of Sciences (REAL) is an important secondary (archived) source of scientific literature in Hungarian. While in the past this collection served individual researchers' document needs in accordance with traditional library functionality, here the text layers of the documents are treated as a text corpus. Linguistic tools are used to explore and mine the corpus in a broad sense, including the extraction of references to literature and the recognition of various named entities. The project will improve the quality of the text by identifying possible textual errors, and will enrich the metadata of the documents. The objective of the project is to improve both the repository services and data quality, enabling the development of value-added services for the research community.
License
Copyright (c) 2023 Gábor Prószéky, Tamás Váradi, András Holl
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Magyar Nyelv is a Diamond Open Access periodical. Documents can be freely downloaded and duplicated in an electronic format, and can be used unchanged and with due reference to the original source. Such use must not serve commercial purposes. In the case of any form of dissemination and use, Hungarian Copyright Act LXXVI/1999 and related laws are to be observed. The electronic version of the journal is subject to the regulations of CC BY-NC-ND (Creative Commons – Attribution-NonCommercial-NoDerivatives).
The journal permits its authors, at no cost and without any temporal limitation, to make pre-print copies of their manuscripts publicly available via email, on their own homepage or that of their institution, in either closed or free-for-all repositories of their institutions/universities, or on other non-profit websites, in the form accepted by the journal editor for publication, including any amendments made on the basis of reviewers’ comments. When authors make their papers public in this manner, they have to warn their readers that the manuscript at hand is not the final published version of the work. Once the paper has been published in a printed or online form, the authors are allowed (and advised) to use that (post-print) version for the above purposes. In that case, they have to indicate the exact location and other data of the journal publication. The authors retain the copyright of their papers; however, in the case of an occasional secondary publication, the bibliographical data of the first publication have to be included.