The digital support of the Hungarian language in support of Hungarian science

Authors

  • Gábor Prószéky HUN-REN Nyelvtudományi Kutatóközpont
  • Tamás Váradi HUN-REN Nyelvtudományi Kutatóközpont
  • András Holl MTA Könyvtár és Információs Központ

DOI:

https://doi.org/10.18349/MagyarNyelv.2023.4.478

Keywords:

repositories, text corpora, automated annotation

Abstract

The Repository of the Library and Information Centre of the Hungarian Academy of Sciences (REAL) is an important secondary (archived) source of scientific literature in Hungarian. In the past, this collection served individual researchers' document needs in line with traditional library functionality; in this project, the text layers of the documents are treated as a text corpus. Linguistic tools are used to explore and mine the corpus in a broad sense, including extracting literature references and recognizing various named entities. The project will improve the quality of the text by identifying possible textual errors and will enrich the metadata of the documents. Its objective is to improve both the repository services and data quality, enabling the development of value-added services for the research community.

Published

2023-12-20

Section

Miscellaneous