For the digital sustainability of the Hungarian language

Authors

  • Gábor Prószéky HUN-REN Nyelvtudományi Kutatóközpont
  • Tamás Váradi HUN-REN Nyelvtudományi Kutatóközpont

DOI:

https://doi.org/10.18349/MagyarNyelv.2023.4.482

Keywords:

Hungarian National Corpus, spelling advisory portal, digitization of dictionary cards, Hanti and Mansi text corpora

Abstract

The project follows the founding mission of the Hungarian Academy of Sciences to ensure that Hungarian is given a worthy role in the digital space. International research focuses mainly on English, with less attention paid to smaller languages such as Hungarian. (1) The Hungarian National Corpus (MNSz) consists of more than one billion words and is widely used in linguistic research. It is composed of six stylistic layers and five regional language varieties. The corpus is primarily used to support corpus-based and corpus-driven research on Hungarian, not only in linguistics but also in many fields of the humanities and social sciences. However, new possibilities, such as newer, higher-quality language parsers and the large databases needed to produce large language models, have made it necessary to expand and improve the corpus. (2) Spelling control is a key element of the linguistic norm and is becoming increasingly important in the digital space. The Spelling Advisory Portal, supported by the Hungarian Academy of Sciences, meets this demand with state-of-the-art technology, but needs upgrading in terms of software platform, methodology and customer focus. (3) The collection of more than four million dictionary cards belonging to the Great Dictionary of the Hungarian Language was created at the end of the 19th century. The cataloguing and digitization of the dictionary cards is ongoing and, although the construction of the collection started almost twenty years ago, further development is needed to ensure its full digital use. (4) In order to support the digital presence of the languages related to Hungarian, Hanti and Mansi, there is a need to create an analyzed digital corpus based on modern written texts, which would provide the opportunity to document and preserve the current state of the Ob-Ugric languages.
In summary, these studies will contribute to the development of Hungarian in the digital space and cover a wide range of linguistic research from normative language to orthography and related languages.

Published

2023-12-20

Section

Miscellaneous