Introducing the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) Multilingual Research Tool

Authors

Róbert Péter, Zsolt Szántó, Vilmos Bilicki, Gábor Berend

DOI:

https://doi.org/10.31400/dh-hun.2021.4.3530

Keywords:

text and data mining, multilingual digital tool, natural language processing, metadata, semantic enrichment

Abstract

The objective of this paper is to demonstrate the workflow, analytical functions and features of the multilingual AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) digital tool. This web application enables researchers to critically analyse bibliographic data and texts at scale with data-driven methods and tools supported by artificial intelligence and natural language processing techniques. The distinguishing features of the AVOBMAT toolkit are that (i) it can preprocess, analyse and (semantically) enrich large collections of texts and metadata in several languages; (ii) its analytical and visualization tools provide interactive close and distant reading of texts and bibliographic data; (iii) it combines bibliographic data and natural language processing research methods in one integrated, interactive and user-friendly web application. In the preprocessing phase, the user can set nine optional parameters such as lemmatization and stopword filtering. Users can create different configurations for the different analyses and visualizations. The metadata enrichment includes the automatic identification of the gender of the authors and automatic language detection. Users can search and filter the uploaded and enriched bibliographic data and preprocessed texts in faceted, advanced and command-line modes. Having filtered the uploaded databases and selected the metadata field(s), users can (i) analyse and visualize the bibliographic data chronologically in line and area charts in normalized and aggregated formats; (ii) create an interactive network analysis; (iii) make pie, horizontal and vertical bar charts of the bibliographic data. For content analysis, the N-gram viewer supports the diachronic analysis of texts. Two types of frequency analysis are implemented: the significant text function shows what differentiates a subset of documents from the other texts in the corpus, and TagSpheres enables users to investigate the context of a word. Close reading is further supported by the Keyword in Context tool. AVOBMAT has an in-browser Latent Dirichlet Allocation function to calculate and visualize topic models. It semantically enriches the texts and metadata using named entity recognition in 16 languages. The export functions of AVOBMAT facilitate the reproducibility of the results and the transparency of the preprocessing and text analysis. The tool also helps users recognize the epistemological challenges, limitations and strengths of computational text analysis and of the visual representation of digital texts and datasets.
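
To make the Keyword in Context idea mentioned in the abstract concrete, the following is a minimal Python sketch of the general technique. It is not AVOBMAT's implementation: the function name `kwic`, the `window` parameter and the simple regular-expression tokenizer are assumptions made purely for illustration.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase word split, with no lemmatization or stopword filtering.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            # Up to `window` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "The tool enriches texts. Texts are then analysed by the tool."
    for left, word, right in kwic(sample, "texts", window=2):
        print(f"{left:>20} [{word}] {right}")
```

A real concordance view would typically run on the preprocessed texts and let the user choose the window size interactively; the sketch only shows how each occurrence is paired with its neighbouring words.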

Published

2021-12-31

How to Cite

Péter, Róbert, Zsolt Szántó, Vilmos Bilicki, and Gábor Berend. 2021. “Introducing the AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) Multilingual Research Tool”. Digitális Bölcsészet / Digital Humanities, no. 4 (December):M:3-M:28. https://doi.org/10.31400/dh-hun.2021.4.3530.

Issue

Section

Digital methods, tools and projects