Creating a Human Annotated Emotion Corpus for the Detection of Actor-related Emotions

Authors

Árpád Knap, Tímea Emese Tóth, Zsófia Rakovics

DOI:

https://doi.org/10.31400/dh-hun.2022.6.4576

Keywords:

sentiment detection, emotion detection, text classification, BERT, supervised model, human annotation

Abstract

We present an ongoing research project that aims to create a language model capable of classifying sentiments and specific emotions related to actors (e.g., institutions, persons). The model is trained on a human-annotated text corpus of ten thousand articles from online newspapers, compiled using statistical sampling methods. The project employs a two-phase annotation design. In the first phase, we annotate named entities and common nouns that function as actors. In the second phase, we annotate the sentiments and specific emotions found in the context of the previously marked actors. Such a database of annotated texts is well suited as input for training supervised classification models. In this article, we describe the project's corpus, the characteristics of supervised and unsupervised text classification procedures, and possible methods for sentiment and emotion detection. We then present the two-phase annotation methodology used in our research, the problems and challenges that arose during its development, and the research decisions we made to create a model that can serve as a capable research tool in the social sciences.

Published

2022-12-31

How to Cite

Knap, Árpád, Tímea Emese Tóth, and Zsófia Rakovics. 2022. “Creating a Human Annotated Emotion Corpus for the Detection of Actor-Related Emotions”. Digitális Bölcsészet / Digital Humanities, no. 6 (December):M:3-M:17. https://doi.org/10.31400/dh-hun.2022.6.4576.

Section

Digital methods, tools and projects