Identifying emotions related to geographic space from social media data and natural language processing
Abstract
This study explores the spatial distribution of emotions associated with geographic space in Colombia, as expressed by Twitter (X) users. The research integrates Natural Language Processing (NLP) techniques, specifically Named Entity Recognition (NER) and Emotion Analysis (EA), that have been adapted to Colombian Spanish, a variety characterized by strong regional and dialectal diversity (Mora et al., 2004; Bonilla, 2023). This linguistic heterogeneity poses significant challenges for computational approaches to emotion and place, given the range of expressions used to refer both to locations (e.g., montaña, cerro, filo, peña) and to emotional states (e.g., ativo, fachoso, acoquinado).
To address these challenges, the study proposes a theoretical and methodological workflow for identifying emotions in geographically referenced tweets based on both their content and the places they mention. Two annotated corpora were developed: (1) a 2,000-sentence location corpus that includes place names, nicknames, and common spatial references in Colombia, used to fine-tune a NER model for place detection; and (2) a corpus of emotion-labeled tweets linked to geographic entities extracted by the NER model, used to fine-tune a BERT-based emotion classifier. The fine-tuned language models were applied to a large Twitter dataset compiled by Jiménez et al. (2018) and Rodríguez-Díaz et al. (2018), producing a georeferenced database of approximately 3.8 million tweets classified by emotion. These results were integrated into an interactive web map for visualization and further analyzed using spatial statistics such as Moran’s I and kernel density estimation. After fine-tuning, the NER model improved from 44% to over 90% accuracy, while the emotion classifier rose from 41.72% to 72.66%. The analysis shows moderate positive spatial autocorrelation (Moran’s I > 0.1), suggesting that the spatial distribution of emotions in Colombia is not random. The findings provide valuable resources for researchers in geographic and linguistic studies, as well as for urban planners and decision-makers seeking rapid access to subjective, emotion-based insights about Colombian cities derived from social media data.
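For readers unfamiliar with the spatial autocorrelation statistic mentioned above, the following is a minimal sketch of global Moran's I in plain Python. It is not the study's implementation: the function names `morans_i` and `chain_weights`, the binary neighbour weights, and the toy values are all assumptions made for illustration. Values near 0 indicate a spatially random pattern, positive values indicate that similar values cluster together, and negative values indicate that dissimilar values sit next to each other.

```python
def morans_i(values, weights):
    """Global Moran's I for n regions.

    values  -- list of n numbers (hypothetical example: the share of
               tweets labelled with one emotion in each region)
    weights -- n x n list of lists; weights[i][j] > 0 when regions i and j
               are neighbours, 0 otherwise (the diagonal is 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)            # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))   # weighted cross-products
    den = sum(d * d for d in dev)                    # total squared deviation
    return (n / s0) * (num / den)


def chain_weights(n):
    """Binary weights for n regions arranged in a line (illustrative only)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1
        w[i + 1][i] = 1
    return w


# Toy data, not taken from the study.
W = chain_weights(4)
clustered = morans_i([1, 2, 3, 4], W)    # neighbours hold similar values
alternating = morans_i([1, 4, 1, 4], W)  # neighbours hold opposite values
print(clustered, alternating)
```

In the toy example, the smoothly increasing series yields a positive statistic and the alternating series a negative one. A real analysis would build the weights from actual adjacency or distance between spatial units and would also test significance, for instance by permutation.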
