Master's in Information and Communication Sciences, Research Methodology

Permanent URI for this collection: http://hdl.handle.net/11349/96016

Recent submissions

  • Item
    Identifying emotions related to geographic space from social media data and natural language processing
    (Universidad Distrital Francisco José de Caldas) Oviedo Yate, Brayan Stiven; Rocha Salamanca, Luz Angela; Bonilla Huerfano, Johnatan Estiven; Rocha Salamanca, Luz Angela [0000-0001-5274-4819]
    The aim of this study is to explore the spatial distribution of emotions associated with geographic space in Colombia as expressed by Twitter (X) users. The research integrates Natural Language Processing (NLP) techniques, specifically Named Entity Recognition (NER) and Emotion Analysis (EA), adapted to Colombian Spanish, a variety characterized by strong regional and dialectal diversity (Mora et al., 2004; Bonilla, 2023). This linguistic heterogeneity poses significant challenges for computational approaches to emotion and place, given the range of expressions used to refer both to locations (e.g., montaña, cerro, filo, peña) and to emotional states (e.g., ativo, fachoso, acoquinado). To address these challenges, the study proposes a theoretical and methodological workflow for identifying emotions in geographically referenced tweets based on both their content and the places they mention. Two annotated corpora were developed: (1) a 2,000-sentence location corpus that includes place names, nicknames, and common spatial references in Colombia, used to fine-tune a NER model for place detection; and (2) a corpus of emotion-labeled tweets linked to geographic entities extracted by the NER model, used to fine-tune a BERT-based emotion classifier. The fine-tuned language models were applied to a large Twitter dataset compiled by Jiménez et al. (2018) and Rodríguez-Díaz et al. (2018), producing a georeferenced database of approximately 3.8 million tweets classified by emotion. These results were integrated into an interactive web map for visualization and further analyzed using Moran’s I spatial autocorrelation and kernel density estimation. After fine-tuning, the accuracy of the NER model improved from 44% to over 90%, while that of the emotion classifier rose from 41.72% to 72.66%. The spatial autocorrelation results show a moderate positive relationship (Moran’s I > 0.1), suggesting that the spatial distribution of emotions in Colombia is not random.
The findings provide valuable resources for researchers in geographic and linguistic studies, as well as for urban planners and decision-makers seeking rapid access to subjective, emotion-based insights about Colombian cities derived from social media data.
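For readers unfamiliar with the spatial autocorrelation statistic mentioned in the abstract above, the following is a minimal sketch of global Moran's I in plain Python. The names `morans_i` and `chain_weights` and the toy values are illustrative assumptions, not the authors' code or data; the thesis may use a different spatial weights scheme.

```python
def chain_weights(n):
    """Hypothetical binary weights: regions laid out in a row, neighbors share an edge."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Global Moran's I for one value per region and a weights matrix w[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


w = chain_weights(4)
clustered = morans_i([1, 2, 3, 4], w)    # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], w)  # dissimilar values sit next to each other
print(round(clustered, 3), round(alternating, 3))
```

Values above zero indicate that neighboring regions tend to carry similar values, which is the sense in which the abstract describes the distribution of emotions as non-random.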
  • Item
    Generation of a geospatial model for water management and flood prevention in the Department of Casanare
    (Universidad Distrital Francisco José de Caldas) Sánchez Montaña, Vivian Daniela; Rocha Salamanca, Luz Ángela; Rocha Salamanca, Luz Ángela [0000-0001-5274-4819]
    The application of geospatial tools in water modeling has become a key component of analyzing and managing water resources at the regional scale, facilitating the understanding of flooding events and extreme phenomena. Monthly precipitation rasters were generated for the Department of Casanare, Colombia, using geostatistical techniques, including empirical Bayesian kriging, and meteorological data from IDEAM (2024). In parallel, IGAC soil data were used to determine the Curve Number (CN) according to soil type and hydrological group, and these values were converted into raster layers. These layers were then combined with the precipitation data to estimate monthly direct runoff, producing 12 maps that describe the spatial distribution of surface runoff. In the last stage, hydrological information was combined with topography to identify areas vulnerable to flooding. To this end, the flow accumulation raster derived from the Digital Elevation Model (DEM) was normalized using a logarithmic transformation to account for dispersion and simplify spatial analysis. This procedure made it possible to combine it with the monthly runoff maps and generate monthly flood susceptibility products, highlighting the regions with the highest surface accumulation under different climate scenarios. The proposed model provides reliable data that improve the understanding of regional dynamics and support sustainable development, making it a useful tool for water management and the prevention of extreme hydrological events in Casanare.
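The runoff and normalization steps described in the abstract above can be sketched as follows. This is a minimal illustration that assumes the standard SCS Curve Number equation in millimetres with an initial abstraction of 0.2·S, and a `log1p` transform followed by min-max scaling; the abstract does not state the exact formulation used, and the function names and sample values are hypothetical.

```python
import math


def scs_runoff_mm(precip_mm, curve_number, ia_ratio=0.2):
    """Direct runoff Q (mm) from precipitation P (mm) using the SCS-CN method.

    S = 25400 / CN - 254 is the potential maximum retention (mm);
    Ia = ia_ratio * S is the initial abstraction (assumed 0.2 * S here).
    """
    s = 25400.0 / curve_number - 254.0
    ia = ia_ratio * s
    if precip_mm <= ia:
        return 0.0  # all rainfall is absorbed before runoff starts
    return (precip_mm - ia) ** 2 / (precip_mm - ia + s)


def log_normalize(flow_accumulation):
    """Log-transform flow accumulation counts and rescale them to [0, 1]."""
    logged = [math.log1p(v) for v in flow_accumulation]  # log1p handles zero cells
    lo, hi = min(logged), max(logged)
    if hi == lo:
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


# Hypothetical cell: 100 mm of monthly rain on soil with CN = 80.
print(round(scs_runoff_mm(100.0, 80.0), 2))
# Hypothetical flow accumulation counts spanning two orders of magnitude.
print(log_normalize([0, 9, 99]))
```

The logarithm compresses the long right tail typical of flow accumulation, so a few channel cells with very large counts do not dominate the scaled raster.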
  • Item
    Model for evaluating the impact of climatological and air quality variables on the efficiency of solar panels
    (Universidad Distrital Francisco José de Caldas) Carrillo Mejía, Luis; Gaona García, Elvis Eduardo; Mora Hernández, Johann Alexander; Gaona García, Elvis Eduardo [0000-0001-5431-8776]
    In a global context that demands an energy transition, where solar power emerges as a primary alternative due to its ease of installation and distributed generation capabilities, there is a critical need to understand and mitigate the factors affecting its efficiency. This research addresses the impact of climatological and air quality variables on the performance of solar panels, proposing a machine learning model capable of forecasting their efficiency. The central problem lies in the vulnerability of photovoltaic generation to exogenous factors such as irradiance, temperature, pressure, humidity, and, crucially, dirt and environmental pollution. Omitting these studies can lead to the installation of seemingly viable systems that, over time, suffer a rapid degradation of efficiency, generating economic losses and disconnection in areas seeking energy autonomy. To counter this, a model is proposed that uses climatological and air pollution data from recognized APIs to study efficiency behavior in a pre-selected location in Colombia. The problem formulation focused on how to evaluate the impact of these variables, and its systematization explored the functional and structural description of the evaluation and visualization model, as well as the validation of its predictive capacity. To address this challenge, climatological, air quality, and irradiance variables were collected, stored, and processed. Various machine learning algorithms were evaluated to predict instantaneous efficiency in seven geographical locations in Colombia with varied climatological characteristics, and the best-performing ones were selected based on the coefficient of determination (R²), which measures the share of variance in efficiency that the model explains. One of the most relevant conclusions is that including air pollution variables in the prediction models improved predictive performance by 3% on the standard evaluation metric used.
The feature importance analysis of the selected Random Forest model revealed that irradiance (GHI, GDI, DNI), climatological variables (temperature, humidity, wind direction and speed), and air quality variables (NO2, PM2.5, PM10, CO, and O3) were the most influential. Additionally, a negative association was found between instantaneous efficiency and higher air pollution conditions, with NO2, CO, SO2, PM10 and PM2.5 showing the greatest influence, while O3 exhibited a positive association, regardless of the overall level of pollution. These findings underscore the practical importance of incorporating air quality variables in the planning and operation of photovoltaic systems, since a more accurate prediction of instantaneous efficiency allows for a better estimation of performance and optimizes operation and maintenance strategies, contributing to a more effective use of solar potential and reinforcing the viability of clean energy.
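As a reference for the R² metric used to rank the models in the abstract above, the following minimal sketch computes the coefficient of determination in plain Python. The function name and the numbers are illustrative assumptions and are not taken from the thesis.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical observed efficiencies and two sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean
better = [1.5, 2.5, 2.5, 3.5]    # tracks the observations more closely

print(r2_score(observed, baseline))
print(round(r2_score(observed, better), 3))
```

A model that always predicts the mean scores 0, and a perfect model scores 1, so comparing R² with and without the air quality features gives a direct measure of their added predictive value.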
  • Item
    Automatic text classification model for the indigenous Wayuunaiki language incorporating grammatical features
    (Universidad Distrital Francisco José de Caldas) Salazar Báez, Diana Milena; Rodríguez Rodríguez, Jorge Enrique
    Natural language processing (NLP) techniques applied to automatic text classification perform well on tasks such as ordering, labeling, and clustering texts written in widely used languages such as English, Chinese, and Spanish. This performance has been achieved thanks to significant advances in machine learning and deep learning architectures, semantic representation strategies for pre-training, and the availability of and access to large volumes of data. In the case of NLP for indigenous languages, few studies describe the processing of a language in a way that takes into account both its grammatical features and the cultural identity of its speakers. This gap stems from the scarcity of datasets with enough high-quality records and from the lack of linguistic resources, such as dictionaries, lemmatizers, or taggers, that could be adapted from other NLP solutions for grammatical analysis. Against this backdrop, the present work proposes an automatic text classification model for Wayuunaiki, the native language of the Wayuú community living in Colombia and Venezuela. The model is developed using NLP techniques and the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. It integrates Wayuunaiki's own grammatical features (prepositions, verb conjugations marked for person and gender, and agglutinative morphology) with the aim of achieving more accurate classification that supports other NLP tasks. In addition to its computational contribution, this work seeks to provide a high-quality, labeled Wayuunaiki text corpus for research that fosters the conservation and teaching of the language.