System for automatic text categorization based on machine learning, natural language processing, and opinion mining techniques
Abstract
The information available on a wide range of topics has grown with the exponential increase in the use of Web 2.0 tools, mainly opinion forums and social networks. This has generated a high volume of comments on many topics of concern, but this information is rarely analyzed, which wastes its immense potential to support people and organizations in decision-making processes. This document describes the research, development and implementation of a system that determines the polarity of an unstructured opinion text and classifies it as positive or negative, taking into account the feelings, emotions and attitudes expressed in that opinion. Using Natural Language Processing tools, semantic structure analysis and opinion lexicons, and guided by the KDD process model, the work studied and evaluated combinations of attributes in the context of the given data, resulting in a reliable attribute model of the kind required by the classification algorithms used. The system implements two (2) continuous supervised classification algorithms, NaiveBayes and MaxEntropy, which were modified as needed for use in text analysis. Each algorithm estimates the probability of each class (positive, negative) under a weighting scheme, and this estimate determines the class assigned when a given text is evaluated and classified. To establish the validity of the model and of the implemented algorithms, metrics are used that evaluate the behavior of each algorithm against the model and the established data set.
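As an illustration of the class-probability estimate the abstract mentions, the following is a minimal sketch of a multinomial Naive Bayes polarity classifier with add-one smoothing. It is not the system's implementation: the toy training sentences, the whitespace tokenizer, and the function names `train`, `class_probabilities` and `classify` are assumptions made for this example, and the attribute model, opinion lexicons and weighting scheme described in the abstract are not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set; not the data set used by the described system.
TRAIN = [
    ("great phone love the screen", "positive"),
    ("excellent service and great price", "positive"),
    ("terrible battery hate the screen", "negative"),
    ("poor service and awful price", "negative"),
]


def train(examples):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in text.lower().split():
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def class_probabilities(text, model):
    """Return P(class | text) for every class, using add-one smoothing."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        # Smoothed denominator: tokens seen in this class plus vocabulary size.
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalize in a numerically stable way so the probabilities sum to 1.
    top = max(log_scores.values())
    exp_scores = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {lbl: v / z for lbl, v in exp_scores.items()}


def classify(text, model):
    """Pick the class with the highest estimated probability."""
    probs = class_probabilities(text, model)
    return max(probs, key=probs.get)
```

A call such as `classify("great price", train(TRAIN))` returns the label with the larger estimated probability. A maximum entropy classifier would replace the counting step with learned feature weights while keeping the same "estimate each class probability, then pick the largest" decision rule.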