000 02631naa a2200265 a 4500
003 AR-LpUFIB
005 20250311171158.0
008 230201s2022 xx o 000 0 eng d
024 8 _aDIF-M8281
_b8501
_zDIF007573
040 _aAR-LpUFIB
_bspa
_cAR-LpUFIB
100 1 _aTessore, Juan Pablo
245 1 0 _aDistant supervised construction and evaluation of a novel dataset of emotion-tagged social media comments in Spanish
500 _aPDF file format. -- This document is an intellectual output of the Facultad de Informática - UNLP (Colección BIPA/Biblioteca)
520 _aTagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time-consuming and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus, redundant classification is required to validate the assigned tag. Even though, in recent years, the amount of emotion-tagged text datasets in Spanish has been growing, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as not many datasets in Spanish included a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss Kappa interrater agreement measure. Error analysis is performed by using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundreds of thousands of tagged comments and it does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented is in the moderate agreement zone and, as such, suitable for training classification algorithms in the sentiment analysis field.
534 _aCognitive Computation, 2022, 14(2).
650 4 _aREDES SOCIALES
_94685
650 4 _aTIPOS DE DATOS
653 _atext mining
653 _aemotion data
700 1 _aEsnaola, Leonardo Martín
700 1 _aLanzarini, Laura Cristina
700 1 _aBaldassarri, Sandra
856 4 0 _uhttp://dx.doi.org/10.1007/s12559-020-09800-x
942 _cCP
999 _c57346
_d57346