TY - JOUR
T1 - Sistema de Aprendizaje Automático para la Detección y Análisis de Contenido Sexista en la Música Urbana
TT - Machine Learning System for the Detection and Analysis of Sexist Content in Urban Music
AU - Añapa, Dany Dario Pianchiche
AU - Pico-Valencia, Pablo
AU - Holgado-Terriza, Juan Antonio
PY - 2024/6/30
Y1 - 2024/6/30
AB - This paper describes the creation of an automatic classifier that evaluates and categorizes the level of sexism in the lyrics of urban music songs. The classifier assigns lyrics to one of three categories: "A", content suitable for audiences of all ages; "B", content requiring adult supervision; and "C", adult-oriented material. It was implemented in Python using five algorithms: Naïve Bayes, nearest neighbours, decision tree, support vector machine and logistic regression. A dataset of 479 observations was created for model training and split into 75% for training and 25% for testing. The training data included expressions both with and without sexist connotations. The logistic regression model achieved the highest accuracy, at 77%. To ease deployment in production environments, the model was integrated with a graphical user interface that makes the system usable by its potential beneficiaries.
UR - http://dx.doi.org/10.37815/rte.v36n1.1088
U2 - 10.37815/rte.v36n1.1088
DO - 10.37815/rte.v36n1.1088
M3 - Article
SN - 1390-3659
JO - REVISTA TECNOLOGICA
JF - REVISTA TECNOLOGICA
ER -