A Dataset for Analysis of Quality Code and Toxic Comments

Jaime Sayago-Heredia, Gustavo Chango Sailema, Ricardo Pérez-Castillo, Mario Piattini

Research output: Chapter in book/report/conference proceedings, Conference contribution, Peer-reviewed


Software development has an important human aspect: developers' feelings are known to have a significant impact on software development and could affect software quality as well as developers' productivity and performance. In this study, we begin the process of identifying, understanding, and relating these affects to software quality. We propose a code quality and sentiment dataset: a cleaned set of commits, code quality metrics, and toxic sentiments from 19 projects obtained from GitHub. The dataset combines commit messages extracted from GitHub with quality metrics from SonarQube. Using this information, we apply machine learning techniques with the ML.NET tool to identify toxic developer sentiments in commits that could affect code quality. We analyzed 218K commits from the 19 selected projects; the analysis took 120 days. We also describe the process of building the tool and retrieving the data. The dataset will be used to investigate in greater depth the factors that affect developers' emotions and whether these factors are related to code quality across the life cycle of a software project. In addition, code quality will be estimated as a function of developer sentiments.
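To make the shape of such a dataset concrete, here is a minimal sketch, assuming each commit has already been fetched from GitHub and each commit's SonarQube analysis has been reduced to three counts (`bugs`, `code_smells`, `vulnerabilities`, which are standard SonarQube metric keys). All class and function names are hypothetical, and the keyword lexicon in `is_toxic` is a deliberately crude stand-in for the ML.NET classifier the authors used; it is not their method.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical keyword lexicon: a crude stand-in for a trained
# toxicity classifier (the authors used ML.NET, not this).
TOXIC_TERMS = {"stupid", "idiot", "garbage", "dumb"}


@dataclass(frozen=True)
class Commit:
    sha: str
    project: str
    message: str


@dataclass(frozen=True)
class QualityMetrics:
    # Assumption: one SonarQube analysis per commit, keyed by SHA.
    sha: str
    bugs: int
    code_smells: int
    vulnerabilities: int


@dataclass(frozen=True)
class DatasetRow:
    sha: str
    project: str
    message: str
    toxic: bool
    bugs: int
    code_smells: int
    vulnerabilities: int


def is_toxic(message: str) -> bool:
    """Flag a message if any lowercased, punctuation-stripped token is in the lexicon."""
    tokens = (tok.strip(".,!?:;\"'()") for tok in message.lower().split())
    return any(tok in TOXIC_TERMS for tok in tokens)


def build_dataset(commits, metrics):
    """Inner-join commits with their quality metrics on the commit SHA."""
    by_sha = {m.sha: m for m in metrics}
    rows = []
    for c in commits:
        m = by_sha.get(c.sha)
        if m is None:
            continue  # skip commits that have no quality analysis
        rows.append(DatasetRow(c.sha, c.project, c.message, is_toxic(c.message),
                               m.bugs, m.code_smells, m.vulnerabilities))
    return rows


def mean_smells_by_toxicity(rows):
    """Average code-smell count for toxic vs. non-toxic commits (None if a group is empty)."""
    groups = {True: [], False: []}
    for r in rows:
        groups[r.toxic].append(r.code_smells)
    return {flag: (mean(vals) if vals else None) for flag, vals in groups.items()}


# Usage with made-up illustrative data (not from the paper's dataset).
commits = [
    Commit("a1", "demo", "Fix this stupid null check"),
    Commit("b2", "demo", "Add unit tests for parser"),
    Commit("c3", "demo", "Refactor config loading"),
    Commit("d4", "demo", "Bump version"),  # no metrics: dropped by the join
]
metrics = [
    QualityMetrics("a1", bugs=2, code_smells=14, vulnerabilities=0),
    QualityMetrics("b2", bugs=0, code_smells=6, vulnerabilities=0),
    QualityMetrics("c3", bugs=1, code_smells=10, vulnerabilities=1),
]
rows = build_dataset(commits, metrics)
print(mean_smells_by_toxicity(rows))
```

The inner join on the SHA keeps only commits that have a quality analysis. Comparing group means like this is only a first descriptive look at the data, not evidence that toxic messages affect code quality.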

Original language: English
Host publication title: Applied Technologies - 4th International Conference, ICAT 2022, Revised Selected Papers
Editors: Miguel Botto-Tobar, Marcelo Zambrano Vizuete, Sergio Montes León, Pablo Torres-Carrión, Benjamin Durakovic
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 16
ISBN (print): 9783031249846
Status: Published - 2023
Event: 4th International Conference on Applied Technologies, ICAT 2022 - Quito, Ecuador
Duration: 23 Nov 2022 to 25 Nov 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1755 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937


Conference: 4th International Conference on Applied Technologies, ICAT 2022

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
