A Dataset for Analysis of Quality Code and Toxic Comments

Jaime Sayago-Heredia, Gustavo Chango Sailema, Ricardo Pérez-Castillo, Mario Piattini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Software development has an important human aspect: developers’ feelings are known to have a significant impact on software development and could affect quality, productivity and developer performance. In this study, we begin the process of finding, understanding and relating these affects to software quality. We propose a code quality and sentiment dataset: a clean set of commits, code quality metrics and toxic sentiments from 19 projects obtained from GitHub. The dataset combines commit messages extracted from GitHub with quality metrics from SonarQube. Using this information, we apply machine learning techniques with the ML.NET tool to identify toxic developer sentiments in commits that could affect code quality. We analyzed 218K commits from the 19 selected projects; the analysis took 120 days. We also describe the process of building the tool and retrieving the data. The dataset will be used to investigate in depth the factors that affect developers’ emotions and whether these factors are related to code quality over the life cycle of a software project. In addition, code quality will be estimated as a function of developer sentiments.

Original language: English
Title of host publication: Applied Technologies - 4th International Conference, ICAT 2022, Revised Selected Papers
Editors: Miguel Botto-Tobar, Marcelo Zambrano Vizuete, Sergio Montes León, Pablo Torres-Carrión, Benjamin Durakovic
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 559-574
Number of pages: 16
ISBN (Print): 9783031249846
DOIs
State: Published - 2023
Event: 4th International Conference on Applied Technologies, ICAT 2022 - Quito, Ecuador
Duration: 23 Nov 2022 to 25 Nov 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1755 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 4th International Conference on Applied Technologies, ICAT 2022
Country/Territory: Ecuador
City: Quito
Period: 23/11/22 to 25/11/22

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

  • Commits
  • GitHub
  • Sentiments analysis
  • Software Engineering
  • Software quality
  • SonarQube
  • Toxic comment classification
