Dimensionality Reduction Techniques in Code Quality Evaluation

Santiago Logroño, Wilson Chango*, Ana Salguero, Nestor Estrada

*Corresponding author of this work

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer review

Abstract

This research aims primarily to identify the optimal features for conducting a precise data analysis to determine which variables are related to code quality and how they influence toxicity levels. A comprehensive comparison of several machine learning algorithms is carried out with the purpose of reducing complexity and addressing potential issues associated with predictive models. The algorithms selected for this study are PCA (Principal Component Analysis), IPCA (Incremental Principal Component Analysis), and KPCA (Kernel Principal Component Analysis), each with its own applications and advantages. The central question of this research is which features influence the machine learning models employed. The objective is to avoid an excess of features, which not only increases computational complexity but can also introduce missing values that degrade the predictive capacity of the models. To address this issue, dimensionality reduction techniques are applied using the PCA, IPCA, and KPCA algorithms, which reduce the number of features while preserving the relevant information in the dataset. The most influential features were identified, enabling model simplification and reducing the associated computational costs. In addition, several kernels of the KPCA algorithm were evaluated to determine which provides the best results in this context. PCA and linear KPCA are solid starting options, especially when working with normalized or discretized data; however, the performance of IPCA and other variants, such as KPCA with an RBF kernel, should be considered depending on the requirements and peculiarities of the data. Finally, the results underscore the fundamental role of data normalization in improving outcomes in all cases, emphasizing the importance of data preparation in this type of analysis.
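As a rough illustration of the kind of pipeline the abstract describes (not taken from the paper), the following Python sketch applies PCA, IPCA, and KPCA with linear and RBF kernels after a normalization step, using scikit-learn. The feature matrix, batch size, and number of retained components are arbitrary assumptions standing in for the code-quality dataset.

# Minimal sketch (assumed setup, not the authors' code): comparing PCA, IPCA,
# and KPCA on a placeholder feature matrix X, with standardization beforehand.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))               # stand-in for the code-quality feature matrix

X_scaled = StandardScaler().fit_transform(X)  # normalization step emphasized in the paper

n_components = 10                             # arbitrary choice for illustration

# Standard PCA: linear projection onto the top principal components.
pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X_scaled)
print("PCA explained variance:", pca.explained_variance_ratio_.sum())

# Incremental PCA: same idea, fitted in mini-batches for larger datasets.
ipca = IncrementalPCA(n_components=n_components, batch_size=100)
X_ipca = ipca.fit_transform(X_scaled)
print("IPCA explained variance:", ipca.explained_variance_ratio_.sum())

# Kernel PCA with the linear and RBF kernels compared in the study.
for kernel in ("linear", "rbf"):
    kpca = KernelPCA(n_components=n_components, kernel=kernel)
    X_kpca = kpca.fit_transform(X_scaled)
    print(f"KPCA ({kernel}) output shape:", X_kpca.shape)

The reduced matrices produced this way would then feed the downstream predictive models, with the kernel and component count tuned to the data at hand.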

Original language: English
Host publication title: Proceedings of the International Conference on Computer Science, Electronics and Industrial Engineering (CSEI 2023) - Advances in Computer Sciences - Exploring Innovations at the Intersection of Computing Technologies
Editors: Marcelo V. Garcia, Carlos Gordón-Gallegos, Asier Salazar-Ramírez, Carlos Nuñez
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 257-271
Number of pages: 15
ISBN (print): 9783031692277
DOI
Status: Published - 2024
Published externally
Event: International Conference on Computer Science, Electronics and Industrial Engineering, CSEI 2023 - Ambato, Ecuador
Duration: 6 Nov 2023 - 10 Nov 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 775 LNNS
ISSN (print): 2367-3370
ISSN (electronic): 2367-3389

Conference

Conference: International Conference on Computer Science, Electronics and Industrial Engineering, CSEI 2023
Country/Territory: Ecuador
City: Ambato
Period: 6/11/23 - 10/11/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
