Abstract
This research aims primarily to identify which features are related to code quality and how they influence toxicity levels, so that the data analysis rests on an informative set of variables. Several machine learning algorithms are compared with the purpose of reducing complexity and addressing potential issues associated with predictive models: PCA (Principal Component Analysis), IPCA (Incremental Principal Component Analysis), and KPCA (Kernel Principal Component Analysis), each with its own applications and advantages. The central question is which features influence the machine learning models employed. The objective is to avoid an excess of features, which not only increases computational complexity but can also lead to missing values that affect the predictive capacity of the models. To address this issue, PCA, IPCA, and KPCA are applied as dimensionality reduction techniques that reduce the number of features while preserving the relevant information in the dataset. This made it possible to determine which features are the most influential, enabling model simplification and reducing the associated computational costs. Additionally, several KPCA kernels were evaluated to determine which one gives the best results in this context. PCA and KPCA with a linear kernel are solid starting options, especially when working with normalized or discretized data. However, the performance of IPCA and of other variants, such as KPCA with an RBF kernel, should also be considered in light of the requirements and peculiarities of the data.
Furthermore, the results underscore the fundamental role of data normalization in improving outcomes in all cases, emphasizing the importance of data preparation in this type of analysis.
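The workflow described in the abstract (normalize, then compare PCA, IPCA, and KPCA with different kernels) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic data in place of the study's dataset, and the feature names `f0` to `f5`, the helpers `reduce_all` and `top_features`, and the choice of two components are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA
from sklearn.preprocessing import StandardScaler


def reduce_all(X, n_components=2):
    """Standardize X, then project it with PCA, IPCA, and two KPCA kernels."""
    X_scaled = StandardScaler().fit_transform(X)
    reducers = {
        "PCA": PCA(n_components=n_components),
        "IPCA": IncrementalPCA(n_components=n_components, batch_size=50),
        "KPCA linear": KernelPCA(n_components=n_components, kernel="linear"),
        "KPCA rbf": KernelPCA(n_components=n_components, kernel="rbf"),
    }
    projections = {
        name: reducer.fit_transform(X_scaled)
        for name, reducer in reducers.items()
    }
    return projections, reducers


def top_features(fitted_pca, feature_names, k=2):
    """Rank features by absolute loading on the first principal component."""
    loadings = np.abs(fitted_pca.components_[0])
    order = np.argsort(loadings)[::-1][:k]
    return [feature_names[i] for i in order]


# Synthetic stand-in data (assumption): six features, where f1 is almost
# a copy of f0, so the pair should dominate the first component.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)
feature_names = [f"f{i}" for i in range(6)]

projections, reducers = reduce_all(X, n_components=2)
ranked = top_features(reducers["PCA"], feature_names, k=2)

for name, Z in projections.items():
    print(name, Z.shape)
print("Most influential features on the first component:", ranked)
```

On standardized data, KPCA with a linear kernel yields the same projection as PCA up to the sign of each component, which is one reason the two are natural baselines to compare against the RBF kernel.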
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Conference on Computer Science, Electronics and Industrial Engineering (CSEI 2023) - Advances in Computer Sciences - Exploring Innovations at the Intersection of Computing Technologies |
| Editors | Marcelo V. Garcia, Carlos Gordón-Gallegos, Asier Salazar-Ramírez, Carlos Nuñez |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 257-271 |
| Number of pages | 15 |
| ISBN (print) | 9783031692277 |
| DOI | |
| Status | Published - 2024 |
| Externally published | Yes |
| Event | International Conference on Computer Science, Electronics and Industrial Engineering, CSEI 2023 - Ambato, Ecuador. Duration: 6 Nov 2023 → 10 Nov 2023 |
Publication series
| Name | Lecture Notes in Networks and Systems |
|---|---|
| Volume | 775 LNNS |
| ISSN (print) | 2367-3370 |
| ISSN (electronic) | 2367-3389 |
Conference
| Conference | International Conference on Computer Science, Electronics and Industrial Engineering, CSEI 2023 |
|---|---|
| Country/Territory | Ecuador |
| City | Ambato |
| Period | 6 Nov 2023 → 10 Nov 2023 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.