A Comparative Exploration of PCA Variants for Clustering Analysis

Leo Ramos, Francklin Rivas-Echeverría, Isidro R. Amaro, Franklin Camacho

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

In this study, we compare the performance of Principal Component Analysis (PCA), Sparse PCA (SPCA), Robust PCA (RPCA), and Weighted PCA (WPCA) on a high-dimensional dataset of economic indicators from G20 countries. We evaluate how effectively each method retains variance and how it affects the performance of K-means clustering. The comparison uses four metrics: effectiveness of variance retention, mean variance of the sample-to-centroid distance, mean distance among centroids, and the Rand index for cluster similarity. PCA is more effective than SPCA but is outperformed by RPCA and significantly by WPCA, which shows the highest variance retention of the four methods. In terms of clustering, SPCA coupled with K-means achieves the best balance between cluster compactness and separation, as indicated by a low mean variance of the sample-to-centroid distance and a relatively high mean distance among centroids. RPCA produces extremely compact clusters but the least inter-cluster separation. The Rand index comparisons show that PCA, SPCA, and WPCA yield similar clustering structures, whereas RPCA detects distinct patterns, which broadens the analysis of high-dimensional datasets. These findings emphasize the role of selecting an appropriate dimensionality reduction method in improving the effectiveness of unsupervised learning tasks.
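
The abstract names three clustering metrics without defining them. The sketch below shows one plausible reading of each, using only the Python standard library. The function names and the exact definitions (population variance of sample-to-centroid distances averaged over clusters, and the mean pairwise Euclidean distance between centroids) are assumptions made for illustration and may differ from the formulas used in the paper. The Rand index follows its standard pair-counting definition.

```python
from itertools import combinations
from math import dist
from statistics import mean, pvariance


def centroids(points, labels):
    """Return {label: centroid} for a given cluster assignment."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(mean(axis) for axis in zip(*members))
            for lab, members in groups.items()}


def mean_variance_sample_centroid(points, labels):
    """Assumed definition: variance of sample-to-centroid distances
    within each cluster, averaged over clusters (lower = more compact)."""
    cents = centroids(points, labels)
    distances = {}
    for p, lab in zip(points, labels):
        distances.setdefault(lab, []).append(dist(p, cents[lab]))
    return mean(pvariance(d) for d in distances.values())


def mean_distance_among_centroids(points, labels):
    """Assumed definition: mean Euclidean distance over all centroid
    pairs (higher = better separated). Needs at least two clusters."""
    cents = list(centroids(points, labels).values())
    return mean(dist(a, b) for a, b in combinations(cents, 2))


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy data (hypothetical, not from the paper): two clusters in 2-D.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

compactness = mean_variance_sample_centroid(points, labels)
separation = mean_distance_among_centroids(points, labels)
same_partition = rand_index(labels, [1, 1, 0, 0])  # relabelled, same grouping
```

In a setting like the one the abstract describes, `points` would be the samples projected by each PCA variant and `labels` the K-means assignments, so the first two functions would be computed once per variant and the Rand index once per pair of variants.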

Original language: English
Title of host publication: Proceedings - 2023 4th International Conference on Information Systems and Software Technologies, ICI2ST 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 60-67
Number of pages: 8
ISBN (electronic): 9798350373219
Publication status: Published - 2023
Event: 4th International Conference on Information Systems and Software Technologies, ICI2ST 2023 - Virtual, Online, Ecuador
Duration: 22 Nov 2023 to 24 Nov 2023

Publication series

Name: Proceedings - 2023 4th International Conference on Information Systems and Software Technologies, ICI2ST 2023

Conference

Conference: 4th International Conference on Information Systems and Software Technologies, ICI2ST 2023
Country/Territory: Ecuador
City: Virtual, Online
Period: 22/11/23 to 24/11/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.
