Assessing the Efficiency and Accuracy of K-Means Clustering Compared to Other Clustering Techniques

Authors

  • Iliyas Khan, Universiti Teknologi PETRONAS
  • Hanita Binti Daud, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia
  • Nooraini binti Zainuddin, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia
  • Rajalingam Sokkalingam, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia
  • Abdus Samad Azad, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia
  • Abdussamad Samad, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia
  • Ahmad Abubakar Suleiman, Fundamental and Applied Science Department, Universiti Teknologi PETRONAS, Perak 32610, Malaysia

DOI:

https://doi.org/10.63017/jdsi.v3i2.23

Keywords:

Accuracy, execution time, comparative analysis, clustering algorithms

Abstract

Clustering is an important method in data analysis, but it faces challenges because datasets differ in nature, which makes certain algorithms less effective and slower. Choosing the most effective clustering method for a dataset requires evaluating both accuracy and computational speed, which poses a significant challenge for today's researchers. To address these issues, the current study compares different clustering methods on datasets including iris, seed, and well log, evaluating their accuracy and execution speed. Results show that K-means performs better with large datasets: as sample size increases, the accuracy of the K-means algorithm tends to improve. The execution time of K-means is influenced by the number of features in the dataset, with datasets that have more features typically requiring more time to process. The mean shift and spectral clustering algorithms perform well on small datasets but take a long time to run.
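The abstract does not state how accuracy and execution time were measured, so the following is only a minimal sketch of one plausible setup. It assumes accuracy means the fraction of points whose cluster matches the true class under the best cluster-to-class mapping, and that execution time is wall-clock time. Synthetic Gaussian blobs stand in for the iris, seed, and well log datasets, and all function names (kmeans, clustering_accuracy, make_blobs, benchmark) are hypothetical, not taken from the paper.

```python
import itertools
import random
import time


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with random initial centroids (an assumed variant)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def clustering_accuracy(true_labels, pred_labels, k):
    """Best fraction of matches over all mappings of cluster ids to class ids."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] == t)
        best = max(best, hits)
    return best / len(true_labels)


def make_blobs(n_per_cluster, centers, spread=0.5, seed=0):
    """Synthetic stand-in data: one Gaussian blob per center."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append(tuple(rng.gauss(m, spread) for m in center))
            labels.append(label)
    return points, labels


def benchmark(n_per_cluster, centers):
    """Return (accuracy, seconds) for one k-means run on synthetic blobs."""
    points, truth = make_blobs(n_per_cluster, centers)
    start = time.perf_counter()
    pred = kmeans(points, k=len(centers))
    elapsed = time.perf_counter() - start
    return clustering_accuracy(truth, pred, len(centers)), elapsed


if __name__ == "__main__":
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    for n in (50, 500):
        acc, secs = benchmark(n, centers)
        print(f"n={3 * n}: accuracy={acc:.3f}, time={secs:.4f}s")
```

Trying every label permutation is only practical for a small number of clusters. Timings from a pure-Python loop like this are not comparable to those of an optimized library, so the sketch illustrates the measurement procedure rather than the paper's reported numbers.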

Published

2025-08-01

How to Cite

Khan, I., Daud, H. B., Zainuddin, N. binti, Sokkalingam, R., Azad, A. S., Samad, A., & Suleiman, A. A. (2025). Assessing the Efficiency and Accuracy of K-Means Clustering Compared to Other Clustering Techniques. Data Science Insights, 3(2), 49–65. https://doi.org/10.63017/jdsi.v3i2.23