Determining Initial Centroid in K-Means using Global Average and Data Dimension Variance
Keywords:
K-Means, Centroid Global Mean
Abstract
The choice of initial centroids strongly affects the quality of clustering results in the K-Means algorithm. This study proposes an approach that determines the initial centroids from the global average and the per-dimension variance of the data. The global average represents the overall center of the data, while the variance of each dimension describes how widely that feature is spread. The method is tested on three-dimensional synthetic data (X, Y, Z) with 121 data points and compared with random initialization. The results show that the method based on the global average and variance produces more balanced clusters, a lower Sum of Squared Error (SSE), the highest Silhouette Score (0.65), and faster convergence. Compared with two random initialization scenarios, it is also more stable in separating clusters according to the distribution of low, medium, and high values. This approach contributes to the development of a more consistent and effective K-Means initialization strategy, especially for low- to medium-dimensional numerical datasets.
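The abstract does not spell out how the global average and the dimension variances are combined into starting centroids, so the following is only an illustrative sketch under an assumed rule: the k centroids are spread evenly from "mean minus one standard deviation" to "mean plus one standard deviation" in every dimension, which for k = 3 gives a low, a medium, and a high starting point. The function names (`global_mean`, `dimension_variance`, `mean_variance_centroids`, `kmeans`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def global_mean(points):
    """Per-dimension average over all points (the 'global average')."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def dimension_variance(points, mean):
    """Population variance of each dimension around the given mean."""
    n = len(points)
    return [sum((p[d] - mean[d]) ** 2 for p in points) / n
            for d in range(len(mean))]


def mean_variance_centroids(points, k):
    """ASSUMED rule (not from the paper): place k centroids evenly
    between mean - std and mean + std in each dimension."""
    mean = global_mean(points)
    std = [math.sqrt(v) for v in dimension_variance(points, mean)]
    if k == 1:
        return [mean]
    centroids = []
    for i in range(k):
        offset = -1.0 + 2.0 * i / (k - 1)  # runs from -1 to +1
        centroids.append([m + offset * s for m, s in zip(mean, std)])
    return centroids


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the supplied centroids.
    Returns (centroids, labels, sse, number_of_assignment_passes)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = global_mean(members)
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return centroids, labels, sse, iteration


# Hypothetical 3-D data with low, medium, and high groups (not the paper's data).
rng = random.Random(0)
points = [[rng.gauss(center, 1.0) for _ in range(3)]
          for center in (2.0, 10.0, 18.0) for _ in range(40)]

init = mean_variance_centroids(points, k=3)
centroids, labels, sse, n_iter = kmeans(points, init)
print("passes:", n_iter, "SSE:", round(sse, 2))
```

Because the starting points are computed from the data rather than drawn at random, repeated runs on the same dataset give the same clustering, which is one plausible reading of the stability the abstract reports. Passing randomly sampled points as `centroids` to the same `kmeans` function gives a baseline to compare SSE and pass counts against.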
License
Copyright (c) 2025 Efori Bu'ulolo

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

