Outlier detection in clustered data

Authors

  • Efori Bu'ulolo, Politeknik Negeri Medan, Medan, Indonesia
  • Rian Syahputra, Politeknik Negeri Medan, Medan, Indonesia
  • Elsya Sabrina Asmita Simorangkir, Politeknik Negeri Medan, Medan, Indonesia

DOI:

https://doi.org/10.35335/cit.Vol16.2025.1005.pp394-404

Keywords:

Data, Detection, Cluster, Outlier

Abstract

This study aims to detect outliers in clustered data. Outliers often arise during the clustering process, especially with the K-Means algorithm. An outlier in clustered data is a cluster member that lies far from the centroid and does not belong to the dominant cluster. Such outliers have various causes, including an inaccurate K value, inaccurate centroid values, and poor data quality. Outliers are detected here using three methods: the box plot, the Z-Score, and the relative size factor (RSF). The input value is the sum of squared error (SSE), calculated by summing the squared distances of each data point from its cluster centroid. The datasets used cover three levels of variance: high, medium, and low. The methods detect outliers at all three levels of variance, but not every method is optimal for every level. The box plot method is optimal for high and medium data variance, the RSF method is optimal for medium data variance, and the Z-Score method is not optimal for high data variance.
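
The three detection rules named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The abstract gives no thresholds or formulas, so the following are assumptions: each point is scored by its squared distance to the centroid, the box plot rule uses the usual 1.5 × IQR fences, the Z-Score rule uses a cut-off of 3, and RSF is taken as the ratio of the largest to the second-largest score. All function names and the toy data are hypothetical.

```python
import statistics


def squared_errors(points, centroid):
    """Squared Euclidean distance of each point from the cluster centroid."""
    return [sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points]


def boxplot_outliers(values, k=1.5):
    """Indices of values outside the fences [Q1 - k*IQR, Q3 + k*IQR] (k is assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Indices of values whose |z| exceeds the threshold (threshold is assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def relative_size_factor(values):
    """Largest value divided by the second largest (assumed RSF definition)."""
    top, second = sorted(values, reverse=True)[:2]
    return top / second if second else float("inf")


# Toy cluster: a 5 x 5 grid around the origin plus one distant point (last item).
points = [(x, y) for x in range(-2, 3) for y in range(-2, 3)] + [(12, 12)]

# The centroid is the per-coordinate mean of the members, as in K-Means.
centroid = tuple(statistics.fmean(coord) for coord in zip(*points))

errors = squared_errors(points, centroid)
sse = sum(errors)  # sum of squared error for the cluster

print("SSE:", round(sse, 3))
print("Box plot outliers:", boxplot_outliers(errors))
print("Z-Score outliers:", zscore_outliers(errors))
print("RSF:", round(relative_size_factor(errors), 2))
```

On this toy cluster the box plot and Z-Score rules both flag only the distant point, and the RSF is large because that point's squared error dwarfs the next largest one. Note that the Z-Score rule is sensitive to the outlier itself, since the outlier inflates the standard deviation used to score it.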

Published

2025-01-31

How to Cite

Bu’ulolo, E., Syahputra, R., & Simorangkir, E. S. A. (2025). Outlier detection in the clustired data. Jurnal Teknik Informatika C.I.T Medicom, 16(6), 342–352. https://doi.org/10.35335/cit.Vol16.2025.1005.pp394-404