The International Arab Journal of Information Technology (IAJIT)


Discovery of Arbitrary-Shapes Clusters Using DENCLUE Algorithm

One of the main requirements in clustering spatial datasets is the discovery of clusters with arbitrary shapes. Density-based algorithms satisfy this requirement by forming clusters as dense regions in the space that are separated by sparser regions. DENCLUE is a density-based algorithm that generates a compact mathematical description of arbitrary-shaped clusters. Although DENCLUE has proved its efficiency, it cannot handle large datasets because of its high computational complexity. Several enhancements have been proposed to improve the performance of the DENCLUE algorithm, including DENCLUE 2. In this study, an empirical evaluation is conducted to highlight the differences between the first DENCLUE variant, which uses the hill-climbing search method, and the DENCLUE 2 variant, which uses the fast hill-climbing method. The study aims to provide a basis for further enhancements to both algorithms. The evaluation results indicate that DENCLUE 2 is faster than DENCLUE 1. However, the first DENCLUE variant outperforms the second in discovering arbitrary-shaped clusters.
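The contrast between the two search methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes an unnormalized Gaussian kernel density estimate with bandwidth `h`. The DENCLUE 1-style climber takes fixed-length steps (a hypothetical `step` parameter) along the normalized density gradient and stops once the density stops increasing. The DENCLUE 2-style fast climber replaces the gradient step with a kernel-weighted mean update and stops when the relative density gain drops to `eps` or below. Points are then grouped by their density attractors, and an attractor whose density falls below the noise threshold `xi` marks its point as noise. As a simplification, the sketch merges attractors that lie within a hypothetical tolerance `merge_tol`. It does not search for density-connected paths between attractors, which is the mechanism that lets DENCLUE form arbitrary-shaped clusters.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def kernel(x: Vector, y: Vector, h: float) -> float:
    """Unnormalized Gaussian kernel exp(-||x - y||^2 / (2 h^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * h * h))


def density(x: Vector, data: List[Vector], h: float) -> float:
    """Kernel density estimate at x (normalizing constant omitted)."""
    return sum(kernel(x, p, h) for p in data) / len(data)


def gradient(x: Vector, data: List[Vector], h: float) -> List[float]:
    """Gradient of density(): sum_i K(x, x_i) * (x_i - x) / (n * h^2)."""
    g = [0.0] * len(x)
    for p in data:
        k = kernel(x, p, h)
        for d in range(len(x)):
            g[d] += k * (p[d] - x[d])
    return [v / (len(data) * h * h) for v in g]


def hill_climb(x: Vector, data: List[Vector], h: float,
               step: float = 0.05, max_iter: int = 1000) -> List[float]:
    """DENCLUE 1-style climb: fixed-length steps along the normalized
    gradient, stopping as soon as the density stops increasing."""
    x = list(x)
    f_old = density(x, data, h)
    for _ in range(max_iter):
        g = gradient(x, data, h)
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:  # flat region (or underflow): treat x as the attractor
            break
        x_new = [c + step * gc / norm for c, gc in zip(x, g)]
        f_new = density(x_new, data, h)
        if f_new <= f_old:
            break
        x, f_old = x_new, f_new
    return x


def fast_hill_climb(x: Vector, data: List[Vector], h: float,
                    eps: float = 1e-6, max_iter: int = 1000) -> List[float]:
    """DENCLUE 2-style climb: move x to the kernel-weighted mean of the data
    and stop when the relative density gain falls to eps or below."""
    x = list(x)
    f_old = density(x, data, h)
    for _ in range(max_iter):
        w = [kernel(x, p, h) for p in data]
        total = sum(w)
        if total == 0.0:
            break
        x = [sum(wi * p[d] for wi, p in zip(w, data)) / total
             for d in range(len(x))]
        f_new = density(x, data, h)
        if f_new == 0.0 or (f_new - f_old) / f_new <= eps:
            break
        f_old = f_new
    return x


def assign_clusters(data: List[Vector], h: float,
                    climb: Callable[[Vector, List[Vector], float], List[float]],
                    xi: float = 0.0,
                    merge_tol: Optional[float] = None) -> List[int]:
    """Label each point by its density attractor; -1 marks noise.
    Simplification: attractors within merge_tol share a cluster (no
    density-connected path check between attractors)."""
    tol = h / 2 if merge_tol is None else merge_tol
    centers: List[List[float]] = []
    labels: List[int] = []
    for p in data:
        a = climb(p, data, h)
        if density(a, data, h) < xi:  # attractor too weak: noise
            labels.append(-1)
            continue
        for cid, c in enumerate(centers):
            if math.dist(a, c) <= tol:
                labels.append(cid)
                break
        else:
            centers.append(a)
            labels.append(len(centers) - 1)
    return labels
```

For example, take two tight groups of three points plus one isolated point, with `h = 0.5` and `xi = 0.2`. Calling `assign_clusters` with either `hill_climb` or `fast_hill_climb` labels the groups `0` and `1` and the isolated point `-1`. On such compact, well-separated groups the two climbers agree. They differ in how many density evaluations they need to reach an attractor and in where exactly they stop relative to the mode.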


The International Arab Journal of Information Technology, Vol. 17, No. 4A, Special Issue 2020

Mariam Khader is currently working as a lecturer at Princess Sumaya University for Technology (PSUT), Amman, Jordan. She received the BSc degree in computer networking systems from the World Islamic Science & Education University (WISE), Amman, Jordan, in 2012, and her MSc degree in IT security and digital criminology from PSUT in 2014. She is currently a PhD candidate in computer science at PSUT. Between 2012 and 2015, she worked as a teaching assistant and then as a lecturer in the network department at the World Islamic Science and Education University. Her interests include digital forensics, network security, and big data analytics.

Ghazi Al-Naymat received his PhD degree in May 2009 from the School of Information Technologies at The University of Sydney, Australia. He is currently an Associate Professor at the College of Engineering and Information Technology at Ajman University, UAE. In 2015, he joined the Department of Computer Science, King Hussein School of Computing Sciences at Princess Sumaya University for Technology (PSUT). He is a member of the Australian Computer Society. His research interests include data mining, machine learning, big data, and data science. Al-Naymat always targets reputable venues for his publications.