The International Arab Journal of Information Technology (IAJIT)

Self-Organizing Map vs Initial Centroid Selection

A combination of artificial intelligence techniques is employed in this research to enhance the clustering of transcribed text documents obtained from audio sources. Many clustering techniques have drawbacks that can drive the algorithm toward suboptimal solutions, so handling these drawbacks is essential for obtaining better clustering results. The main goal of our research is to enhance automatic topic clustering of transcribed speech documents. To do so, we cluster the same data set with two techniques: first, the K-means algorithm combined with our Initial Centroid Selection Optimization (ICSO) [16], genetic algorithm optimization, and a Chi-square similarity measure; second, a self-organizing map. The two techniques are compared in terms of accuracy. The evaluation showed that K-means with ICSO and the genetic algorithm achieved the highest average accuracy.
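To make the K-means side of the comparison concrete, the Python below is a minimal sketch only, not the method of [16]. It assumes documents are term-frequency vectors, uses the common chi-square histogram distance 0.5 * sum((x_i - y_i)^2 / (x_i + y_i)) as the "Chi-square similarity measure" (the exact form is not given here), and scores clusters with a purity-style accuracy (also an assumption). ICSO and the genetic algorithm are not reproduced: the hypothetical `initial_centroids` argument simply accepts whatever seeds an initialization scheme produces, and falls back to random sampling.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence

Vector = List[float]


def chi_square_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Chi-square histogram distance: 0.5 * sum((x_i - y_i)^2 / (x_i + y_i)).

    Terms where x_i + y_i == 0 are skipped to avoid division by zero.
    """
    total = 0.0
    for a, b in zip(x, y):
        s = a + b
        if s > 0:
            total += (a - b) ** 2 / s
    return 0.5 * total


def mean_vector(points: List[Vector]) -> Vector:
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def kmeans_chi_square(docs: List[Vector], k: int,
                      initial_centroids: Optional[List[Vector]] = None,
                      max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd-style K-means that assigns each document to the nearest centroid
    under the chi-square distance.

    ASSUMPTION: seeds come from `initial_centroids` (e.g. produced by some
    initialization scheme such as ICSO, not implemented here) or, if absent,
    from k randomly sampled documents.
    """
    rng = random.Random(seed)
    if initial_centroids is None:
        centroids = [list(d) for d in rng.sample(docs, k)]
    else:
        centroids = [list(c) for c in initial_centroids]
    labels = [-1] * len(docs)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by chi-square distance.
        new_labels = [
            min(range(k), key=lambda c: chi_square_distance(doc, centroids[c]))
            for doc in docs
        ]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
        # Update step: arithmetic mean of each cluster's members.
        for c in range(k):
            members = [docs[i] for i in range(len(docs)) if labels[i] == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels


def cluster_accuracy(labels: List[int], truth: List[str]) -> float:
    """Purity-style accuracy: credit each cluster with its most common true
    topic and return the fraction of documents that match it."""
    correct = 0
    for c in set(labels):
        topics = [t for lab, t in zip(labels, truth) if lab == c]
        correct += Counter(topics).most_common(1)[0][1]
    return correct / len(labels)


# Toy term-count vectors for two hypothetical topics.
docs = [
    [5, 0, 1, 0],
    [4, 1, 0, 0],
    [0, 0, 6, 3],
    [0, 1, 5, 4],
]
truth = ["sports", "sports", "economy", "economy"]
labels = kmeans_chi_square(docs, k=2, initial_centroids=[docs[0], docs[2]])
print(labels, cluster_accuracy(labels, truth))  # prints [0, 0, 1, 1] 1.0
```

Centroids are updated with the arithmetic mean, as in standard K-means; with a chi-square distance this is a common heuristic rather than the exact minimizer of within-cluster distance. Passing different `initial_centroids` is where an initialization method such as ICSO would plug in.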


[1] Abhishekkumar K. and Sadhana C., “Survey Report on K-Means Clustering Algorithm,” International Journal of Modern Trends in Engineering and Research, vol. 4, pp. 218-221, 2017.

[2] Affenzeller M., Wagner S., and Winkler S., “Aspects of Adaptation in Natural and Artificial Evolution,” in Proceedings of the 9th Annual Conference Companion on Genetic and Evolutionary Computation, London, pp. 2595-2602, 2007.

[3] Agarwal S., “Data Mining: Data Mining Concepts and Techniques,” in Proceedings of International Conference on Machine Intelligence and Research Advancement, Katra, pp. 203-207, 2013.

[4] Banerjee A. and Louis S., “A Recursive Clustering Methodology Using A Genetic Algorithm,” in Proceedings of IEEE Congress on Evolutionary Computation, Singapore, pp. 66-71, 2007.

[5] Coden A. and Brown E., “Speech Transcript Analysis for Automatic Search,” in Proceedings of the Hawaii International Conference on System Sciences, Maui, pp. 9, 2001.

[6] Everitt B., Landau S., and Leese M., Cluster Analysis, Wiley Series in Probability and Statistics, 2011.

[7] Goldberg D., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing, 1989.

[8] Hamerly G. and Drake J., Partitional Clustering Algorithms, Springer, 2014.

[9] Herrmann M., “Self-Organizing Feature Maps with Self-Organizing Neighborhood Widths,” in Proceedings of ICNN95-International Conference on Neural Networks, Perth, pp. 2998-3003, 1997.

[10] Jafar A., Fakhr M., and Farouk M., “Enhanced Clustering-Based Topic Identification of Transcribed Arabic Broadcast News,” The International Arab Journal of Information Technology, vol. 14, no. 5, pp. 721-728, 2017.

[11] Jian-Xiang W., Huai L., Yue-Hong S., and Xin-Ning S., “Application of Genetic Algorithm in Document Clustering,” in Proceedings of International Conference on Information Technology and Computer Science, Kiev, pp. 145-148, 2009.

[12] Joshi K. and Nalwade P., “Modified K-Means for Better Initial Cluster Centers,” International Journal of Computer Science and Mobile Computing, vol. 2, no. 7, pp. 219-223, 2013.

324 The International Arab Journal of Information Technology, Vol. 17, No. 3, May 2020

[13] Li D., Sethi I., Dimitrova N., and Mcgee T., “Classification of General Audio Data for Content-Based Retrieval,” Pattern Recognition Letters, vol. 22, no. 5, pp. 533-544, 2001.

[14] Liu Y., Liu M., and Wang X., Applications of Self-Organizing Maps, Magnus Johnsson, 2012.

[15] Lu S., “Pattern Classification Using Self-Organizing Feature Maps,” in Proceedings of International Joint Conference on Neural Networks, San Diego, pp. 471-480, 1990.

[16] Maghawry A., Omar Y., and Badr A., “Initial Centroid Selection Optimization for K-means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents,” Computational Methods for Systems and Software CoMeSySo: Applied Computational Intelligence and Mathematical Methods, Szczecin, pp. 86-101, 2017.

[17] Mai X., Cheng J., and Wang S., “Research on Semi-Supervised K-Means Clustering Algorithm in Data Mining,” Cluster Computing, vol. 22, pp. 3513-3520, 2019.

[18] Morissette L. and Chartier S., “The K-Means Clustering Technique: General Considerations and Implementation in Mathematica,” Tutorials in Quantitative Methods for Psychology, vol. 9, no. 1, pp. 15-24, 2013.

[19] Shrivastava P., Kavita P., Singh S., and Shukla M., “Comparative Analysis in Between The K-Means Algorithm, K-Means Using with Gaussian Mixture Model and Fuzzy C Means Algorithm,” in Proceedings of the International Conference on Communication and Computing Systems, Taylor and Francis Group, London, pp. 1037-1042, 2016.

[20] Speechnotes, [Online]. Available: https://speechnotes.co/, Last Visited, 2017.

[21] Sun H. and Xiong L., “Genetic Algorithm-Based High-Dimensional Data Clustering Technique,” in Proceedings of International Conference on Fuzzy Systems and Knowledge Discovery, Tianjin, pp. 485-489, 2009.

[22] Tiwari A., Sharma L., and Krishna G., “Entropy Weighting Genetic K-Means Algorithm for Subspace Clustering,” International Journal of Computer Applications, vol. 7, no. 7, pp. 27-30, 2010.

[23] Wold E., Blum T., Keislar D., and Wheaton J., “Content-based Classification, Search, and Retrieval of Audio,” IEEE Multimedia, vol. 3, no. 3, pp. 27-36, 1996.

[24] Wong C., “A Short Survey on Data Clustering Algorithms,” in Proceedings of 2nd International Conference on Soft Computing and Machine Intelligence, Hong Kong, pp. 64-68, 2015.

[25] Wu J., Advances in K-Means Clustering, Springer, 2012.

[26] Xiao-Feng L., Kun-Qing X., Fan L., and Zheng-Yi X., “An Efficient Clustering Algorithm Based on Local Optimality of K-Means,” Journal of Software, vol. 19, no. 7, 2008.

[27] Xu R. and Wunsch II D., “Survey of Clustering Algorithms,” IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.

[28] Yadav A. and Singh S., “An Improved K-Means Clustering Algorithm,” International Journal of Computing Academic Research, vol. 5, no. 2, pp. 88-103, 2016.

Ahmed Maghawry is a software developer at a leading company in Egypt in the field of electronic payments and solutions. His research interests are artificial intelligence, machine learning, and computing algorithms. He received an MSc in computer science from the Arab Academy for Science, Technology and Maritime Transport.

Yasser Omar is an assistant professor in the Department of Computer Science, Faculty of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport. His research interests are bioinformatics, medical imaging, data visualization, machine learning, and computing algorithms. He received a PhD in biomedical engineering from Cairo University.

Amr Badr is a Professor in the Department of Computer Science, Faculty of Computers and Information, Cairo University. He received his BSc in Engineering with Honors in 1986, and his MSc and PhD in Computer Science from Cairo University in 1995 and 1998, respectively. His research interests are intelligent systems, bioinformatics, medical imaging, and P-systems. He has published more than 170 journal research papers in these areas.