The International Arab Journal of Information Technology (IAJIT)


Clustering with Probabilistic Topic Models on

Recently, probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been widely used in many text mining tasks, such as retrieval, summarization and clustering, across different languages. In this paper, we present a first comparative study of LDA and K-means, two well-known methods for topic identification and clustering respectively, applied to Arabic texts. Our aim is to assess how the morpho-syntactic characteristics of the Arabic language influence the performance of the first method compared to the second. To study different aspects of these methods, the study is conducted on four benchmark document collections, and clustering quality is measured with four well-known evaluation measures: the Rand index, the Jaccard index, the F-measure and Entropy. The results show that LDA outperforms K-means in most cases.
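The four evaluation measures are named but not defined above. The following is a minimal sketch of their standard definitions, assuming every document has one ground-truth category label and one hard cluster label. For LDA, this assumes each document is assigned to its most probable topic, which is an illustrative choice and not necessarily the one used in this study. The Rand and Jaccard indices count document pairs; the F-measure and entropy use the class-weighted and cluster-weighted forms common in the document clustering literature, and the study's exact normalization may differ. All function names here are hypothetical helpers, not a library API.

```python
from collections import Counter
from itertools import combinations
from math import log


def pair_counts(labels_true, labels_pred):
    """Classify every document pair by class/cluster agreement.

    Returns (ss, sd, ds, dd): same class & same cluster, same class &
    different clusters, different classes & same cluster, different both.
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class and same_cluster:
            ss += 1
        elif same_class:
            sd += 1
        elif same_cluster:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(labels_true, labels_pred):
    # Fraction of pairs on which the clustering agrees with the classes.
    ss, sd, ds, dd = pair_counts(labels_true, labels_pred)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_index(labels_true, labels_pred):
    # Like Rand, but ignores pairs separated in both partitions.
    ss, sd, ds, _ = pair_counts(labels_true, labels_pred)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0  # convention for the degenerate case


def f_measure(labels_true, labels_pred):
    # For each class, take the best-matching cluster's F score,
    # then weight by class size.
    n = len(labels_true)
    n_ij = Counter(zip(labels_true, labels_pred))  # missing keys count as 0
    class_sizes, cluster_sizes = Counter(labels_true), Counter(labels_pred)
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            overlap = n_ij[(c, k)]
            if overlap:
                precision, recall = overlap / n_k, overlap / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


def entropy(labels_true, labels_pred):
    # Class entropy inside each cluster, normalized by log(#classes)
    # and weighted by cluster size; lower is better.
    n = len(labels_true)
    classes = set(labels_true)
    n_ij = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for k, n_k in Counter(labels_pred).items():
        e_k = 0.0
        for c in classes:
            p = n_ij[(c, k)] / n_k
            if p > 0:
                e_k -= p * log(p)
        if len(classes) > 1:
            e_k /= log(len(classes))
        total += n_k / n * e_k
    return total


if __name__ == "__main__":
    truth = ["sport", "sport", "economy", "economy", "culture", "culture"]
    pred = [0, 0, 1, 1, 1, 2]  # hypothetical cluster assignments
    for fn in (rand_index, jaccard_index, f_measure, entropy):
        print(fn.__name__, round(fn(truth, pred), 4))
```

Higher Rand, Jaccard and F-measure values indicate better agreement with the reference categories, while lower entropy indicates purer clusters. The pairwise loop is quadratic in the number of documents, which keeps the definitions transparent but would be slow on large collections.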

The International Arab Journal of Information Technology, Vol. 13, No. 2, March 2016

Abdessalem Kelaiaia received his Engineer degree from Annaba University, Algeria, in 1996, and his MS degree in Computer Science from Guelma University, Algeria, in 2008. He is currently an Assistant Professor at the University of May 08, Algeria, and is preparing his PhD degree at Annaba University. His current research field is text mining.

Hayet Merouani received her Engineer degree from Annaba University, Algeria, in 1984, and her PhD degree from Robert Gordon University, UK. She is currently a full Associate Professor at Badji Mokhtar University, Annaba. She also leads the Pattern Recognition research group within a national research program on breast cancer. Her current work focuses on computer vision, medical imaging and biometrics.