
Enhanced Latent Semantic Indexing Using Cosine Similarity Measures for Medical Application
The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common
document representation model. However, this technique faces challenges such as a high-dimensional feature space and the
semantic looseness of the representation. Consequently, Latent Semantic Indexing (LSI) was proposed to reduce the
feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. In fact,
LSI has been effectively employed in search engines and many other Natural Language Processing (NLP) applications.
Researchers therefore continue to seek better performance. In this paper, we propose an innovative method
that can be used in search engines to find better-matched content among the retrieved documents. The proposed method
introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was
carried out using an Arabic-language data collection that contains 800 medical-related documents with more than 47,222
unique words. The proposed method was assessed using a small test set that contains five medical keywords. The results
show that the proposed method outperforms standard LSI.
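To make the baseline concrete, the following is a minimal sketch of standard LSI retrieval with cosine-similarity ranking. It is not the extension proposed in this paper, whose details are not given in the abstract. The four English toy documents, the choice of k = 2, and all function names are assumptions made for illustration only; the actual evaluation uses an Arabic medical collection. Queries and documents are both projected with the truncated left singular vectors, which is one common variant of query folding.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return the sorted vocabulary and a term-by-document count matrix."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return vocab, A


def lsi_fit(A, k):
    """Truncated SVD of A; return U_k and the k-dimensional document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = (U_k.T @ A).T  # one row per document in the latent space
    return U_k, doc_vecs


def project_query(query, vocab, U_k):
    """Map a keyword query into the same latent space as the documents."""
    index = {w: i for i, w in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in index:  # out-of-vocabulary words are ignored
            q[index[w]] += 1.0
    return U_k.T @ q


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_documents(query, vocab, U_k, doc_vecs):
    """Return document indices sorted by decreasing cosine score, and the scores."""
    q_vec = project_query(query, vocab, U_k)
    scores = [cosine_similarity(q_vec, d) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy corpus (hypothetical, for illustration only).
docs = [
    "diabetes insulin blood sugar",
    "insulin therapy diabetes patients",
    "heart attack blood pressure",
    "blood pressure heart disease",
]
vocab, A = build_term_document_matrix(docs)
U_k, doc_vecs = lsi_fit(A, k=2)

order, scores = rank_documents("sugar", vocab, U_k, doc_vecs)
for j in order:
    print(f"doc {j}: cosine = {scores[j]:.3f}")
```

In this toy corpus the keyword "sugar" occurs only in document 0, yet document 1 still ranks above the two cardiology documents because it shares the latent "diabetes" concept. This is the kind of conceptual term-document association that a plain VSM match on raw term counts would miss.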
[28] Yeh J., Ke H., Yang W., and Meng I., “Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis,” Information Processing and Management, vol. 41, no. 1, pp. 75-95, 2005.

Fawaz Al-Anzi
Professor Al-Anzi received his Ph.D. and M.Sc. in Computer Science from Rensselaer Polytechnic Institute, New York, USA, in 1995. He earned his B.Sc. with honors in EE from Kuwait University in 1987. He received the National Research Production Award and the Kuwait University Award. He is the founding dean of the College of Computing Sciences and Engineering. His research interests include data science and engineering, text classification, and speech recognition.

Dia AbuZeina received his Ph.D. in Computer Science and Engineering from King Fahd University of Petroleum and Minerals, Saudi Arabia, in 2011. He received his M.Sc. in information technology from Southern New Hampshire University, Manchester, USA, in 2005. He received his B.Sc. in computer system engineering from Palestine Polytechnic University in 2001. His research interests include speech recognition and text classification for Modern Standard Arabic (MSA).