The International Arab Journal of Information Technology (IAJIT)



Extracting Word Synonyms from Text using Neural Approaches
Extracting synonyms from textual corpora using computational techniques is an interesting research problem in the Natural Language Processing (NLP) domain. Neural techniques such as Word2Vec have recently been used to produce distributional word representations (also known as word embeddings) that capture semantic similarity and relatedness between words based on their linear context. However, using these techniques for synonym extraction poses several challenges, because high similarity between two word vectors does not indicate synonymy alone: it can also reflect other sense relations (for example, antonymy) as well as general word association or relatedness. In this paper, we tackle this problem using a novel two-step approach. We first build distributional word embeddings using Word2Vec, and then use the induced embeddings as input to train a feed-forward neural network on an annotated dataset to distinguish synonyms from other semantically related words.
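The two-step idea described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are hypothetical toy vectors standing in for a trained Word2Vec model, and the pair features (element-wise product and absolute difference), network size, labelled pairs, and training settings are all illustrative choices. The toy vectors are deliberately constructed so that the related but non-synonymous pair car/road has a high cosine similarity, mirroring the problem that vector similarity alone does not separate synonyms from other related words.

```python
import math
import random

# Step 1 (assumed already done): word embeddings. These HYPOTHETICAL toy vectors
# stand in for vectors a trained Word2Vec model would produce; values are made up.
EMBEDDINGS = {
    "big":   [0.90, 0.80, 0.10, 0.00],
    "large": [0.85, 0.75, 0.15, 0.05],
    "huge":  [0.88, 0.82, 0.12, 0.02],
    "small": [0.90, -0.70, 0.10, 0.00],
    "car":   [0.10, 0.20, 0.90, 0.70],
    "auto":  [0.12, 0.18, 0.88, 0.72],
    "road":  [0.20, 0.10, 0.80, 0.90],
}

# Hypothetical annotated pairs: 1 = synonyms, 0 = related but not synonyms.
TRAIN = [("big", "large", 1), ("big", "huge", 1), ("large", "huge", 1), ("car", "auto", 1),
         ("big", "small", 0), ("large", "small", 0), ("car", "road", 0), ("auto", "road", 0)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pair_features(a, b):
    """Symmetric pair features (an assumed design, not the paper's):
    element-wise product followed by element-wise absolute difference."""
    u, v = EMBEDDINGS[a], EMBEDDINGS[b]
    return [x * y for x, y in zip(u, v)] + [abs(x - y) for x, y in zip(u, v)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class FeedForward:
    """Step 2: one tanh hidden layer and a sigmoid output, trained by SGD on binary cross-entropy."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict(self, x):
        """Estimated probability that the pair encoded by x is a synonym pair."""
        return self.forward(x)[1]

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        dz = p - y  # dLoss/dz for a sigmoid output with binary cross-entropy
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before it is updated
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

def bce_loss(model, data):
    """Mean binary cross-entropy of the model over labelled pairs."""
    total = 0.0
    for a, b, y in data:
        p = min(max(model.predict(pair_features(a, b)), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train(data, epochs=300, lr=0.5, n_hidden=6, seed=0):
    model = FeedForward(len(pair_features(data[0][0], data[0][1])), n_hidden, seed)
    rng = random.Random(seed)
    order = list(data)
    losses = [bce_loss(model, data)]
    for _ in range(epochs):
        rng.shuffle(order)
        for a, b, y in order:
            model.train_step(pair_features(a, b), y, lr)
        losses.append(bce_loss(model, data))
    return model, losses

model, losses = train(TRAIN)
print("cosine(car, road) =", round(cosine(EMBEDDINGS["car"], EMBEDDINGS["road"]), 3))
for a, b in [("large", "huge"), ("car", "road")]:
    print(a, b, "P(synonym) =", round(model.predict(pair_features(a, b)), 3))
```

Because both pair features are symmetric, the classifier scores (a, b) and (b, a) identically, which fits synonymy as a symmetric relation; simply concatenating the two vectors would not guarantee this.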


Nora Mohammed is currently a researcher at the college of engineering, Al-Qadisiyah University, Iraq. Her main research interest is in the field of natural language processing and computational linguistics. She has worked on mining natural language text for information retrieval, relation extraction, and synonymy discovery. She received her Master degree in Computer Science and Engineering from Osmania University, India.