The International Arab Journal of Information Technology (IAJIT)



A Sentiment Analysis System for the Hindi Language by Integrating Gated Recurrent Unit

The growing availability and popularity of opinion-rich resources such as blogs, shopping websites, review portals, and social media platforms have drawn many researchers to the task of sentiment analysis. Alongside English, Chinese, Spanish, and other widely used languages, web content in Indian languages such as Hindi, Telugu, and Tamil has also grown rapidly. Recognizing the growing popularity of Hindi on the web, this research work takes Hindi as the target language for sentiment analysis. It analyses the sentiments expressed in movie reviews collected from the review sections of Hindi-language e-newspapers. The reviews are multilingual, which makes sentiment analysis a challenging task. To overcome these challenges, this research work proposes a deep-learning-based approach in which a Gated Recurrent Unit (GRU) network is combined with a Hindi word embedding model. This strategy enables the network to capture the semantic and syntactic relations between Hindi words and to classify the reviews into sentiment classes. Because the performance of a GRU network depends heavily on the selection of its hyper-parameters, this research work also uses a Genetic Algorithm (GA) to build the GRU architecture automatically by selecting suitable hyper-parameter values. The proposed Genetic Algorithm-Gated Recurrent Unit (GA-GRU) model is observed to be effective, outperforming traditional resource-based and machine learning approaches on the Hindi movie review dataset.
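The idea of letting a Genetic Algorithm choose GRU hyper-parameters can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and not taken from the paper: the search space `SEARCH_SPACE`, the operators (tournament selection, uniform crossover, random-reset mutation, elitism), and above all `toy_fitness`, which merely stands in for "train a GRU with these settings and return its validation accuracy".

```python
import random

# Hypothetical search space; the paper's actual hyper-parameters and ranges
# are not stated in this section.
SEARCH_SPACE = {
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3],
    "batch_size": [16, 32, 64],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_fitness(ind):
    """Placeholder for the validation accuracy of a GRU trained with `ind`.

    Assumption: in a real system this would train and evaluate a model.
    Here it is an arbitrary smooth score so the sketch runs instantly.
    """
    score = 1.0
    score -= abs(ind["hidden_units"] - 128) / 256
    score -= abs(ind["dropout"] - 0.3)
    score -= abs(ind["learning_rate"] - 1e-3) * 100
    score -= abs(ind["batch_size"] - 32) / 64
    return score


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: resample each gene with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in ind.items()
    }


def tournament(population, fitness, rng, k=3):
    """Return the fittest of `k` randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def genetic_search(fitness, generations=20, pop_size=12, elite=2, seed=0):
    """Evolve configurations and return the fittest one in the final generation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best `elite` individuals survive unchanged.
        next_gen = sorted(population, key=fitness, reverse=True)[:elite]
        while len(next_gen) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            next_gen.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print(best, round(toy_fitness(best), 3))
```

Because the elite individuals are carried over unchanged, the best score in the population never decreases from one generation to the next. With a real fitness function, each evaluation would involve training a network, so caching scores for configurations that have already been evaluated would likely be worthwhile.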


The International Arab Journal of Information Technology, Vol. 17, No. 6, November 2020

Kush Shrivastava is pursuing a Ph.D. at Jaypee University of Engineering and Technology, Guna, M.P., India. Before this, he completed an M.Tech. in Computer Science Engineering at Jaypee University of Engineering and Technology, Guna, M.P., India.

Shishir Kumar is working as a Professor in the Department of Computer Science and Engineering at Jaypee University of Engineering and Technology, Guna, M.P., India. He earned a Ph.D. in Computer Science in 2005. He has twenty-one years of teaching experience in various organizations of repute for PG and UG courses of Computer Science and IT.