The International Arab Journal of Information Technology (IAJIT)


Opinion within Opinion: Segmentation Approach

In computational linguistics, sentiment analysis classifies opinions as positive or negative. Urdu is widely used in many parts of the world, and classifying opinions expressed in Urdu is as important as it is for any other language. Research on sentiment analysis of Urdu is limited, and Bag-of-Words models dominate the methods used for this purpose. Bag-of-Words models fail to classify a subset of complex sentiments: those that contain more than one opinion. Yet no known work identifies and exploits sub-opinion-level information. In this paper, we propose a method that uses the sub-opinions within a text to determine the overall polarity of a sentiment in Urdu. The method classifies a sentiment in three steps. First, it segments the sentiment into two fragments using a set of hypotheses. Next, it calculates the orientation score of each fragment independently. Finally, it estimates the polarity of the sentiment from the fragment scores. We developed a computational model to evaluate the proposed method empirically. Compared with existing techniques based on the Bag-of-Words model, the proposed method increases precision by 8.46%, recall by 37.25%, and accuracy by 24.75%.
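The three-step procedure (segment, score each fragment, combine) can be illustrated with a small sketch. It is not the authors' implementation: the paper's actual segmentation hypotheses, lexicon and combination rule are not described here, so the toy Urdu lexicon (`LEXICON`), the single hypothesis of splitting at a contrastive connective such as لیکن or مگر ("but"), and the weights `w_first`/`w_second` are all hypothetical choices made for illustration.

```python
# Minimal sketch of a segment-then-score sentiment classifier.
# ASSUMPTIONS (not from the paper): the toy lexicon, the single
# "split at a contrastive connective" hypothesis, and the weights.
from __future__ import annotations

# Hypothetical word-level orientation lexicon (Urdu script).
LEXICON = {
    "اچھا": 1.0,     # "good"
    "خوبصورت": 1.0,  # "beautiful"
    "برا": -1.0,     # "bad"
    "مہنگا": -1.0,   # "expensive"
}

# Hypothetical segmentation hypothesis: contrastive connectives ("but").
CONNECTIVES = {"لیکن", "مگر"}


def segment(text: str) -> tuple[list[str], list[str]]:
    """Step 1: split the tokens into two fragments at the first connective.

    If no connective is found, the first fragment is empty and the
    whole text becomes the second fragment.
    """
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in CONNECTIVES:
            return tokens[:i], tokens[i + 1:]
    return [], tokens


def orientation(fragment: list[str]) -> float:
    """Step 2: score a fragment independently as the sum of lexicon values."""
    return sum(LEXICON.get(tok, 0.0) for tok in fragment)


def polarity(text: str, w_first: float = 0.4, w_second: float = 0.6) -> str:
    """Step 3: combine fragment scores into an overall polarity.

    The fragment after the connective is weighted more heavily,
    an assumed rule for contrastive sentences.
    """
    first, second = segment(text)
    score = w_first * orientation(first) + w_second * orientation(second)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # "The food was good but was expensive."
    sentence = "کھانا اچھا تھا لیکن مہنگا تھا"
    print(orientation(sentence.split()))  # bag-of-words sum: 0.0
    print(polarity(sentence))             # negative
```

On the example sentence, a bag-of-words sum over all tokens is 0.0 because the two opposing opinions cancel, whereas scoring the fragments separately lets the second, contrasting opinion decide the overall polarity.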

The International Arab Journal of Information Technology, Vol. 15, No. 1, January 2018

Muhammad Hassan is an assistant professor in the Computer Science and Engineering Department at the University of Engineering and Technology. He is a gold medalist of Punjab University. He completed his MS at UET Lahore and is currently pursuing a PhD at the same university. His research interests include natural language processing, the Semantic Web, software architecture, and open-source software development.

Muhammad Shoaib is a professor in the Computer Science and Engineering Department at the University of Engineering and Technology Lahore, Pakistan. He received his MSc in computer science from Islamia University, Pakistan, and his PhD from the University of Engineering and Technology, Pakistan, in 2006. He completed his postdoctoral research at Florida Atlantic University, USA, in 2009. His current research interests include information retrieval systems, information systems, software engineering, and the Semantic Web.