The International Arab Journal of Information Technology (IAJIT)



Using Machine Learning Techniques for Subjectivity Analysis based on Lexical and Non-lexical Features

Machine learning techniques have been used to address a variety of problems, and document classification is one of their main applications. Opinion mining has emerged as an active research domain because of its wide range of applications, such as multi-document summarization, opinion mining of documents, analysis of users’ reviews, and improving answers to opinion questions in forums. Existing works classify documents using lexicon-based features only. In this work, four state-of-the-art machine learning techniques are applied to classify content as subjective or objective. Subjective content contains opinionative information, while objective content contains factual information. The main contribution is the introduction of non-lexical and content-based features in addition to a conventional lexicon-based feature set. We compare the results of the four machine learning techniques and discuss their performance across diverse categories of lexical and non-lexical features. The comparative analysis uses standard performance evaluation measures, and the experiments are performed on a real-world dataset from an online forum covering diverse topics. The results show that the proposed content-based and non-lexical, thread-specific features contribute to the classification of subjective and non-subjective content.
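As a rough illustration of the idea described above, the following Python sketch combines one lexicon-based feature with a few non-lexical, thread-level features in a single feature vector, and computes precision, recall, and F1 for a set of predictions. The abstract does not list the actual features, lexicon, classifiers, or measures, so everything named here (`SUBJECTIVE_WORDS`, `extract_features`, `precision_recall_f1`, the thread layout, and the choice of features and measures) is an assumption made for illustration, not the authors' method.

```python
import re

# Hypothetical mini-lexicon of opinion-bearing words (an assumption);
# a real system would load a full subjectivity lexicon instead.
SUBJECTIVE_WORDS = {"think", "best", "worst", "love", "hate", "recommend", "opinion"}


def extract_features(thread):
    """Map a thread (dict with 'title' and 'posts') to a flat feature dict."""
    text = " ".join([thread["title"]] + thread["posts"]).lower()
    tokens = re.findall(r"[a-z']+", text)
    lexicon_hits = sum(token in SUBJECTIVE_WORDS for token in tokens)
    return {
        # Lexical feature: share of tokens that appear in the lexicon.
        "lexicon_ratio": lexicon_hits / max(len(tokens), 1),
        # Non-lexical, thread-specific features (illustrative choices).
        "num_posts": len(thread["posts"]),
        "num_question_marks": text.count("?"),
        # Tokens in title plus posts, divided by the number of posts.
        "tokens_per_post": len(tokens) / max(len(thread["posts"]), 1),
    }


def precision_recall_f1(y_true, y_pred, positive="subjective"):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example thread and labels, used only to exercise the functions.
    thread = {
        "title": "Which laptop is best?",
        "posts": ["I think the X1 is the best.", "I would recommend it too."],
    }
    print(extract_features(thread))

    y_true = ["subjective", "subjective", "objective", "objective"]
    y_pred = ["subjective", "objective", "subjective", "objective"]
    print(precision_recall_f1(y_true, y_pred))
```

The feature dictionary could be handed to any standard classifier. Keeping lexical and non-lexical features under separate keys makes it easy to train on each category alone or on both together, which is the kind of comparison the abstract describes.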

 


Hikmat Ullah Khan is pursuing his PhD in Computer Science in the Department of Computer Science and Software Engineering. His research interests include Social Web Mining, Semantic Web, Use of Computers in Islamic Applications, Information Retrieval, and Academic and Social Network Analysis and Mining. He has 7 research publications in journals and conferences.

Ali Daud is an HEC approved Supervisor with research interests in Topic Modeling, Data Mining, Social Network Analysis, Information Retrieval, and Natural Language Processing. He is head of the Data Mining and Information Retrieval Group. He is an active researcher and a regular reviewer for top conferences and impact factor journals. In his short research life, he has over 23 journal and conference publications and enjoys an h-index of 7.