Predicting the Winner of Delhi Assembly Election, 2015 from Sentiment Analysis on Twitter Data-A BigData Perspective

Author Lija Mohan, Sudheep Elayidom

Keywords #Election winner prediction #big data #sentiment analysis #tweet mining #map reduce

Abstract Social media is now a place where people create and share content at a massive rate. Because of its ease of use, speed, and reach, it is rapidly changing public discourse and setting trends and agendas on topics including the environment, politics, technology, and entertainment. Since it is a form of collective wisdom, we investigated its power to predict real-world outcomes. The objective was to design a Twitter-based sentiment mining approach. We introduce a keyword-aware, user-based collective tweet mining approach to rank the sentiment of each user. To evaluate the accuracy of this method, we chose an election winner prediction application and observed how people's sentiments on different political issues at that time were reflected in their votes. A domain thesaurus is built by collecting keywords related to each issue. Because Twitter data is huge and difficult to process, we use a scalable and efficient MapReduce-based approach to classify the tweets. The experiments were designed to predict the winner of the Delhi Assembly Election 2015 by analyzing people's sentiments on political issues. From this analysis, we correctly predicted that the Aam Aadmi Party (AAP) had higher support than the Bharatiya Janata Party (BJP), the ruling party. Thus, a big data approach, which has widespread applications today, is used for sentiment analysis on Twitter data.
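The pipeline outlined in the abstract (keyword thesaurus, per-user sentiment, map and reduce steps) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the thesaurus entries, the toy sentiment lexicon, the sample tweets, and the function names `map_tweet`, `reduce_by_key`, and `party_support` are all hypothetical, and the map and reduce steps run in a single process rather than on a cluster. The thesaurus is keyed by party here for brevity, whereas the abstract describes keywords collected per issue.

```python
from collections import defaultdict

# Hypothetical domain thesaurus (illustrative only): party -> keywords.
THESAURUS = {
    "AAP": {"aap"},
    "BJP": {"bjp"},
}
# Hypothetical toy sentiment lexicon: word -> polarity.
LEXICON = {"good": 1, "great": 1, "support": 1,
           "bad": -1, "corrupt": -1, "fail": -1}


def map_tweet(user, text):
    """Map step: emit ((user, party), score) for each party a tweet mentions."""
    tokens = [t.strip("#@.,!?").lower() for t in text.split()]
    score = sum(LEXICON.get(t, 0) for t in tokens)
    for party, keywords in THESAURUS.items():
        if keywords & set(tokens):
            yield (user, party), score


def reduce_by_key(pairs):
    """Shuffle and reduce step: sum the scores emitted for each key."""
    totals = defaultdict(int)
    for key, score in pairs:
        totals[key] += score
    return dict(totals)


def party_support(tweets):
    """Count, per party, the users whose aggregated sentiment is positive."""
    per_user = reduce_by_key(
        pair for user, text in tweets for pair in map_tweet(user, text)
    )
    support = defaultdict(int)
    for (user, party), total in per_user.items():
        if total > 0:
            support[party] += 1
    return dict(support)


# Made-up sample tweets as (user, text) pairs.
tweets = [
    ("u1", "Great rally by #AAP today"),
    ("u1", "I support AAP"),
    ("u2", "BJP campaign was bad"),
    ("u3", "Good work BJP!"),
]
print(party_support(tweets))
```

Aggregating by user before counting means a prolific user contributes one vote per party rather than one per tweet, which is one plausible reading of the "user-based" ranking described above.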

The International Arab Journal of Information Technology, Vol. 16, No. 5, September 2019

Lija Mohan has completed her Ph.D. in Big Data Security from School of Engineering (SOE), Department of Cochin University of Science & Technology (CUSAT) and is currently working as Cyber Security Specialist at Prevalent AI Pvt Ltd. She took her Masters and Bachelor Degree in Computer Science, both from Mahatma Gandhi University. She has several international publications to her credit and is the recipient of AWS Research Grant and Inspire Fellowship.

Sudheep Elayidom is working as Professor at the Computer Science Division of Cochin University of Science and Technology, Kerala, India. He received his Masters and Bachelors from Mahatma Gandhi University and Ph.D. from Cochin University of Science and Technology, all in the field of Computer Science. He has delivered keynote addresses and invited seminars, and has served as session chair for international conferences and workshops. He has also authored several journal articles, books, and conference articles.