Mahalanobis Distance-the Ultimate Measure for Sentiment Analysis
In this paper, Mahalanobis Distance (MD) is proposed as a measure for classifying the sentiment expressed in a review document as either positive or negative. A new method for representing text documents using Representative Terms (RT) is used. Representing text documents with a few representative dimensions is a relatively new concept, and this paper demonstrates it successfully. The MD-based Classifier (MDC) achieved an accuracy of 70.8% in experiments on a benchmark dataset of 25,000 movie reviews. A hybrid of the MDC and a Multi Layer Perceptron (MLP) achieved a classification accuracy of 98.8%, which is the highest accuracy ever reported for a dataset containing 25,000 reviews.
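The distance underlying the classifier can be made concrete. For a feature vector x and a class with mean vector μ and covariance matrix Σ, the Mahalanobis distance is D(x) = sqrt((x - μ)^T Σ^(-1) (x - μ)), so a deviation along a direction in which the class varies little counts for more than the same deviation along a direction in which it varies a lot. The Python sketch below (using NumPy) assigns a document to the class whose mean is nearest in this sense. It is a generic minimum-distance rule, not necessarily the authors' exact MDC: the class name MahalanobisClassifier and the toy two-dimensional vectors are hypothetical, standing in for documents already reduced to a few Representative Term features, and variants such as the Mahalanobis-Taguchi System standardize and scale the distance differently.

```python
import numpy as np


class MahalanobisClassifier:
    """Minimum-Mahalanobis-distance classifier (hypothetical sketch, not the authors' code)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Plain Python labels so that predictions print cleanly.
        self.classes_ = np.unique(y).tolist()
        self.means_, self.inv_covs_ = {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.means_[c] = Xc.mean(axis=0)
            # Per-class sample covariance (features in columns); the pseudo-inverse
            # is used in case the matrix is singular, e.g. with few samples.
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            self.inv_covs_[c] = np.linalg.pinv(cov)
        return self

    def distance(self, x, c):
        """Mahalanobis distance of vector x from the mean of class c."""
        d = np.asarray(x, dtype=float) - self.means_[c]
        # Clamp tiny negative round-off before taking the square root.
        return float(np.sqrt(max(d @ self.inv_covs_[c] @ d, 0.0)))

    def predict(self, X):
        """Label each row of X with the class at the smallest Mahalanobis distance."""
        return [min(self.classes_, key=lambda c: self.distance(x, c)) for x in X]


# Toy data: hypothetical 2-D document vectors (e.g. scores on two representative terms).
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1],
           [-1.0, -2.0], [-1.2, -1.7], [-0.9, -2.3], [-1.1, -1.9]]
y_train = ["pos"] * 4 + ["neg"] * 4

clf = MahalanobisClassifier().fit(X_train, y_train)
print(clf.predict([[0.9, 1.9], [-1.0, -2.1]]))  # ['pos', 'neg']
```

Each class keeps its own covariance matrix in this sketch; a single pooled covariance shared by both classes is a common alternative, and which choice the authors made is not stated in this section.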
Valarmathi Balasubramanian is an Associate Professor in the School of Information Technology at VIT University, India. She holds a PhD in computer science from Anna University, India. Valarmathi has more than 20 years of experience in teaching and research. She currently supervises five PhD students in big-data analysis, sentiment mining, and pattern recognition. She has coauthored a textbook on total quality management.
Srinivasa Gupta Nagarajan is an Assistant Professor in the School of Mechanical and Building Sciences at VIT University, Vellore, India. He holds an MTech in Industrial Management from IIT Madras, India, and is currently pursuing research on the application of clustering tools to manufacturing management. Srinivasa has more than 17 years of experience in teaching and industrial consultancy. He is a certified Six Sigma Black Belt and a certified Master Trainer. He has written a textbook on Total Quality Management.
Palanisamy Veerappagoundar is the Principal of Info Institute of Engineering, Coimbatore, India. Palanisamy has more than 40 years of experience in teaching and research. He holds a PhD in communication-antenna theory from the Indian Institute of Technology, India. He currently supervises eight PhD students in big-data analysis, sentiment mining, and pattern recognition.