The International Arab Journal of Information Technology (IAJIT)



Mahalanobis Distance-the Ultimate Measure for Sentiment Analysis

In this paper, Mahalanobis Distance (MD) is proposed as a measure for classifying the sentiment expressed in a review document as either positive or negative. A new method for representing text documents using Representative Terms (RT) is used. Representing text documents with a few representative dimensions is a relatively new concept, and it is successfully demonstrated in this paper. The MD based Classifier (MDC) achieved 70.8% accuracy in experiments carried out on the benchmark dataset containing 25000 movie reviews. A hybrid of the MDC and a Multi Layer Perceptron (MLP) achieved 98.8% classification accuracy, which is the highest accuracy reported so far for a dataset containing 25000 reviews.
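As a rough illustration of how a Mahalanobis distance can drive a binary decision, the sketch below fits a mean vector and covariance matrix per class and assigns a new vector to the class with the smaller squared distance (x - mean)^T S^-1 (x - mean). It is not the authors' classifier: the abstract does not state the decision rule or the RT features, so the nearest-class rule, the two-dimensional toy vectors, and all function names here are assumptions made for illustration only.

```python
import numpy as np

def fit_class(X):
    """Estimate the mean vector and inverse covariance of one class."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse tolerates a (near-)singular covariance matrix.
    return mean, np.linalg.pinv(cov)

def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance of x from a class centre."""
    d = x - mean
    return float(d @ inv_cov @ d)

def classify(x, pos_model, neg_model):
    """Assumed decision rule: pick the class with the smaller distance."""
    d_pos = mahalanobis_sq(x, *pos_model)
    d_neg = mahalanobis_sq(x, *neg_model)
    return "positive" if d_pos < d_neg else "negative"

# Hypothetical 2-D document vectors standing in for RT-based features.
rng = np.random.default_rng(0)
pos_train = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(50, 2))
neg_train = rng.normal(loc=[1.0, 3.0], scale=0.5, size=(50, 2))

pos_model = fit_class(pos_train)
neg_model = fit_class(neg_train)

print(classify(np.array([2.8, 1.2]), pos_model, neg_model))
```

Unlike a Euclidean distance, the covariance term rescales each dimension and accounts for correlation between dimensions, which matters most when a document is described by only a few features.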


Valarmathi Balasubramanian is an Associate Professor in the School of Information Technology at VIT University, India. She holds a PhD degree in computer science from Anna University, India. Valarmathi has more than 20 years of experience in teaching and research. Currently, she supervises five PhD students in big-data analysis, sentiment mining and pattern recognition. She has coauthored a text book on total quality management.

Srinivasa Gupta Nagarajan is an Assistant Professor in the School of Mechanical and Building Sciences at VIT University, Vellore, India. He holds an MTech degree in Industrial Management from IIT Madras, India, and is currently pursuing research in the area of application of clustering tools for manufacturing management. Srinivasa has more than 17 years of experience in teaching and industrial consultancy. He is a certified Six Sigma Black Belt and a certified Master Trainer. He has written a text book on Total Quality Management.

Palanisamy Veerappagoundar is Principal, Info Institute of Engineering, Coimbatore, India. Palanisamy has more than 40 years of experience in teaching and research. He holds a PhD in communication-antenna theory from Indian Institute of Technology, India. Currently, he supervises eight PhD students in big-data analysis, sentiment mining, and pattern recognition.