The International Arab Journal of Information Technology (IAJIT)


Sentiment Analysis with Term Weighting and Word Vectors

Sentiment analysis, which aims to predict the sentiment expressed in a text, is an area in which Natural Language Processing (NLP) methods have been used frequently in recent years. In this study, sentiment was extracted from Turkish texts, and the performance of several text representation methods was compared. Besides the Bag of Words (BoW) method traditionally used to represent texts, the study used Word2Vec, a word vector algorithm developed in recent years, and Doc2Vec, a document vector algorithm. Five different Machine Learning (ML) algorithms were used to classify texts represented in five different ways, on 3000 labeled tweets belonging to a telecom company. The results showed that Word2Vec among the text representation methods and Random Forest among the ML algorithms were the most successful and most applicable. The study is important as the first to compare BoW and word vectors for sentiment analysis of Turkish texts.
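As a rough illustration of the BoW representation and term weighting named in the title, the sketch below builds raw term-count vectors and then reweights them with TF-IDF. It is not the authors' pipeline: the abstract does not say which weighting schemes were used, so TF-IDF is an assumption, and the toy English sentences, the whitespace tokenizer, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the study itself used 3000 labeled Turkish tweets.
docs = [
    "service is great",
    "service is slow",
    "great great support",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(token_lists):
    """Return the sorted list of distinct terms across all documents."""
    return sorted({term for tokens in token_lists for term in tokens})


def bow_vector(tokens, vocabulary):
    """Raw term-count (Bag of Words) vector for one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def idf_weights(token_lists, vocabulary):
    """Inverse document frequency, log(N / df), for each vocabulary term."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return [math.log(n_docs / doc_freq[term]) for term in vocabulary]


def tfidf_vector(tokens, vocabulary, idf):
    """Weight each raw term count by that term's IDF (assumed scheme)."""
    return [tf * w for tf, w in zip(bow_vector(tokens, vocabulary), idf)]


token_lists = [tokenize(d) for d in docs]
vocabulary = build_vocabulary(token_lists)
idf = idf_weights(token_lists, vocabulary)
bow = [bow_vector(t, vocabulary) for t in token_lists]
tfidf = [tfidf_vector(t, vocabulary, idf) for t in token_lists]
```

Each row of `bow` or `tfidf` is a fixed-length feature vector for one text, which is the kind of input a classifier such as Random Forest would be trained on. Terms that occur in fewer documents receive a larger IDF weight, so they contribute more to the weighted vector than terms shared by most texts.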

The International Arab Journal of Information Technology, Vol. 16, No. 5, September 2019

Metin Bilgin received the Ph.D. degree in Computer Engineering from Yıldız Technical University in 2015. He is currently an assistant professor in the Department of Computer Engineering, Bursa Uludağ University, Turkey. His current research interests include machine learning, natural language processing and text classification.

Haldun Köktaş is currently pursuing an MSc in the Mechatronics Engineering Department at Bursa Technical University. His research interests are machine learning, natural language processing, mechanical design of robots and active exoskeletons.