The Impact of Natural Language Preprocessing on Big Data Sentiment Analysis

Sentiment analysis determines people's opinions, sentiments, and emotions by classifying their written text into positive or negative polarity. It is important for many critical applications, such as decision making and product evaluation. Social networks are one of the main sources of data for sentiment analysis. However, the huge volume of data they produce requires efficient and scalable analysis techniques. MapReduce has proved its efficiency and scalability in handling big data, and has therefore attracted many researchers as a processing framework. In this paper, a sentiment analysis method for big data is studied. The method uses the Naïve Bayes algorithm to classify texts into positive and negative polarity. Several linguistic and Natural Language Processing (NLP) preprocessing techniques are applied to a Twitter data set to study their impact on the accuracy of big data classification. The performed experiments indicate that the accuracy of the sentiment analysis is enhanced by 5%, yielding an accuracy of 73% on the Stanford Sentiment data set.
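To make the pipeline described above concrete, the following is a minimal single-machine sketch of tweet preprocessing followed by multinomial Naïve Bayes classification with add-one smoothing. It is not the implementation studied in the paper: the stop-word list, the function names (`preprocess`, `train`, `classify`), and the four toy tweets are assumptions introduced purely for illustration, and the MapReduce distribution of the counting step is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, illustrative stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"a", "an", "the", "is", "this", "to", "and", "i", "it"}


def preprocess(text):
    """Lowercase, drop URLs and @mentions, tokenize, and remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def train(examples):
    """Collect per-class document and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in preprocess(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy tweets, not taken from any real data set.
TOY_TWEETS = [
    ("I love this phone, it is great http://example.com", "positive"),
    ("@bob what a wonderful happy day", "positive"),
    ("I hate this terrible service", "negative"),
    ("This is the worst, so sad and awful", "negative"),
]

model = train(TOY_TWEETS)
print(classify("what a great day", model))
print(classify("terrible awful service", model))
```

In a MapReduce setting, the counting loop in `train` is the part that would typically be distributed, with mappers emitting (label, token) pairs and reducers summing them; that mapping is a common pattern and is stated here as an assumption rather than as the paper's design.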

Mariam Khader is a PhD candidate in computer science at Princess Sumaya University for Technology (PSUT), Amman, Jordan. She received her BSc degree in computer networking systems from the World Islamic Science & Education University (WISE), Amman, Jordan, in 2012, and her MSc degree in IT security and digital criminology from PSUT in 2014. Between 2012 and 2015, she was a teaching assistant and then a lecturer in the network department at WISE University. Her interests include digital forensics, network security, and big data analytics.

Arafat Awajan is a Full Professor at Princess Sumaya University for Technology (PSUT). He received his PhD degree in Computer Science from the University of Franche-Comte, France, in 1987. He has held various administrative and academic positions at the Royal Scientific Society (RSS) and PSUT: Head of the Department of Computer Science (2000-2003), Dean of the King Hussein School for Information Technology (2004-2007), Head of the Department of Computer Graphics and Animation (2005-2006), Director of the Information Technology Center, RSS (2008-2010), Dean of Student Affairs (2011-2014), and Dean of the King Hussein School for Computing Sciences (2014-2017). He is currently the vice president of the university (PSUT). His research interests include natural language processing, Arabic text mining, and digital image processing.

Ghazi Al-Naymat received his PhD degree in May 2009 from the School of Information Technologies at The University of Sydney, Australia. He is an Associate Professor in the Department of Computer Science, King Hussein School of Computing Sciences, at Princess Sumaya University for Technology (PSUT), and is currently the chair of the computer science department. His research interests include data mining, machine learning, big data, and data science.