The International Arab Journal of Information Technology (IAJIT)


Heterogeneous Feature Analysis on Twitter Data Set for Identification of Spam Messages

Spam is undesirable content that appears on online social networking sites, and spammers are the users who post it. Unwanted messages posted on Twitter may serve several goals, and spam tweets can distort the statistics reported by Twitter mining tools and waste users’ attention. As Twitter has gained popularity throughout the world, the interest of spammers and malicious users in it has also increased. To address the spam problem, many researchers have proposed machine learning approaches for identifying spam messages. Identifying irrelevant messages in social networks depends not only on the choice of classifier but also on the analysis of varied features. The proposed model performs a heterogeneous feature analysis on Twitter data streams to classify unsolicited messages, combining binary and continuous feature extraction with sentiment analysis on social network datasets. The generated features are assessed using significance-based strategies, and the best features are selected. A classifier model is built from these feature vectors to predict and identify spam messages on Twitter. The experimental results show that the proposed Sentiment Analysis based Binary and Continuous Feature Extraction model with Random Forest (SA-BC-RF) classifies spam messages from social networks with an accuracy of 90.72% in a comparison with other state-of-the-art methods.
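
As a rough illustration of what binary, continuous, and sentiment features for a single tweet might look like, consider the minimal sketch below. It is not the paper's implementation: the function name `extract_features`, the specific features, and the tiny word lists are all assumptions made for this example. Vectors of this kind could then be passed to a feature selection step and a classifier such as Random Forest.

```python
import re

# Hypothetical mini-lexicons, for illustration only (not taken from the paper).
POSITIVE = {"good", "great", "love", "win"}
NEGATIVE = {"bad", "hate", "scam", "worst"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z']+")


def extract_features(tweet: str) -> dict:
    """Return assumed binary, continuous, and sentiment features for one tweet."""
    # Lowercase the text and drop URLs before counting lexicon hits.
    tokens = TOKEN_RE.findall(URL_RE.sub(" ", tweet.lower()))
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        # Binary features: presence (1) or absence (0) of a property.
        "has_url": int(bool(URL_RE.search(tweet))),
        "has_mention": int("@" in tweet),
        # Continuous features: counts and ratios.
        "num_hashtags": tweet.count("#"),
        "digit_ratio": sum(c.isdigit() for c in tweet) / max(len(tweet), 1),
        # Sentiment polarity in [-1, 1], computed from lexicon hits.
        "sentiment": (pos - neg) / max(pos + neg, 1),
    }


features = extract_features("Win a FREE phone now http://spam.example #win #free @you")
```

A real system would replace the toy lexicon with a proper sentiment model and use a much richer feature set; the sketch only shows how the three feature types can sit side by side in one vector.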

The International Arab Journal of Information Technology, Vol. 19, No. 1, January 2022