Received date December 28, 2021

Accepted date September 8, 2022

A Novel Spam Classification System for E-Mail Using a Gradient Fuzzy Guideline-Based Spam Classifier (GFGSC)

Authors Vinoth Narayanan Arumugam Subramaniam, Rajesh Annamalai

Keywords #Spam e-mail #principal component analysis #latent semantic analysis #information gain #chi-square #gradient fuzzy guideline-based spam classifier #MATLAB tool

Abstract

Spam messages have increased dramatically in recent years as the number of email clients has grown. Email has become a valuable means of communication because it saves time and effort. However, many emails contain unwelcome content, known as spam, originating from social platforms and advertisements. Although many techniques have been developed for categorizing spam mail, none of them achieves 100 percent efficiency in analyzing spam messages. In this research, we therefore propose a novel Gradient Fuzzy Guideline-based Spam Classifier (GFGSC) for classifying e-mails as spam or non-spam. The study uses four datasets, which are pre-processed using normalization. Features are then extracted using Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA), and the most relevant features are selected using Information Gain (IG) and Chi-Square (ChS). The GFGSC classifier then classifies the data as spam or non-spam with better effectiveness. Finally, performance is evaluated and the resulting metrics are compared with those of existing approaches. The results are obtained using the MATLAB tool.
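The feature selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' GFGSC or their MATLAB implementation. It only shows how Information Gain and Chi-Square scores might be computed for a single term, assuming binary term presence and a two-class (spam/non-spam) label. The toy documents and all function names here are hypothetical.

```python
import math

# Hypothetical toy corpus: each e-mail is a set of tokens, labelled True for spam.
docs = [{"win", "prize", "now"}, {"free", "prize"},
        {"meeting", "now"}, {"project", "meeting"}]
labels = [True, True, False, False]


def contingency(docs, labels, term):
    """Return counts (a, b, c, d): term present/absent crossed with spam/non-spam."""
    a = b = c = d = 0
    for tokens, is_spam in zip(docs, labels):
        present = term in tokens
        if present and is_spam:
            a += 1          # term present, spam
        elif present:
            b += 1          # term present, non-spam
        elif is_spam:
            c += 1          # term absent, spam
        else:
            d += 1          # term absent, non-spam
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(k / total) * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """Class entropy minus class entropy conditioned on term presence."""
    n = a + b + c + d
    h_cond = 0.0
    for spam, non_spam in ((a, b), (c, d)):
        if spam + non_spam:
            h_cond += (spam + non_spam) / n * entropy([spam, non_spam])
    return entropy([a + c, b + d]) - h_cond


def top_k_terms(docs, labels, k, score):
    """Rank vocabulary terms by a scoring function; ties break alphabetically."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-score(*contingency(docs, labels, t)), t))
    return ranked[:k]
```

Calling `top_k_terms(docs, labels, 2, chi_square)` ranks the toy vocabulary by Chi-Square score, and passing `information_gain` instead ranks it by Information Gain. In a fuller pipeline the selected terms would feed the classifier, which this sketch does not attempt to reproduce.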

The International Arab Journal of Information Technology, Vol. 20, No. 3, May 2023