The International Arab Journal of Information Technology (IAJIT), Vol. 21, No. 2, March 2024


Comparative Analysis of Intrusion Detection Models using Big Data Analytics and Machine Learning Techniques

Traditional cyber security measures are becoming less effective, leading to a rise in modern attacks. However, the ability to analyze and use massive volumes of data (big data) to train anomaly-based systems that can learn from experience, classify attacks, and make decisions can improve the prediction of attacks before they occur. In this study, predictive models for intrusion detection that use Big Data and Machine Learning (ML) algorithms were proposed to help ensure the availability, integrity, and confidentiality of information systems. The proposed approach used a big dataset (CIC-Bell-IDS2017) to train three ML classifiers independently, both before and after feature selection. A big data analytics tool was also employed for feature scaling and selection, to normalize the data and select the most relevant set of features. Performance evaluation and comparative analysis showed improvements in the models' prediction accuracies.
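The abstract does not name the three classifiers, the feature-selection criterion, or the big data tool, so the following is only a toy illustration, in plain Python, of the "train before and after feature selection, then compare accuracy" workflow it describes. Everything specific here is an assumption rather than the paper's method: min-max normalization fitted on the training split, a simple class-mean-difference score for ranking features, a nearest-centroid classifier, and a hand-made two-feature dataset standing in for real flow records. All function names are hypothetical.

```python
from statistics import mean


def fit_min_max(rows):
    """Learn per-column (min, max) bounds from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Rescale each column with the training bounds; constant columns become 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def rank_features(rows, labels):
    """Order column indices by |mean(attack) - mean(benign)|, largest first."""
    scores = []
    for j in range(len(rows[0])):
        attack = [r[j] for r, y in zip(rows, labels) if y == 1]
        benign = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(attack) - mean(benign)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def squared_distance(row, centroid, features):
    """Squared Euclidean distance restricted to the chosen feature indices."""
    return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))


def nearest_centroid_accuracy(train, train_y, test, test_y, features):
    """Fit one centroid per class on the chosen features and return test accuracy."""
    centroids = {
        label: [mean(r[j] for r, y in zip(train, train_y) if y == label) for j in features]
        for label in sorted(set(train_y))
    }
    correct = 0
    for row, y in zip(test, test_y):
        pred = min(centroids, key=lambda label: squared_distance(row, centroids[label], features))
        correct += int(pred == y)
    return correct / len(test_y)


# Hypothetical flow records [feature_0, feature_1]; label 1 = attack, 0 = benign.
train_raw = [[0, 6], [2, 4], [1, 5], [10, 0], [8, 2], [9, 1]]
train_y = [0, 0, 0, 1, 1, 1]
test_raw = [[1, 5], [9, 1], [6, 6], [4, 0]]
test_y = [0, 1, 1, 0]

# Scale with bounds learned on the training split, then apply them to both splits.
lows, highs = fit_min_max(train_raw)
train = apply_min_max(train_raw, lows, highs)
test = apply_min_max(test_raw, lows, highs)

ranking = rank_features(train, train_y)
acc_all = nearest_centroid_accuracy(train, train_y, test, test_y, features=[0, 1])
acc_selected = nearest_centroid_accuracy(train, train_y, test, test_y, features=ranking[:1])
print(ranking, acc_all, acc_selected)  # [0, 1] 0.5 1.0
```

In this hand-made data, feature_1 separates the classes in training but misleads on two test records, so keeping only the top-ranked feature raises accuracy from 0.5 to 1.0. That outcome is a property of the toy example only. At big data scale the same steps would normally run on a distributed library; Spark's `pyspark.ml.feature` module, for instance, provides a `MinMaxScaler`.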
