The International Arab Journal of Information Technology (IAJIT)


An Efficient Intrusion Detection Framework Based on Embedding Feature Selection and Ensemble Learning Technique

Network security has become a critical, universal concern affecting enterprises, governments, and individuals. Attackers' strategies continue to evolve, and the rate of attacks targeting network systems has risen dramatically. An Intrusion Detection System (IDS) is one of the key defenses against sophisticated cyberattacks. However, improving the accuracy and detection rate of an IDS while minimizing false alarms remains a challenge. This paper proposes a robust and effective intrusion detection framework based on ensemble learning, using eXtreme Gradient Boosting (XGBoost) together with an embedded feature selection method. Further, a single best feature subset, shared across all attack types, is extracted from the up-to-date real-world intrusion dataset CICIDS2017, provided by the Canadian Institute for Cybersecurity. The proposed IDS framework is evaluated on a large test dataset for both binary and multi-class classification. For the abnormal class, the results are promising: overall accuracy, precision, detection rate, specificity, F-score, false-negative rate, false-positive rate, error rate, and Area Under the Curve (AUC) are 99.86%, 99.69%, 99.75%, 99.69%, 99.72%, 0.17%, 0.2%, 0.14%, and 99.72, respectively. The multi-class classification results are also strong on all performance metrics.
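For readers less familiar with the metrics listed above, the following is a minimal sketch of how they are conventionally derived from a binary confusion matrix, with the abnormal (attack) class treated as positive. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' exact computation may differ. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; the positive class is 'abnormal'."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)  # also called recall or sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "detection_rate": detection_rate,
        "specificity": tn / (tn + fp),
        # F-score as the harmonic mean of precision and detection rate
        "f_score": 2 * precision * detection_rate / (precision + detection_rate),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "error_rate": (fp + fn) / total,
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, the detection rate and false-negative rate sum to 1, as do specificity and false-positive rate, and accuracy and error rate.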
