The International Arab Journal of Information Technology (IAJIT)



Choosing Decision Tree-Based Boundary Patterns in the Intrusion Detection Systems with Large Data Sets

Today, with the growing use of computer networks, network security and intrusion detection systems have received serious attention. A major challenge for intrusion detection systems is the enormous amount of data they must process. The generalization capability of the Support Vector Machine (SVM) has attracted the attention of intrusion detection researchers in recent years. The main drawback of the SVM lies in its training phase, which is computationally expensive and depends on the size of the input dataset. In this study, a new algorithm for reducing SVM training time is presented. In the proposed method, Ant Colony Optimization (ACO) is first used to find prototype samples; a number of these prototype samples are then selected at random, and an approximate boundary is determined using an SVM. Based on this approximate boundary, boundary samples are identified using a decision tree, and the final model is trained on these boundary samples. To demonstrate the effectiveness of the proposed method, standard publicly available datasets are used. The experimental results show that, despite the data reduction, the proposed model produces results with similar accuracy, and faster, than state-of-the-art and current SVM implementations.
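The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, replaces the ACO prototype search with a plain random subsample, assumes a binary problem, and treats a prototype as a "boundary" sample when it falls inside the margin band of the approximate SVM. The function names, the `margin` threshold, the tree depth, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_boundary_samples(X, y, n_prototypes=300, margin=1.0, seed=0):
    """Return indices of samples predicted to lie near the class boundary."""
    rng = np.random.default_rng(seed)

    # Stand-in for the ACO prototype search (assumption): a uniform random subset.
    n_prototypes = min(n_prototypes, len(X))
    proto_idx = rng.choice(len(X), size=n_prototypes, replace=False)
    X_proto, y_proto = X[proto_idx], y[proto_idx]

    # Approximate boundary learned from the small prototype set only.
    rough_svm = SVC(kernel="rbf", gamma="scale").fit(X_proto, y_proto)

    # Assumed labeling rule: a prototype is "near the boundary" (1) when its
    # decision value lies inside the margin band, otherwise "far" (0).
    near = (np.abs(rough_svm.decision_function(X_proto)) <= margin).astype(int)
    if near.min() == near.max():
        # All prototypes got the same label, so a tree cannot separate anything.
        return proto_idx

    # The decision tree generalizes the near/far labels to the full data set.
    tree = DecisionTreeClassifier(max_depth=8, random_state=seed)
    tree.fit(X_proto, near)
    selected = np.flatnonzero(tree.predict(X) == 1)

    # Fall back to the prototypes if the selection lost one of the classes.
    if np.unique(y[selected]).size < 2:
        return proto_idx
    return selected


def train_reduced_svm(X, y, **kwargs):
    """Train the final SVM on the selected boundary samples only."""
    idx = select_boundary_samples(X, y, **kwargs)
    model = SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])
    return model, idx


# Synthetic binary data in place of an intrusion detection data set.
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model, idx = train_reduced_svm(X_train, y_train)
reduced_acc = model.score(X_test, y_test)
print(f"kept {len(idx)} of {len(X_train)} training samples, "
      f"test accuracy {reduced_acc:.3f}")
```

Comparing `reduced_acc` and the training time against an `SVC` fitted on all of `X_train` gives a rough feel for the accuracy and speed trade-off the abstract refers to; the numbers obtained on synthetic data say nothing about the results reported in the paper.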
