The International Arab Journal of Information Technology (IAJIT)


Analyzing the Behavior of Multiple Dimensionality Reduction Algorithms to Obtain Better Accuracy using Benchmark KDD CUP Dataset

In the ubiquitously connected world of IT infrastructure, the Intrusion Detection System (IDS) plays a vital role. An IDS is considered a critical component of security infrastructure; it is implemented through hardware or software devices and can detect malicious activities in a networked environment. To detect or prevent network attacks, a Network Intrusion Detection (NID) system may be equipped with machine learning algorithms to achieve better accuracy and faster detection speed. Dimensionality Reduction Algorithms are an efficient mechanism for analyzing different attacks effectively. Their significance is that they improve feature selection from huge datasets, which in turn enhances learning speed. Speed is a crucial factor in the success of network intrusion detection systems, since defensive reactions depend on it. In this paper, two open source datasets, the Knowledge Discovery in Databases (KDD CUP) dataset and the 10% KDD CUP dataset, are employed for experimentation. These datasets are passed to Dimensionality Reduction Algorithms, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Kernel PCA with different kernels, and are then classified with the Logistic Regression classification algorithm. K-fold cross-validation is then used to further improve the accuracy achieved. Finally, a comparative study is made of the accuracy results obtained with and without K-fold. The empirical study on KDD CUP data confirms the effectiveness of the proposed scheme. We found that combining multiple dimensionality reduction algorithms such as PCA, LDA, and Kernel PCA with a classification algorithm gives the best result. Our study should help researchers working in critical areas such as intrusion detection in network traffic environments.
The results we report should also be useful to researchers in their future work on the KDD CUP dataset. The main finding is that PCA on the 10% KDD CUP dataset attained 98% accuracy without K-fold and 99% with K-fold. LDA on the 10% KDD CUP dataset likewise attained 98% without K-fold and 99% with K-fold.
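The pipeline described above (dimensionality reduction, then Logistic Regression, evaluated with and without K-fold) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the use of scikit-learn, the synthetic stand-in data, the number of components, the RBF kernel, the 75/25 split, and the choice of 10 folds are all assumptions made for the example, and the accuracies it prints are unrelated to the figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary-labelled data with 41 numeric features stands in
# for preprocessed KDD CUP records, so the sketch runs without any download.
X, y = make_classification(
    n_samples=600, n_features=41, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three reducers named in the abstract; component counts are illustrative.
reducers = {
    "PCA": PCA(n_components=2),
    # LDA yields at most (n_classes - 1) components, so 1 for a binary label.
    "LDA": LinearDiscriminantAnalysis(n_components=1),
    "Kernel PCA (rbf)": KernelPCA(n_components=2, kernel="rbf"),
}

results = {}
for name, reducer in reducers.items():
    model = make_pipeline(
        StandardScaler(), reducer, LogisticRegression(max_iter=1000)
    )
    # "Without K-fold": fit once and score on a single held-out split.
    holdout_acc = model.fit(X_train, y_train).score(X_test, y_test)
    # "With K-fold": mean accuracy over 10 cross-validation folds.
    kfold_acc = cross_val_score(model, X_train, y_train, cv=10).mean()
    results[name] = (holdout_acc, kfold_acc)
    print(f"{name}: hold-out={holdout_acc:.3f}, 10-fold mean={kfold_acc:.3f}")
```

Note that K-fold here only changes how accuracy is estimated (an average over folds rather than a single split); wrapping the scaler and reducer in the same pipeline keeps them from being fitted on the validation fold.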
