
NS-PPO: A Two-Stage Data Resampling Framework for the Initial Phase of Software Defect Prediction
Software Defect Prediction (SDP) is one of the most important reliability assurance methods applied before the delivery of software projects. However, class imbalance is a common issue in software projects and significantly hinders the ability of SDP methods to distinguish between defective and non-defective instances. Although several recent SDP imbalance-handling methods have achieved some success, they still exhibit limitations in terms of reliability and applicability. To address this, this paper proposes Neighborhood cleaning rule and Synthetic minority oversampling technique with Proximal Policy Optimization-based adaptive sampling (NS-PPO), a two-stage data resampling framework aimed at mitigating the impact of class imbalance in software projects. NS-PPO operates in two phases. In the first phase, a hybrid sampler that combines the Neighborhood Cleaning Rule (NCL) and the Synthetic Minority Oversampling Technique (SMOTE) is employed to generate a large number of synthetic samples for the minority class. In the second phase, a Deep Reinforcement Learning (DRL)-based undersampler is designed to select high-quality synthetic samples. The selected samples are then combined with the real samples to form the training set for SDP methods. Extensive experiments are conducted on 18 software projects from the PRedictOr Models In Software Engineering (PROMISE) and National Aeronautics and Space Administration (NASA) datasets, with the Matthews Correlation Coefficient (MCC), Area Under the Curve (AUC), and F-measure used as evaluation metrics. The findings demonstrate that, regardless of whether expert metrics or semantic metrics are used as inputs to SDP methods, NS-PPO exhibits significant advantages over state-of-the-art SDP imbalance-handling methods, including Learning-To-Rank UnderSampling (LTRUS).
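To make the two-phase procedure concrete, the following Python sketch illustrates the idea under stated assumptions rather than reproducing the authors' implementation: Stage 1 chains the off-the-shelf NCL and SMOTE samplers from imbalanced-learn (the NCL-then-SMOTE ordering is an assumption), and Stage 2 substitutes a simple classifier-confidence filter for the paper's PPO-trained undersampler, since the learned policy is not reproduced here. The toy dataset, the random-forest scorer, and the 0.5 confidence threshold are illustrative placeholders, not the authors' settings.

    # Illustrative sketch of the two-stage resampling idea described in the abstract.
    # Stage 1 uses off-the-shelf imbalanced-learn samplers; Stage 2 is a stand-in
    # filter, NOT the paper's PPO policy.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from imblearn.under_sampling import NeighbourhoodCleaningRule
    from imblearn.over_sampling import SMOTE

    # Toy imbalanced dataset standing in for a software-project metric table.
    X, y = make_classification(n_samples=1000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)

    # Stage 1: hybrid NCL + SMOTE sampler.
    # NCL removes noisy/overlapping majority instances, then SMOTE synthesizes
    # minority samples (the NCL -> SMOTE ordering is assumed here).
    X_clean, y_clean = NeighbourhoodCleaningRule().fit_resample(X, y)
    X_over, y_over = SMOTE(random_state=0).fit_resample(X_clean, y_clean)

    # Separate the real samples from the newly synthesized ones.
    n_real = len(X_clean)
    X_syn, y_syn = X_over[n_real:], y_over[n_real:]

    # Stage 2: select high-quality synthetic samples.
    # Placeholder for the PPO-based undersampler: a classifier trained on the
    # real data scores each synthetic sample, and only confidently-minority
    # ones are kept; the paper instead learns this selection policy with PPO.
    scorer = RandomForestClassifier(random_state=0).fit(X_clean, y_clean)
    keep = scorer.predict_proba(X_syn)[:, 1] >= 0.5

    X_train = np.vstack([X_clean, X_syn[keep]])
    y_train = np.concatenate([y_clean, y_syn[keep]])
    print(f"real: {len(X_clean)}, synthetic kept: {keep.sum()} / {len(X_syn)}")

The sketch relies on imbalanced-learn appending newly synthesized minority instances after the original samples in the output of fit_resample, which is how the synthetic candidates are separated from the real data before the Stage 2 filter is applied.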