The International Arab Journal of Information Technology (IAJIT), Vol. 17, No. 4A, Special Issue 2020


Default Prediction Model: The Significant Role of Data Engineering in the Quality of Outcomes

For financial institutions and the banking industry, it is crucial to have predictive models for their core financial activities, especially those that play a major role in risk management. Predicting loan default is one of the critical issues that banks and financial institutions focus on. Large revenue losses could be prevented by predicting not only a customer's ability to pay back a loan but also their ability to do so on time. Customer loan default prediction is the task of proactively identifying the customers who are most likely to stop repaying their loans. This is usually done by dynamically analyzing customers' relevant information and behaviors. It matters because it allows the bank or financial institution to estimate the borrowers' risk. Many different machine learning classification models and algorithms have been used to predict customers' ability to pay back loans. In this paper, three different classification methods (Naïve Bayes, Decision Tree, and Random Forest) are used for prediction. A comprehensive set of preprocessing techniques is applied to the dataset to obtain better data by fixing some of the main data issues, such as missing values and imbalanced data. Three different feature extraction algorithms are also used to enhance accuracy and performance. The results of the competing models varied after applying the data preprocessing techniques and feature selection. The results were compared using the F1 measure. The best model achieved an improvement of about 40%, whilst the weakest model improved by only 3%. This highlights the importance of data engineering activities (e.g., data preprocessing and feature selection) in machine learning exercises.
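The abstract names missing values and class imbalance as the main data issues but does not say which techniques were used to fix them. The following is therefore only a minimal, standard-library Python sketch of two common choices, offered under stated assumptions:

- median imputation for missing values;
- random oversampling of the minority (defaulting) class.

The sketch also includes the F1 score used to compare the models, F1 = 2PR / (P + R), where P is precision and R is recall on the positive class. The function names, the toy data, and the convention that label 1 marks a defaulted loan are illustrative assumptions, not the paper's implementation.

```python
import random
from statistics import median

# Hypothetical helpers illustrating common preprocessing choices; the paper's
# exact techniques are not specified in the abstract.


def impute_median(rows):
    """Return a copy of `rows` with each None replaced by its column median.

    Assumes all rows are equal-length lists of numbers (or None) and every
    column has at least one observed value.
    """
    n_cols = len(rows[0])
    medians = [median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until the two classes are equally frequent.

    Assumption: y holds binary labels, e.g. 1 = defaulted loan, 0 = repaid.
    """
    rng = random.Random(seed)
    counts = {label: y.count(label) for label in set(y)}
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Sample minority rows with replacement to close the gap to the majority class.
    extra = [rng.choice(minority_idx) for _ in range(counts[majority] - counts[minority])]
    X_bal = X + [list(X[i]) for i in extra]
    y_bal = y + [y[i] for i in extra]
    return X_bal, y_bal


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy data: [age, monthly_income] with missing values.
    X = [[25, None], [40, 3000.0], [None, 1200.0], [33, 800.0]]
    y = [0, 0, 0, 1]
    X_clean = impute_median(X)
    X_bal, y_bal = random_oversample(X_clean, y)
    print(y_bal.count(0), y_bal.count(1))  # → 3 3
    print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 3))  # → 0.8
```

In practice, the same steps are commonly done with scikit-learn (`SimpleImputer(strategy="median")`, `f1_score`) and imbalanced-learn (`RandomOverSampler`). Their output would then feed classifiers such as `GaussianNB`, `DecisionTreeClassifier`, and `RandomForestClassifier`. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.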

Ahmad Al-Qerem graduated in Applied Mathematics from Jordan University of Science and Technology in 1997 and received an MSc in Computer Science from Jordan University in 2002. He was then appointed as a full-time lecturer at Zarqa University. He was a visiting professor at Princess Sumaya University for Technology (PSUT). He obtained his PhD from Loughborough University, UK. His research interests are in the following areas:

- performance and analytical modeling
- mobile computing environments
- protocol engineering
- communication networks
- the transition to IPv6
- machine learning
- transaction processing

He has published several papers in various areas of computer science. He is currently a full professor in the Computer Science Department at Zarqa University, Jordan.

Ghazi Al-Naymat received his PhD in May 2009 from the School of Information Technologies at The University of Sydney, Australia. In 2015, he joined the Department of Computer Science, King Hussein School of Computing Sciences at Princess Sumaya University for Technology (PSUT). He also served as chair of the Computer Science Department at PSUT from 2017 to 2019. His research interests include data mining, machine learning, big data, and data science. He holds a full academic post as an associate professor in the Department of Information Technology, Ajman University, Ajman, United Arab Emirates.

Mays Al Hasan obtained her bachelor's degree in Computer Engineering from Jordan University of Science and Technology. She is now studying for a master's degree in Data Science at Princess Sumaya University for Technology. She has published a couple of papers in different areas of data science and analytics. She is currently working as Technical Product Manager for AI and Analytics at Mawdoo3. She has over 10 years of experience, domestically and internationally, in the analytics and business intelligence fields across different industries.

Mutaz Al-Debei is currently working as a Senior Territory Manager for the Public Sector at Oracle. Previously, he was a Senior Territory Manager for Autonomous Data Management & Cloud Technology at Oracle. Also at Oracle, Al-Debei previously held the role of Principal Cloud Platform Consultant - Big Data & Business Analytics. Before joining Oracle, Al-Debei was the Director of Big Data & Advanced Analytics at INTRASOFT MEA. Before joining INTRASOFT, he served as an Associate Professor of Information Systems and Computing at the University of Jordan (UJ). He was also an ICT Chief Consultant at the National Center for Security and Crises Management. He also worked as an IT Manager for Arab Radio & Television (ART) in Jordan Media City. He held other positions at Al-Ahli Bank (Master Card Department) and the Royal Scientific Society. Al-Debei earned his PhD in Information Systems and Computing from Brunel University London (BUL) in May 2010. He has received several significant international and national research awards, including:

- the Abdul Hameed Shoman Award for Arab Researchers – ICTs (2015)
- the prestigious Vice Chancellor's Prize for Doctoral Research from Brunel University London (2010)
- the Distinguished Researcher Award from The University of Jordan, three times (2012, 2014, and 2018)

He also received best paper awards from UKAIS (2008) and IFIP 8.2 (2010).