The International Arab Journal of Information Technology (IAJIT)


Survival Prediction of Children after Bone Marrow Transplant Using Machine Learning Algorithms

Bone marrow is the source of many blood-related diseases, such as blood cancers. Bone Marrow Transplantation (BMT), also known as Hematopoietic Stem Cell Transplantation (HSCT), is a life-saving surgical procedure, but it carries a high risk of mortality. Treatment-related mortality after BMT has several primary causes, including infections, toxicity, and Graft-versus-Host Disease (GvHD), and several additional risk factors affect the success of BMT and long-term survival. Predicting survival after BMT is therefore essential for effective treatment. A machine learning-based prediction system that estimates whether a patient will survive after BMT can help physicians make better-informed decisions before performing the procedure. In this paper, different machine learning models were investigated to predict the survival status of children undergoing BMT, using a publicly available BMT dataset from the University of California, Irvine Machine Learning repository (UCI ML repository). In particular, Random Forest (RF), Bagging Classifier, Extreme Gradient Boost (XGBoost), Adaptive Boosting (AdaBoost), Decision Tree (DT), Gradient Boost (GB), and K-Nearest Neighbors (KNN) were trained on the dataset. After a series of preprocessing steps and the removal of multicollinear features identified from the correlation heat map, the dataset consists of 45 variables. A feature engineering and modelling step was then applied to identify the most significant features, and the machine learning models were trained on these features to simplify the overall classification process. Notably, the most important features obtained by DT were the most suitable for training the Bagging classifier, and those obtained by GB were the most suitable for training the KNN model. In addition, hyper-parameter optimization using Grid Search Cross-Validation (GSCV) was applied to both approaches to improve the accuracy of the survival prediction. RF, AdaBoost, GB, and Bagging achieved the best accuracy, 97.37%.
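The workflow summarized above (correlation-based removal of multicollinear features, importance-based feature selection with a decision tree, and grid search cross-validation for a Bagging classifier) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic rather than the UCI BMT dataset, and the correlation threshold, the number of selected features (`top_k`), the parameter grid, and the helper `drop_correlated` are all hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def drop_correlated(X, threshold=0.9):
    """Return the column indices kept after dropping one of each highly correlated pair.

    Hypothetical helper: the 0.9 threshold is an assumption, not a value from the paper.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not strongly correlated with a column already kept.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic stand-in for the preprocessed data (assumption: NOT the UCI BMT dataset).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=6, n_redundant=4, random_state=0
)
# Append an exact copy of column 0 so the multicollinearity step has something to remove.
X = np.hstack([X, X[:, [0]]])

kept = drop_correlated(X, threshold=0.9)
X = X[:, kept]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Rank features by decision-tree importance, fitted on the training split only.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
top_k = 8  # assumed number of features to keep
top = np.argsort(tree.feature_importances_)[::-1][:top_k]

# Grid search with 5-fold cross-validation over an assumed, deliberately small grid.
param_grid = {"n_estimators": [10, 50], "max_samples": [0.5, 1.0]}
search = GridSearchCV(
    BaggingClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train[:, top], y_train)

# Accuracy of the best configuration on the held-out split.
test_accuracy = search.score(X_test[:, top], y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Fitting the feature ranking and the grid search on the training split only keeps the held-out accuracy free of selection leakage. The same pattern would apply to the other model pairing mentioned above (GB-derived features with KNN) by swapping the ranking and final estimators.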
