The International Arab Journal of Information Technology (IAJIT)


Heart Disease Diagnosis Using Decision Trees with Feature Selection Method

Methods for analyzing medical data are advancing rapidly. In the medical domain, an accurate data classification model can help determine a patient's disease and its severity, thus easing the treatment burden on doctors. Nonetheless, medical data analysis is challenging because of uncertainty, correlations between the various measurements, and the high dimensionality of the data. These challenges burden statistical classification models. In recent years, Machine Learning (ML) and data mining approaches have proven effective in gaining a deeper understanding of the importance of these aspects. This research adopts a well-known supervised learning classification model, the Decision Tree (DT). A DT is a tree structure consisting of a root node, connected branches, and internal and terminal nodes. At each internal node a decision is made, as in a rule-based system. This type of model helps researchers and physicians better diagnose a disease. To reduce the complexity of the proposed DT, we explored using a Feature Selection (FS) method to design a simpler diagnosis model with fewer factors. Using fewer factors also reduces the effort required in the data collection stage. To demonstrate the effectiveness of the developed model, a comparative analysis was conducted between the developed DT and other ML models: Logistic Regression (LR), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB). The DT model achieved an accuracy of 93.78% and an ROC value of 0.94, outperforming the other compared algorithms. The developed DT model provided promising results and can help diagnose heart disease.
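As a rough illustration of the two ideas the abstract combines, the sketch below shows how a single decision node can pick a threshold rule, and how the quality of each feature's best rule can be used to rank features. It is only a sketch under stated assumptions: the abstract names neither the split criterion nor the FS method, so Gini impurity and a simple best-split ranking are assumed here. The records, labels, feature names, and the helper functions `gini`, `best_split`, and `rank_features` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, impurity_reduction) for the best rule `value <= threshold`."""
    parent = gini(labels)
    n = len(labels)
    best_threshold, best_gain = None, 0.0
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


def rank_features(rows, labels, names):
    """Rank features by the impurity reduction of their best single split."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        threshold, gain = best_split(column, labels)
        scored.append((name, threshold, gain))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy records: (age, resting blood pressure, max heart rate).
FEATURES = ["age", "resting_bp", "max_heart_rate"]
ROWS = [
    (63, 145, 150),
    (41, 130, 172),
    (57, 120, 163),
    (67, 160, 108),
    (62, 140, 165),
    (45, 128, 170),
    (70, 130, 109),
    (52, 172, 162),
]
LABELS = [1, 0, 0, 1, 1, 0, 1, 0]  # 1 = disease, 0 = no disease (made up)

ranking = rank_features(ROWS, LABELS, FEATURES)
for name, threshold, gain in ranking:
    print(f"{name}: split at {threshold}, impurity reduction {gain:.3f}")
```

On this made-up data, `age` separates the two classes perfectly and is ranked first, so a reduced model could keep only the top-ranked features. A real study would evaluate such a selection on held-out data and compare it against the other classifiers, as the abstract describes.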

The International Arab Journal of Information Technology, Vol. 21, No. 3, May 2024

Alaa Sheta obtained his Ph.D. in Information Technology from George Mason University, Virginia, USA, in 1997. Before that, he earned his B.E. and M.Sc. degrees in Electronics and Communication Engineering from Cairo University, Egypt, in 1988 and 1994, respectively. He holds a tenured professorship in the Computer Science Department of Southern Connecticut State University in New Haven, Connecticut, USA. Dr. Sheta’s research interests span various areas, including machine and deep learning, meta-heuristics, data science, image processing, and robotics. He has contributed significantly to these fields, publishing more than 170 papers in international journals and conferences. In addition to his scholarly achievements, he has served as a chair, guest editor, and program committee member for numerous international events.

Walaa El-Ashmawi is an Associate Professor at the Faculty of Computer Science, Misr International University, on leave from Suez Canal University, Egypt. She received her B.Sc. in 2001 and M.Sc. in 2008 from Egypt. She earned her Ph.D. in 2013 from the Computer Science Department, College of Information Science and Engineering, Hunan University, China. Her research interests span various domains in computer science, including artificial and computational intelligence, meta-heuristic algorithms, optimization problems, and machine learning techniques. She has authored over 40 papers published in international journals and conferences. She has held various administrative and academic positions, both full-time and part-time.

AbdelKarim Baareh holds a Doctor of Informatics/Artificial Intelligence degree from Damascus University, Syria, which he earned in 2009. He also completed his B.Sc. and M.C.A. (Computer Application) degrees at Mysore and Bangalore University, India, in 1992 and 1999, respectively. Professor Baareh is on sabbatical leave and affiliated with the Data Science and Artificial Intelligence Department at the Information Technology College, Isra University, Jordan. Additionally, he holds a permanent academic position in the Computer Science Department at Al-Balqa Applied University, Jordan. Dr. Baareh’s research spans various domains, including meta-heuristics, global optimization, machine learning, data mining, bioinformatics, graph theory, and parallel programming. He has published over 35 papers in international journals and conferences.