
Hybrid Ensemble Based Machine Learning Approach for Cardiovascular Disease Risk Prediction Using Multiple Integrated Datasets
Cardiovascular Diseases (CVD) are one of the major causes of human mortality worldwide, so more accurate and efficient models for predicting these diseases at an early stage are needed. In this work, a large and heterogeneous dataset is built by merging three datasets from the IEEE, UCI, and Kaggle sites. Several machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extra Trees (ET), Extreme Gradient Boosting (XGB), gradient boosting, AdaBoost, and Multi-Layer Perceptron (MLP), are applied to this integrated dataset. To further improve prediction accuracy, stacked models are also evaluated. The best combination of base models was gradient boosting, ET, and XGB with LR as the meta-model, which reached an accuracy of 99.78%, higher than that of existing models. The stacked model clearly outperformed the other models evaluated, which made substantially more prediction errors. This investigation demonstrates how datasets can be merged effectively to improve a model's generalization potential and how ensemble and stacking methods can be used. The results present a comprehensive approach to building robust CVD prediction systems and show how sophisticated machine learning techniques can enhance overall decision-making.
The International Arab Journal of Information Technology, Vol. 23, No. 2, March 2026