Ensemble machine learning methods for predictive diagnostics of cardiovascular diseases: comparative analysis on a multi-center dataset

Lavier C.M., Veselov D.I., Andriyanov N.A.

UDC 004.852:616.12
DOI: 10.26102/2310-6018/2026.57.6.017

Abstract

Eight machine learning algorithms for cardiovascular disease diagnosis were compared on a combined multi-center dataset drawn from six databases (n = 1,904). Three clinically motivated derived features were proposed: maxhrratio (ratio of maximum heart rate to age-predicted maximum), sthr index (ratio of ST-segment depression to maximum heart rate), and anginast flag (binary indicator of co-occurring typical angina and downsloping ST segment). Six base algorithms (decision tree, logistic regression, random forest, XGBoost, CatBoost, and LightGBM) were trained with Bayesian hyperparameter optimization (Optuna). Ensembling was performed via stacking (out-of-fold predictions, meta-learner with Platt calibration) and AUC-weighted soft voting. Performance was assessed using the BCa bootstrap (10,000 iterations, 95% CI); pairwise comparisons used the DeLong and McNemar tests with Bonferroni correction (28 pairs, p < 0.00179). CatBoost achieved the best single-model performance: ROC-AUC = 0.948 [0.922–0.966], F1 = 0.884, Brier = 0.097. Stacking reached ROC-AUC = 0.931 with the best calibration among the ensembles (Brier = 0.102). An ablation study showed that seven features retain 97.5% of full-model performance. SHAP consensus across four models ranked sthr index fourth among 14 features, ahead of seven original clinical variables. Leave-one-source-out validation revealed encoding incompatibilities in two of six sources, underscoring the need for data auditing prior to cross-institutional deployment.
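
The three derived features and the Bonferroni threshold quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The `Patient` record and its field names are hypothetical, the age-predicted maximum heart rate is assumed to be 220 minus age (the abstract does not say which formula was used), and the family-wise significance level is assumed to be 0.05, which is consistent with the reported threshold of 0.00179 for 28 pairs.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age: int               # years
    max_hr: float          # maximum heart rate achieved, beats per minute
    st_depression: float   # exercise-induced ST-segment depression
    typical_angina: bool   # chest pain classified as typical angina
    downsloping_st: bool   # downsloping ST segment


def derived_features(p: Patient) -> dict:
    # ASSUMPTION: age-predicted maximum heart rate = 220 - age.
    predicted_max_hr = 220 - p.age
    return {
        # Ratio of achieved maximum heart rate to the age-predicted maximum.
        "maxhrratio": p.max_hr / predicted_max_hr,
        # Ratio of ST-segment depression to maximum heart rate.
        "sthr index": p.st_depression / p.max_hr,
        # 1 only when typical angina and a downsloping ST segment co-occur.
        "anginast flag": int(p.typical_angina and p.downsloping_st),
    }


def bonferroni_threshold(n_models: int, alpha: float = 0.05) -> float:
    # ASSUMPTION: alpha = 0.05 family-wise level.
    # Every unordered pair of models is compared, giving C(n, 2) tests.
    return alpha / comb(n_models, 2)


if __name__ == "__main__":
    patient = Patient(age=60, max_hr=150.0, st_depression=3.0,
                      typical_angina=True, downsloping_st=True)
    print(derived_features(patient))
    print(comb(8, 2))                          # 28 pairwise comparisons
    print(round(bonferroni_threshold(8), 5))   # 0.00179
```

With eight models the number of unordered pairs is C(8, 2) = 28, and 0.05 / 28 ≈ 0.00179, matching the per-comparison threshold reported above.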

Lavier Casey Markovich

Moscow Witte University

Moscow, Russian Federation

Veselov Dmitriy Ivanovich

Financial University under the Government of the Russian Federation

Moscow, Russian Federation

Andriyanov Nikita Andreevich
Candidate of Engineering Sciences, Docent

Financial University under the Government of the Russian Federation

Moscow, Russian Federation

Keywords: machine learning, cardiovascular disease, CatBoost, stacking, SHAP, BCa bootstrap, NRI, IDI, multi-center dataset, feature engineering

For citation: Lavier C.M., Veselov D.I., Andriyanov N.A. Ensemble machine learning methods for predictive diagnostics of cardiovascular diseases: comparative analysis on a multi-center dataset. Modeling, Optimization and Information Technology. 2026;14(6). URL: https://moitvivt.ru/ru/journal/article?id=2302 DOI: 10.26102/2310-6018/2026.57.6.017 (In Russ.).

Received 20.03.2026

Revised 15.06.2026

Accepted 22.06.2026

Published 30.06.2026