Keywords: Bayesian approach, random forest, ensembles of models, voting, stacking, geroprophylactic effect, predicting the effectiveness of treatment, bio-age
Building gender- and age-dependent models for assessing bio-age based on the functional data of the patient's body
UDC 51-76
DOI: 10.26102/2310-6018/2024.45.2.012
Machine learning methods are widely used to build predictive models in medicine. Alongside methods based on classical statistics, Bayesian methods are also used; they are most effective for small sample sizes. In this paper, we construct a number of models that predict a patient's bio-age from their functional data, using both classical machine learning methods and the Bayesian approach. The data were the results of the clustering we carried out in a previous study on records from the medical organizations “Sverdlovsk Regional Clinical Psychoneurological Hospital for War Veterans” and “Institute of Medical Cell Technologies” for 1995–2022 (6440 records in total), which yielded 4 clusters divided by gender and patient status (inpatient or outpatient). Assuming that outpatients have the smallest difference between biological and calendar age, and therefore introduce less error into the model than inpatients, we built models only for outpatients. A set of models was constructed for 2 clusters: outpatient men (344 records) and outpatient women (991 records). Analysis of the age distribution in each group showed a bimodal distribution with a boundary at 40 years, so each group was split by age into two parts: up to 40 years and over 40. The lazypredict library was used to select classical machine learning models. For each group, the 4 methods with the highest accuracy were selected and models were built on them, along with two model ensembles, stacking and voting. The prediction error of these models on the test data ranged from 4.1 to 6.3 years. In the Bayesian approach, a multiple linear regression model with a specified prior distribution of the regression coefficients was constructed; the prediction error of these models ranged from 4.9 to 6.6 years.
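As an illustration of the Bayesian part of the approach, the following minimal sketch computes the posterior distribution of the coefficients of a linear regression under a Gaussian prior. The abstract does not specify the form of the prior, the noise model, the set of functional indicators, or the error metric, so everything concrete here is an assumption made for illustration: the Gaussian prior and its parameters, the known noise variance, the synthetic data, the function name `posterior_coefficients`, and the use of mean absolute error.

```python
import numpy as np


def posterior_coefficients(X, y, prior_mean, prior_cov, noise_var):
    """Posterior mean and covariance of w in the model y = X @ w + noise.

    Assumes w ~ N(prior_mean, prior_cov) and Gaussian noise with a known
    variance noise_var (simplifying assumptions for this sketch).
    """
    prior_precision = np.linalg.inv(prior_cov)
    post_precision = prior_precision + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (prior_precision @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov


# Synthetic stand-in for patient data: three hypothetical functional
# indicators and an age generated from them with Gaussian noise.
rng = np.random.default_rng(0)
n_patients = 300
indicators = rng.normal(size=(n_patients, 3))
age = (30.0 + indicators @ np.array([4.0, -2.0, 1.0])
       + rng.normal(scale=5.0, size=n_patients))

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n_patients), indicators])
X_train, X_test = X[:240], X[240:]
y_train, y_test = age[:240], age[240:]

# Hypothetical prior: intercept near 30 years, slopes centred on zero.
prior_mean = np.array([30.0, 0.0, 0.0, 0.0])
prior_cov = np.diag([100.0, 25.0, 25.0, 25.0])
noise_var = 25.0

post_mean, post_cov = posterior_coefficients(
    X_train, y_train, prior_mean, prior_cov, noise_var
)

# Point prediction with the posterior mean, scored on held-out records.
predicted_age = X_test @ post_mean
mae_years = float(np.mean(np.abs(y_test - predicted_age)))
print("posterior mean of coefficients:", np.round(post_mean, 2))
print(f"mean absolute error on held-out data: {mae_years:.2f} years")
```

With a very wide prior the posterior mean approaches the ordinary least squares estimate, while a narrow prior pulls the coefficients toward the prior mean. This regularising effect is one reason the Bayesian approach is attractive for small samples.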
For citation: Limanovskaya O.V., Gavrilov I.V., Meshchaninov V.N., Lisovenko A.S. Building gender- and age-dependent models for assessing bio-age based on the functional data of the patient's body. Modeling, Optimization and Information Technology. 2024;12(2). URL: https://moitvivt.ru/ru/journal/pdf?id=1583 DOI: 10.26102/2310-6018/2024.45.2.012 (In Russ.).
Received 24.05.2024
Revised 10.06.2024
Accepted 14.06.2024
Published 30.06.2024