Keywords: regression problem, feature selection, finding and removing anomalies, machine learning, biological age
Modeling the biological age of the patients based on their functional indicators
UDC 51-76
DOI: 10.26102/2310-6018/2021.33.2.028
The aging process is a complex multifactorial phenomenon influenced both by external factors (climatic, economic, and political conditions) and by individual characteristics of the body. Modeling this process is therefore a non-trivial task that requires a multifaceted approach. An analysis of the literature shows that two kinds of models are used to describe the rate of aging: conceptual models [1-4], which give an idea of how the aging process can be assessed in principle, and more specific computational models [5-9], which make it possible to predict the rate of aging. Constructing computational models involves a trade-off between the completeness of the model and its usefulness for forecasting. Models that capture the relationships within the aging process well [7], which are usually built on graphs, are difficult to apply to numerical estimation of the aging rate, although some of them make it possible to construct individual aging trajectories [8-9]. Conversely, models with a strong numerical apparatus for estimating the rate of aging [5-6] are, as a rule, tailored to a narrow problem and do not cover the full complexity of the aging process. In this situation, the use of machine learning methods in computational models for estimating the rate of aging is a very promising direction [10-15], since these methods can take into account the whole variety of factors of the aging process without delving into the mechanisms of the process itself. In this paper, machine learning methods are used to analyze the correlation between patients' functional indicators and their calendar age and to build models for predicting the biological age of patients. The data analysis was carried out with the authors' own code written in Python in the Anaconda environment.
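The correlation analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the column names (vital_capacity, noise) and the helper correlation_with_age are hypothetical, and the Pearson coefficient from SciPy is assumed as the correlation measure.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_with_age(features, age, names):
    """Return {name: (r, p_value)} for the Pearson correlation of each column with age."""
    results = {}
    for j, name in enumerate(names):
        r, p = pearsonr(features[:, j], age)
        results[name] = (float(r), float(p))
    return results


# Synthetic stand-in data (assumption): one indicator that declines with age
# and one column of pure noise that is unrelated to age.
rng = np.random.default_rng(0)
age = rng.uniform(20, 90, size=300)
vital_capacity = 6.0 - 0.04 * age + rng.normal(0, 0.5, size=300)
noise = rng.normal(0, 1, size=300)

X = np.column_stack([vital_capacity, noise])
corr = correlation_with_age(X, age, ["vital_capacity", "noise"])

for name, (r, p) in corr.items():
    print(f"{name}: r={r:+.2f}, p={p:.3g}")
```

An indicator is usually treated as significantly correlated with age when its p-value falls below a chosen threshold such as 0.05; the threshold used in the study is not stated here.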
For the analysis, we used 10 functional indicators of 1185 patients from the database of the clinical regional psycho-neurological hospital of war veterans. The analysis showed a statistically significant correlation between the indicators used and the calendar age of the patients. Five regression models were constructed using tools from the Python sklearn library (batch gradient descent, stochastic gradient descent, ridge regression, ridge regression with Bayesian selection, and the support vector machine method), and ensembles of decision trees (random forest and boosting) were also used. To improve the quality of the models, we used feature selection (Add-Del) and outlier detection and removal using the support vector machine method, the isolation forest method, and the nearest neighbor method. All the models obtained are adequate (verified by the Fisher criterion), but the most accurate (R² = 0.75) was the random forest ensemble built on the full set of features after the removal of anomalies by the support vector machine. Modeling with linear models showed that three functional indicators have the highest weights: accommodation, vital capacity of the lungs, and hearing acuity.
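The best-performing combination reported here, anomaly removal by a support vector method followed by a random forest on the full feature set, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the helper remove_outliers is hypothetical, and the hyperparameters (nu, the RBF kernel, the number of trees, the 75/25 split) are arbitrary choices. The sketch also assumes that outliers are detected in feature space only, using scikit-learn's OneClassSVM.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def remove_outliers(X, y, nu=0.05):
    """Drop rows that a one-class SVM, fitted on standardized features, labels as anomalies."""
    X_scaled = StandardScaler().fit_transform(X)
    labels = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit_predict(X_scaled)
    inlier = labels == 1  # fit_predict returns +1 for inliers and -1 for outliers
    return X[inlier], y[inlier]


# Synthetic stand-in for 10 functional indicators (assumption, not the clinical data).
rng = np.random.default_rng(42)
n_patients, n_features = 600, 10
X = rng.normal(size=(n_patients, n_features))
weights = rng.uniform(1.0, 5.0, size=n_features)
age = 55 + X @ weights + rng.normal(0, 3.0, size=n_patients)

# Corrupt a few rows so that there is something to remove.
bad = rng.choice(n_patients, size=20, replace=False)
X[bad] += rng.normal(0, 8.0, size=(20, n_features))

X_clean, age_clean = remove_outliers(X, age, nu=0.05)

X_train, X_test, y_train, y_test = train_test_split(
    X_clean, age_clean, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print(f"rows kept: {len(X_clean)} of {len(X)}, test R^2 = {r2:.2f}")
```

The isolation forest and nearest neighbor detectors could be substituted in the same place (for example, sklearn.ensemble.IsolationForest or sklearn.neighbors.LocalOutlierFactor, both of which expose fit_predict), which makes it straightforward to compare the effect of each anomaly filter on the final R².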
1. López-Otín C., Blasco M.A., Partridge L., Serrano M., Kroemer G. The hallmarks of aging. Cell. 2013;153:1194–1217. DOI: 10.1016/j.cell.2013.05.039
2. Kennedy B.K., Berger S.L., Brunet A., Campisi J., Cuervo A.M., Epel E.S., Franceschi C., Lithgow G.J., Morimoto R.I., Pessin J.E., Rando T.A., Richardson A., Schadt E.E., Wyss-Coray T., Sierra F. Geroscience: Linking Aging to Chronic Disease. Cell. 2014;159(4):709–713. DOI: 10.1016/j.cell.2014.10.039
3. Kirkwood T.B.L. Understanding the odd science of aging. Cell. 2005;120:437–447. DOI: 10.1016/j.cell.2005.01.027
4. Kirkwood T.B.L. Deciphering death: a commentary on Gompertz (1825) ‘On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies’. Philosophical Transactions Of The Royal Society Of London Series B. 2015;370(1666):20140379–2014037. DOI: 10.1098/rstb.2014.0379
5. Yashin A.I., Arbeev K.G., Akushevich I., Kulminski A., Akushevich L., Ukraintseva S.V. Stochastic model for analysis of longitudinal data on aging and mortality. Mathematical Biosciences. 2007;208:538–551. DOI: 10.1016/j.mbs.2006.11.006
6. Yashin A.I., Arbeev K.G., Akushevich I., Kulminski A., Ukraintseva S.V., Stallard E., Land K.C. The quadratic hazard model for analyzing longitudinal data on aging, health, and the life span. Physics of Life Reviews. 2012;9:177–188. DOI: 10.1016/j.plrev.2012.05.002
7. Taneja S., Mitnitski A.B., Rockwood K., Rutenberg A.D. Dynamical network model for age-related health deficits and mortality. Physical Review E. 2016;93(2):022309–022311. DOI: 10.1103/PhysRevE.93.022309
8. Farrell S.G., Mitnitski A.B., Rockwood K., Rutenberg A.D. Network model of human aging: Frailty limits and information measures. Physical Review E. 2016;94(5):052409–052419. DOI: 10.1103/PhysRevE.94.052409
9. Farrell S., Mitnitski A., Rockwood K., Rutenberg A. Generating synthetic aging trajectories with a weighted network model using cross-sectional data. Scientific Reports. 2020;10(1):19833–19844. DOI: 10.1038/s41598-020-76827-3
10. Pierson E., Koh P.W., Hashimoto T., Koller D., Liang P. Inferring multidimensional rates of aging from cross-sectional data. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS). 2019;89:97–107
11. Putin E., Mamoshina P., Aliper A., Korzinkin M., Moskalev A., Kolosov A., Ostrovskiy A., Cantor C., Vijg J., Zhavoronkov A. Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging (Albany NY). 2016;8(5):1021–1033. DOI: 10.18632/aging.100968
12. Zhavoronkov A., Mamoshina P. Deep Aging Clocks: The Emergence of AI-Based Biomarkers of Aging and Longevity. Trends Pharmacol Sci. 2019;40(8):546-549. DOI: 10.1016/j.tips.2019.05.004
13. Levine M.E. Assessment of Epigenetic Clocks as Biomarkers of Aging in Basic and Population Research. J Gerontol A Biol Sci Med Sci. 2020;75(3):463–465. DOI: 10.1093/gerona/glaa021
14. Pyrkov T.V., Getmantsev E., Zhurov B., Avchaciov K., Pyatnitskiy M., Men'shikov L., Khodova K., Gudkov A., Fedichev P. Quantitative characterization of biological age and frailty based on locomotor activity records. Aging (Albany NY). 2019;10:2973–2990. DOI: 10.1038/s41598-018-23534-9
15. Schultz M.B., Kane A.E., Mitchell S.J., MacArthur M.R., Warner E., Vogel D.S., Mitchell J.R., Howlett S.E., Bonkowski M.S., Sinclair D.A. Age and life expectancy clocks based on machine learning analysis of mouse frailty. Nature Communications. 2020;11(1):4618-4628. DOI: 10.1038/s41467-020-18446-0
16. Farrell S., Stubbings G., Rockwood K., Mitnitski A., Rutenberg A. The potential for complex computational models of aging. Mechanisms of Ageing and Development. 2020;193:111403-111418. DOI: 10.1016/j.mad.2020.111403
17. Zhavoronkov A., Mamoshina P., Vanhaelen Q., Scheibye-Knudsen M., Moskalev A., Aliper A. Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Research Reviews. 2019;49:49–66. DOI: 10.1016/j.arr.2018.11.003
18. Fedintsev A., Kashtanova D., Tkacheva O., Strazhesko I., Kudryavtseva A., Baranova A., Moskalev A. Markers of arterial health could serve as accurate non-invasive predictors of human biological and chronological age. Aging. 2017;9:1–13. DOI: 10.18632/aging.101227
19. Cohen A.A., Morissette-Thomas V., Ferrucci L., Fried L.P. Deep biomarkers of aging are population-dependent. Aging (Albany NY). 2016;8(9):2253-2255. DOI: 10.18632/aging
20. Gromyko G.L. Teoriya statistiki. M .: INFRA-M, 2002
21. Aggarwal C.C. Data Mining: The Textbook. New York: Springer, 2015
22. Vorontsov K. V. Lektsii po metodu opornykh vektorov. Available at: http://www.ccas.ru/voron/download/SVM.pdf (accessed 12.03.2021) (In Russ)
23. Limanovskaya O.V., Alferieva T.I. Osnovy mashinnogo obucheniya: uchebnoye posobiye. Yekaterinburg: Izdatel'stvo Ural'skogo universiteta, 2020
24. Guyon I., Elisseeff A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003;3:1157–1182.
25. Liu F. T., Ting K. M., Zhou Z. Isolation Forest. Eighth IEEE International Conference on Data Mining, 2008; 413-422. DOI: 10.1109/ICDM.2008.17
26. Anaconda - solutions for Data Science Practitioners and Enterprise Machine Learning. Available at: https://www.anaconda.com (accessed 18.02.2021)
27. SciPy library. Available at: https://www.scipy.org/index.html (accessed 18.02.2021)
28. Faris H., Mafarja M.M., Heidari A.A., Aljarah I., Al-Zoubi A.M., Mirjalili S., Fujita H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowledge-Based Systems. 2018;154:43–67. DOI: 10.1016/j.knosys.2018.05.009
29. XGBoost library. Available at: https://xgboost.ai/ (accessed 17.02.2021)
30. NumPy library. Available at: https://numpy.org/ (accessed 18.02.2021)
31. Pandas library. Available at: https://pandas.pydata.org/ (accessed 18.02.2021)
32. Matplotlib library. Available at: https://matplotlib.org/index.html (accessed 18.02.2021)
For citation: Limanovskaya O.V., Gavrilov I.V., Meshchaninov V.N., Shcherbakov D.L., Kolos E.N. Modeling the biological age of the patients based on their functional indicators. Modeling, Optimization and Information Technology. 2021;9(2). URL: https://moitvivt.ru/ru/journal/pdf?id=966 DOI: 10.26102/2310-6018/2021.33.2.028 (In Russ).
Received 02.08.2021
Revised 03.08.2021
Accepted 11.08.2021
Published 30.06.2021