Keywords: medical dataset, simulation modeling, queueing theory, digital twin, throughput, artificial intelligence
UDC 004.89
DOI: 10.26102/2310-6018/2026.55.4.020
The development of artificial intelligence technologies in medicine requires a systematic approach to collecting and processing structured datasets for training, testing, and validating machine learning models. This paper addresses that problem through simulation modeling based on queueing theory. Such modeling requires estimating the planned throughput of each data collection point while ensuring a sufficient number of patients, the availability and reliability of their medical information, and compliance with legal requirements on personal data protection and medical ethics. The approach was studied by analyzing biomedical data collection processes intended for training artificial intelligence models for remote diagnostics. The empirical part of the study was conducted at biomedical signal collection points over a six-month period, with a total sample of 574 patients. A simulation model was developed to optimize the data collection process. According to the simulation, the average data collection intensity was 7.28 patients per day, with significant variability in workload. During optimization, the data collection process was parallelized, which reduced the time spent on questionnaires and temperature measurements and increased patient throughput. The optimization raised the workload from 4.67 to 12.12 patients per day. The proposed approach makes it possible to validate the architecture of the organizational and technological data collection process before scaling and reduces the risk of missing the scheduled deadlines for generating medical datasets.
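The effect of parallelization on throughput can be illustrated with a minimal queueing sketch. It is not the authors' simulation model: the stage durations, the arrival rate, the shift length, and the function names `served_in_shift` and `poisson_arrivals` are all assumptions chosen for illustration. It compares one operator performing every step in sequence with a two-station layout in which preparation (questionnaire and temperature measurement) overlaps with signal recording for the previous patient.

```python
import random


def served_in_shift(arrivals, stage_minutes, shift_minutes=480.0):
    """Count patients who finish every stage before the shift ends.

    arrivals: sorted arrival times in minutes from the start of the shift.
    stage_minutes: service time of each station, in pipeline order.
    Each station serves one patient at a time, first come first served.
    """
    free_at = [0.0] * len(stage_minutes)  # when each station next becomes idle
    served = 0
    for t in arrivals:
        ready = t  # time the patient is available for the next station
        for k, duration in enumerate(stage_minutes):
            start = max(ready, free_at[k])
            ready = start + duration
            free_at[k] = ready
        if ready <= shift_minutes:
            served += 1
    return served


def poisson_arrivals(rate_per_hour, shift_minutes=480.0, seed=0):
    """Draw arrival times with exponential gaps (a Poisson arrival stream)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour / 60.0)
        if t >= shift_minutes:
            return times
        times.append(t)


# Hypothetical stage durations in minutes, not taken from the study.
PREP, RECORDING = 20.0, 20.0
arrivals = poisson_arrivals(rate_per_hour=3.0)
serial = served_in_shift(arrivals, [PREP + RECORDING])   # one operator does everything
parallel = served_in_shift(arrivals, [PREP, RECORDING])  # prep overlaps with recording
print(serial, parallel)
```

For the same arrival stream, the two-station layout never serves fewer patients than the sequential one, because each patient occupies the recording station for a shorter time. The actual gain depends on the real stage durations and arrival pattern, which this sketch does not reproduce.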
For citation: Ivaschenko A.V., Terekhin M.A., Poretskova G.Y., Zhdanovich G.E., Melnikov D.A., Radaev D.E. Modeling and optimization of data collection process for artificial intelligence in medicine. Modeling, Optimization and Information Technology. 2026;14(4). (In Russ.). URL: https://moitvivt.ru/ru/journal/article?id=2232 DOI: 10.26102/2310-6018/2026.55.4.020
© Ivaschenko A.V., Terekhin M.A., Poretskova G.Y., Zhdanovich G.E., Melnikov D.A., Radaev D.E. This article is published under the terms of the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)
Received 16.02.2026
Revised 14.04.2026
Accepted 21.04.2026
Published 30.04.2026