Keywords: machine learning, chronic hepatitis C, HIV coinfection, binary classifiers, lasso regression, mean squared error (MSE), regularization, decision tree classifier, ROC curve, area under the curve (AUC)
Estimation of the risk of developing chronic hepatitis C based on heuristic classification algorithms
UDC 512+514.743.2(07)
DOI: 10.26102/2310-6018/2024.46.3.020
This article is intended for machine learning specialists who develop technologies that improve the perception of information and the interpretation of measurements in medical decision-making practice. It proposes a method for selecting, tuning, and testing a classifier when a priori information about the data is scarce. This situation is typical of the initial stage of scientific research, when only small samples of measurements of biological objects and their systems are available and their nonlinear properties often cause statistical criteria to fail. Nevertheless, a program that assists medical decisions should respond adequately even to such "inconvenient" distributions. The goal is therefore to choose a solution method from a proposed set of methods for machine tuning of feature separation. Most tuning algorithms are heuristic: parameter estimation stops according to entropy minimization criteria, which indirectly maximize the information gained. The main task is to identify an algorithm for learning and tuning the classification regression whose minimized errors show explicitly predictable statistical convergence. Such behavior ensures that the quality of risk classification improves as training instances are simply added. The case under consideration is distinctive because the progression of chronic hepatitis C (CHC) in children has a dual nature: CHC with HIV coinfection and CHC alone. This raises the problem of finding unified conditions for the metric minimization of errors in estimating the risk of developing CHC with machine learning methods. Data sets from small samples were studied in order to identify the parameters that are significant for heuristic identification of the risks of developing the main and concomitant diseases.
Several shallow machine learning methods for linear regression were used in this study, relying mainly on heuristic solutions for the probabilistic separation of features. The article selectively describes some basic learning methods, taking into account their specifics in the technological verification of risk classifiers.
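The workflow summarized above (a regularized linear model tuned on a small sample, then checked by MSE and ROC AUC) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the helper names (`fit_lasso`, `roc_auc`, `make_data`) and the penalty values are assumptions chosen for illustration, and only the Python standard library is used.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - b0 - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_wo_j = b0 + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (zj / n) if zj > 0 else 0.0
        # The intercept is not penalized: refit it as the mean residual.
        b0 = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return b0, w


def predict(b0, w, X):
    """Linear risk score for every row of X."""
    return [b0 + sum(wk * xk for wk, xk in zip(w, row)) for row in X]


def mse(pred, y):
    """Mean squared error between scores and 0/1 labels."""
    return sum((a - b) ** 2 for a, b in zip(pred, y)) / len(y)


def roc_auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, rng):
    """Synthetic sample (assumption): two informative and two noise features."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(4)]
        signal = 1.5 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if signal > 0 else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(40, rng)   # deliberately small training sample
X_test, y_test = make_data(200, rng)

# Compare several illustrative penalty strengths on held-out data.
for alpha in (0.0, 0.05, 0.2):
    b0, w = fit_lasso(X_train, y_train, alpha)
    scores = predict(b0, w, X_test)
    print(alpha, [round(v, 3) for v in w],
          round(mse(scores, y_test), 3), round(roc_auc(scores, y_test), 3))
```

A larger penalty shrinks the weights toward zero, which is the feature-selection effect that makes lasso regularization attractive on small samples. On real clinical data the features would first need to be standardized, and the penalty would be chosen by cross-validation rather than from a fixed list.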
For citation: Palevskaya S.A., Gushchin A.V., Ivanov D.V. Estimation of the risk of developing chronic hepatitis C based on heuristic classification algorithms. Modeling, Optimization and Information Technology. 2024;12(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1623 DOI: 10.26102/2310-6018/2024.46.3.020 (In Russ.).
Received 04.07.2024
Revised 23.08.2024
Accepted 03.09.2024
Published 30.09.2024