Dysarthria speech recognition by phonemes using hidden Markov models

Bredikhin B.A., Antor M., Khlebnikov N.A., Melnikov A.V., Bachurin M.V.

UDC 004.852
DOI: 10.26102/2310-6018/2024.44.1.002

This work is motivated by three problems. Spoken interaction between people with speech disorders and interlocutors with typical speech is difficult. Standard speech recognition systems recognize atypical speech poorly. It is also impossible to build a single system that handles every kind of speech disorder. This article therefore develops a method for automatic recognition of dysarthric speech. The method combines a pre-trained neural network for phoneme recognition with hidden Markov models that convert phonemes into text. The recognition results are then corrected in two ways: a search of the space of admissible words finds the word nearest in Levenshtein distance, and a dynamic programming algorithm splits the model output into separate words. The main advantage of hidden Markov models over neural networks is that they need only a small training set, collected individually for each user, and they are easy to retrain if the speech disorder progresses. The article describes the training data set and gives recommendations for collecting and labeling training data. The proposed method is evaluated on an individual data set recorded by a person with dysarthria, and its recognition quality is compared with that of neural network models trained on the same data set. The results are of practical value for building augmentative communication systems for people with speech disorders.
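The abstract describes the correction stage only briefly. The Python sketch below shows one plausible reading of it. Each candidate fragment of the recognizer output is mapped to the admissible word with the smallest Levenshtein distance. A dynamic programming pass then chooses the split into words with the lowest total cost. The function names, the per-word penalty `word_penalty`, the segment-length bound `slack`, and the toy English vocabulary are all hypothetical. They are not the authors' implementation, and the article's actual cost function and vocabulary are not given here.

```python
# A minimal sketch of a post-recognition correction stage, assuming the
# recognizer outputs a string with unreliable word boundaries.
# All names and default parameter values here are hypothetical.
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_word(token: str, vocabulary: list[str]) -> str:
    """Admissible word closest to token; ties broken alphabetically."""
    return min(vocabulary, key=lambda w: (levenshtein(token, w), w))


def split_into_words(text: str, vocabulary: list[str],
                     word_penalty: float = 0.5, slack: int = 2) -> list[str]:
    """Segment text into admissible words by dynamic programming.

    best[j] is the lowest cost of explaining text[:j]. Each segment costs its
    edit distance to the nearest admissible word plus word_penalty, which
    discourages splitting the text into many short fragments.
    """
    n = len(text)
    max_len = max(map(len, vocabulary)) + slack  # longest segment considered
    best = [0.0] + [math.inf] * n
    back: list[tuple[int, str]] = [(0, "")] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            segment = text[i:j]
            word = nearest_word(segment, vocabulary)
            cost = best[i] + levenshtein(segment, word) + word_penalty
            if cost < best[j]:
                best[j] = cost
                back[j] = (i, word)
    # Follow back-pointers from the end of the text to recover the words.
    words, j = [], n
    while j > 0:
        j, word = back[j]
        words.append(word)
    return words[::-1]


vocabulary = ["hello", "how", "are", "you"]
print(split_into_words("helohowareyu", vocabulary))  # ['hello', 'how', 'are', 'you']
```

In this example the output has lost one letter in "hello" and one in "you", and its word boundaries are unknown. The sketch still recovers the intended words because that split has the lowest combined edit and penalty cost. It computes a distance against every admissible word for every candidate fragment. That is acceptable for a small personal vocabulary, but a larger one would call for an index such as a BK-tree.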

Bredikhin Boris Andreevich

Ural Federal University
CyberLympha

Yekaterinburg, the Russian Federation

Antor Mahamudul

Ural Federal University

Yekaterinburg, the Russian Federation

Khlebnikov Nikolay Aleksandrovich
Candidate of Chemical Sciences

Ural Federal University

Yekaterinburg, the Russian Federation

Melnikov Aleksandr Valerievich

Ural Federal University

Yekaterinburg, the Russian Federation

Bachurin Matvey Vladimirovich

Ural Federal University

Yekaterinburg, the Russian Federation

Keywords: hidden Markov models, dysarthria, automatic speech recognition, phoneme recognition, phoneme correction

For citation: Bredikhin B.A., Antor M., Khlebnikov N.A., Melnikov A.V., Bachurin M.V. Dysarthria speech recognition by phonemes using hidden Markov models. Modeling, Optimization and Information Technology. 2024;12(1). Available from: https://moitvivt.ru/ru/journal/pdf?id=1471 DOI: 10.26102/2310-6018/2024.44.1.002.

Received 02.11.2023

Revised 04.12.2023

Accepted 17.01.2024

Published 31.03.2024