The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

An information-theoretic metric for automated lexicographic selection in Bengali Sign Language

Ashrafi A., Mokhnachev V.S.

UDC 81'33:004.9
DOI: 10.26102/2310-6018/2026.57.6.011

Abstract

Developing tools for hearing-impaired people who use Bengali Sign Language (BdSL), a low-resource language, is challenging because both data and expert linguists are scarce. This paper introduces a novel information-theoretic metric, Information Value for Sign Lexicography (IV-SL), designed to automate lexicographic selection during sign language dictionary development. The proposed framework is implemented in Python. It uses MediaPipe Holistic to extract visual-kinematic features (handshape, movement trajectory, and facial expression) and Word2Vec embeddings of Bengali gloss words to capture semantic relationships between signs. An iterative selection mechanism prioritizes the sign that yields the maximum information gain per dictionary entry, balancing rarity against diversity to minimize redundancy while ensuring broad lexical coverage. Experimental validation shows that lexicons prioritized by IV-SL align strongly with expert linguists' judgments and significantly outperform frequency-based baselines. This initial validation was conducted on a synthetic dataset of 880 samples with simulated phonological features; confirmation on real-world BdSL video data is left for future research. The scientific novelty of the work lies in the principled application of informativeness and diversity criteria, concepts drawn from active learning theory, to sign language lexicography, offering a scalable, reproducible approach for under-resourced sign languages.
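The abstract does not state the IV-SL formula itself, so the sketch below only illustrates the general shape of an iterative selection that trades rarity against diversity. It assumes that rarity is the self-information -log2 p of a gloss's corpus frequency, that diversity is one minus the highest cosine similarity to entries already selected, and that the two are mixed with a hypothetical weight `alpha`. This is a greedy loop in the style of maximal marginal relevance, not the authors' implementation; the functions `cosine`, `self_information` and `greedy_select`, the toy vectors (stand-ins for concatenated MediaPipe Holistic and Word2Vec features) and the toy counts are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_information(counts: Dict[str, int]) -> Dict[str, float]:
    """Rarity of each gloss as -log2 p(gloss), with p estimated from corpus counts."""
    total = sum(counts.values())
    return {gloss: -math.log2(c / total) for gloss, c in counts.items()}


def greedy_select(
    features: Dict[str, List[float]],
    counts: Dict[str, int],
    k: int,
    alpha: float = 0.5,  # hypothetical rarity/diversity trade-off weight
) -> List[str]:
    """Hypothetical IV-SL-style selection: repeatedly add the gloss with the best
    weighted sum of normalised rarity and diversity w.r.t. entries already chosen."""
    rarity = self_information(counts)
    max_rarity = max(rarity.values()) or 1.0  # avoid division by zero
    selected: List[str] = []
    remaining = set(features)

    def score(gloss: str) -> float:
        # Redundancy = highest cosine similarity to any already-selected entry.
        redundancy = max(
            (cosine(features[gloss], features[s]) for s in selected), default=0.0
        )
        return alpha * rarity[gloss] / max_rarity + (1.0 - alpha) * (1.0 - redundancy)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 3-D vectors standing in for concatenated kinematic + semantic features.
toy_features = {
    "water": [1.0, 0.0, 0.0],
    "drink": [0.9, 0.1, 0.0],
    "house": [0.0, 1.0, 0.0],
    "mother": [0.0, 0.0, 1.0],
}
toy_counts = {"water": 50, "drink": 10, "house": 30, "mother": 10}

if __name__ == "__main__":
    print(greedy_select(toy_features, toy_counts, k=3))  # ['drink', 'mother', 'house']
```

On this toy input the gloss "water" is left out: it is the most frequent one and its vector almost duplicates "drink". Raising `alpha` pushes the selection toward rare glosses; lowering it favours entries that differ most from those already in the dictionary.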

About the authors

Ashrafi Arifa

Moscow Polytechnic University

Moscow, Russian Federation

Mokhnachev Viktor Sergeevich

Moscow Polytechnic University

Moscow, Russian Federation

Keywords: sign language lexicography, low-resource languages, Bengali Sign Language (BdSL), information value, corpus linguistics, MediaPipe

For citation: Ashrafi A., Mokhnachev V.S. An information-theoretic metric for automated lexicographic selection in Bengali Sign Language. Modeling, Optimization and Information Technology. 2026;14(6). URL: https://moitvivt.ru/ru/journal/article?id=2298 DOI: 10.26102/2310-6018/2026.57.6.011

© Ashrafi A., Mokhnachev V.S. The article is published under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

Received 02.04.2026

Revised 11.06.2026

Accepted 19.06.2026