Keywords: sign language lexicography, low-resource languages, Bengali Sign Language (BdSL), information value, corpus linguistics, MediaPipe
UDC 81'33:004.9
DOI: 10.26102/2310-6018/2026.57.6.011
Developing solutions for hearing-impaired individuals who use Bengali Sign Language (BdSL), a low-resource language, is challenging because both language resources and expert linguists are scarce. This paper introduces a novel information-theoretic metric, the Information Value for Sign Lexicography (IV-SL), designed to automate lexicographic selection for sign language dictionary development. The proposed framework is implemented in Python: MediaPipe Holistic extracts visual-kinematic features (handshapes, movement trajectories, and facial expressions), and Word2Vec embeddings of the Bengali gloss words capture the semantic relationships between them. An iterative selection mechanism prioritizes signs by maximum information gain per dictionary entry, balancing rarity and diversity to minimize redundancy while ensuring broad lexical coverage. In the experiments, the IV-SL metric produced prioritized lexicons that aligned strongly with expert linguist judgments and significantly outperformed frequency-based baselines. This initial validation was conducted on a synthetic dataset (880 samples) with simulated phonological features; confirmation on real-world Bengali Sign Language video data remains a subject for future research. The scientific novelty of this research lies in the principled application of informativeness and diversity criteria, concepts drawn from active learning theory, to sign language lexicography, offering a scalable, reproducible solution for under-resourced sign languages.
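The iterative selection described in the abstract can be illustrated with a minimal greedy sketch. It is not the authors' IV-SL formula, which the abstract does not state. It assumes that rarity is the self-information of a sign's relative frequency and that diversity is the minimum cosine distance to the signs already selected. The function names, the weight `alpha`, and the toy data are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (1.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_signs(features, frequencies, k, alpha=0.5):
    """Greedily pick up to k glosses, trading off rarity against diversity.

    features:    dict gloss -> feature vector (hypothetical stand-in for
                 visual-kinematic and semantic features)
    frequencies: dict gloss -> positive corpus count
    alpha:       assumed weight of rarity versus diversity
    """
    total = sum(frequencies.values())
    # Rarity as self-information (in bits) of the relative frequency.
    rarity = {g: -math.log2(frequencies[g] / total) for g in features}
    max_rarity = max(rarity.values()) or 1.0  # avoid dividing by zero

    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(gloss):
            if selected:
                # Distance to the closest already-selected sign.
                diversity = min(
                    cosine_distance(features[gloss], features[s]) for s in selected
                )
            else:
                diversity = 1.0
            return alpha * rarity[gloss] / max_rarity + (1 - alpha) * diversity

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "A" and "B" are rare but share identical features; "C" is frequent but distinct.
toy_features = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
toy_frequencies = {"A": 1, "B": 1, "C": 8}
print(select_signs(toy_features, toy_frequencies, k=2))  # ['A', 'C']
```

In the toy example the second pick is "C" rather than the equally rare "B", because "B" adds no features beyond those already covered by "A". This is the redundancy-avoiding behavior the abstract attributes to balancing rarity and diversity.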
For citation: Ashrafi A., Mokhnachev V.S. An information-theoretic metric for automated lexicographic selection in Bengali Sign Language. Modeling, Optimization and Information Technology. 2026;14(6). URL: https://moitvivt.ru/ru/journal/article?id=2298 DOI: 10.26102/2310-6018/2026.57.6.011
© Ashrafi A., Mokhnachev V.S. This article is published under the terms of the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)
Received 02.04.2026
Revised 11.06.2026
Accepted 19.06.2026