References

moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2026.57.6.011

2298

Информационно-теоретическая метрика для автоматического лексикографического отбора в бенгальском жестовом языке

An information-theoretic metric for automated lexicographic selection in Bengali Sign Language

0000-0002-5753-6135

Ашрафи

Арифа

Ashrafi

Arifa

arifaa13@gmail.com aff-1

0000-0001-6520-0386

Мохначев

Виктор Сергеевич

Mokhnachev

Viktor Sergeevich

gagashaggy@inbox.ru aff-2

Московский политехнический университет Moscow Polytechnic University

01 01 2026

1 1


2026

This work is licensed under a Creative Commons Attribution 4.0 International License

Создание решений для помощи людям с нарушениями слуха, использующим бенгальский жестовый язык, который считается языком с ограниченными ресурсами, представляет собой сложную задачу из-за нехватки ресурсов и доступности экспертов. В данной статье представлена новая информационно-теоретическая метрика – информационная ценность для лексикографии жестов (IV-SL), разработанная для автоматизации процесса лексикографического отбора при разработке словаря жестового языка. Предложенная структура использует реализацию на основе Python, которая включает MediaPipe Holistic для извлечения визуально-кинематических признаков, включая формы рук, траекторию движения и выражения лица, а также Word2Vec для семантических связей между векторными представлениями слов бенгальского языка. Итеративный механизм отбора определяет приоритетность жестов на основе максимального прироста информации на словарную запись, балансируя редкость и разнообразие для минимизации избыточности при обеспечении широкого лексического охвата. Экспериментальная проверка показывает, что метрика IV-SL создает приоритетные лексиконы с высокой степенью соответствия экспертным оценкам лингвистов, значительно превосходящие базовые модели, основанные на частоте. Первичная валидация проведена на синтетическом датасете (880 образцов) с моделированием фонологических признаков. Подтверждение на реальных видеоданных бенгальского жестового языка остается предметом будущих исследований. Научная новизна данного исследования заключается в принципиальном применении критериев информативности и разнообразия – концепций, заимствованных из теории активного обучения, – к лексикографии жестовых языков, предлагая масштабируемое и воспроизводимое решение для жестовых языков с ограниченными ресурсами.

Developing assistive solutions for hearing-impaired people who use Bengali Sign Language, a low-resource language, is challenging because resources are scarce and experts are rarely available. This paper introduces a novel information-theoretic metric, the Information Value for Sign Lexicography (IV-SL), designed to automate lexicographic selection in sign language dictionary development. The proposed framework is implemented in Python and uses MediaPipe Holistic to extract visual-kinematic features (handshapes, movement trajectories, and facial expressions) and Word2Vec to capture semantic relationships between the word embeddings of Bengali glosses. An iterative selection mechanism prioritizes signs by maximum information gain per dictionary entry, balancing rarity and diversity to minimize redundancy while ensuring broad lexical coverage. In experimental validation, the IV-SL metric produces prioritized lexicons that align strongly with expert linguist judgments and significantly outperform frequency-based baselines. This initial validation was conducted on a synthetic dataset (880 samples) with simulated phonological features; confirmation on real-world Bengali Sign Language video data remains a subject for future research. The scientific novelty of this research lies in the principled application of informativeness and diversity criteria, concepts drawn from active learning theory, to sign language lexicography, offering a scalable, reproducible solution for under-resourced sign languages.
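The iterative selection mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give the IV-SL formula, so the scoring below is an assumption made purely for illustration: rarity is taken as the negative log of relative frequency, diversity as one minus the highest cosine similarity to any already selected sign, and the two are multiplied. The function name `greedy_select` and its parameters are hypothetical, and the feature vectors are plain arrays standing in for whatever MediaPipe Holistic and Word2Vec features the authors actually combine.

```python
import math
import numpy as np

def greedy_select(features, frequencies, k):
    """Greedily pick k items, trading off rarity against diversity.

    Hypothetical scoring (an assumption, NOT the paper's IV-SL formula):
      rarity    = -log2(relative frequency), frequencies must be > 0
      diversity = 1 - max cosine similarity to already selected items
      score     = rarity * diversity
    """
    feats = np.asarray(features, dtype=float)
    # Normalise rows so that dot products are cosine similarities.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.where(norms == 0, 1.0, norms)

    freqs = np.asarray(frequencies, dtype=float)
    rarity = -np.log2(freqs / freqs.sum())

    selected = []
    remaining = set(range(len(feats)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for i in sorted(remaining):
            if selected:
                # Similarity to the closest item already in the lexicon.
                max_sim = float(np.max(unit[selected] @ unit[i]))
            else:
                max_sim = 0.0
            score = rarity[i] * (1.0 - max_sim)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: items 0 and 1 are near-duplicates, item 2 is distinct but frequent.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_frequencies = [1, 1, 2]
print(greedy_select(toy_features, toy_frequencies, k=2))
```

In this toy run the near-duplicate of an already chosen item is passed over in favour of a more frequent but dissimilar one, which is the redundancy-reducing behaviour the abstract attributes to balancing rarity and diversity.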

лексикография жестовых языков; низкоресурсные языки; бенгальский жестовый язык (BdSL); информационная ценность; корпусная лингвистика; MediaPipe

sign language lexicography; low-resource languages; Bengali Sign Language (BdSL); information value; corpus linguistics; MediaPipe

Исследование выполнено без спонсорской поддержки.

The study was performed without external funding.

References

Grimm N. Documentary Approaches to Lexicography. In: Current Issues in Descriptive Linguistics and Digital Humanities. Singapore: Springer; 2022. P. 551–567. https://doi.org/10.1007/978-981-19-2932-8_37

Ashrafi A., Mokhnachev V.S., Harlamenkov A.E. Improving Sign Language Recognition with Machine Learning and Artificial Intelligence. In: 2024 6th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE), 29 February – 02 March 2024, Moscow, Russia. IEEE; 2024. https://doi.org/10.1109/REEPE60449.2024.10479844

Ashrafi A., Mokhnachev V., Philippovich Y., et al. Russian Sign Language Recognition Using MediaPipe. In: Artificial Intelligence in Models, Methods and Applications: Artificial Intelligence in Engineering and Science (AIES-2022), 15–18 November 2022, Virtual Event. Cham: Springer; 2023. P. 299–313. https://doi.org/10.1007/978-3-031-22938-1_21

Ayyadevara V.K. Word2vec. In: Pro Machine Learning Algorithms. Berkeley: Apress; 2018. P. 167–178. https://doi.org/10.1007/978-1-4842-3564-5_8

Bochkarev V., Solovyev V., Shevlyakova A. A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian. In: Advances in Soft Computing: 18th Mexican International Conference on Artificial Intelligence (MICAI 2019), 27 October – 02 November 2019, Xalapa, Mexico. Cham: Springer; 2019. P. 463–474. https://doi.org/10.1007/978-3-030-33749-0_37

Moosavi M.S., Raimbaud P., Guillet Ch., et al. Enhancing weight perception in virtual reality: an analysis of kinematic features. Virtual Reality. 2024;28(2):72. https://doi.org/10.1007/s10055-024-00948-7

Fox N., Woll B., Cormier K. Best practices for sign language technology research. Universal Access in the Information Society. 2023;24(1):69–77. https://doi.org/10.1007/s10209-023-01039-1

Almeida A.M.P., Condeço T., Ramos F., et al. Signs Workshop: The Importance of Natural Gestures in the Promotion of Early Communication Skills of Children with Developmental Disabilities. In: Gesture-Based Human-Computer Interaction and Simulation: 7th International Gesture Workshop (GW 2007), 23–25 May 2007, Lisbon, Portugal. Berlin, Heidelberg: Springer; 2009. P. 245–254. https://doi.org/10.1007/978-3-540-92865-2_27

Napier J., Leeson L. Learning and Teaching Sign Languages. In: Sign Language in Action. London: Palgrave Macmillan; 2016. P. 87–118. https://doi.org/10.1057/9781137309778_4

Rojas H., Alvarez C., Rojas N. Statistical Hypothesis Testing for Information Value (IV). Journal of Statistical Theory and Applications. 2025;24(4):1196–1216. https://doi.org/10.1007/s44199-025-00144-9

Yazdani Sh., Hamidullah Y., España-Bonet C., et al. A Critical Study of Automatic Evaluation in Sign Language Translation. arXiv. URL: https://arxiv.org/abs/2510.25434 [Accessed 10th February 2026].

Ashrafi A., Mokhnachev V.S., Philippovich Y.N., et al. Development of Image Dataset Using Hand Gesture Recognition System for Progression of Sign Language Translator. In: Software Engineering Perspectives in Intelligent Systems: Proceedings of 4th Computational Methods in Systems and Software (CoMeSySo 2020), 14–17 October 2020, Virtual Event. Cham: Springer; 2020. P. 665–675. https://doi.org/10.1007/978-3-030-63322-6_56

Honkamaa J., Marttinen P. New Multimodal Similarity Measure for Image Registration via Modeling Local Functional Dependence with Linear Combination of Learned Basis Functions. In: Medical Image Computing and Computer Assisted Intervention: 28th International Conference (MICCAI 2025): Proceedings: Part II, 23–27 September 2025, Daejeon, South Korea. Cham: Springer; 2026. P. 399–408. https://doi.org/10.1007/978-3-032-04937-7_38

Aashik S., Ch S., Ghali V.S., et al. Logarithmic Frequency Modulated Thermal Wave Imaging for Subsurface Analysis. Russian Journal of Nondestructive Testing. 2024;60(8):898–911. https://doi.org/10.1134/S1061830924602149

The authors declare no conflicts of interest.