

Modeling, Optimization and Information Technology. ISSN 2310-6018


DOI: 10.26102/2310-6018/2026.57.6.009


Fake news detection in low-resource languages with LLMs

A. S. M. Humaun Kabir, humaun.kabir@phystech.edu (aff-1)

Sameed Ahmed Khan, sameedkhandurrani@gmail.com (aff-2)

Alexander Alexandrovich Kharlamov, kharlamov@analyst.ru (aff-3)

Ilia Mikhailovich Voronkov, voronkov.im@phystech.edu (aff-4)

Moscow Institute of Physics and Technology

Innopolis University

Moscow Institute of Physics and Technology

01 01 2026



2026

This work is licensed under a Creative Commons Attribution 4.0 International License.

The proliferation of fake news is a global challenge in the digital era of widely available information. High-resource languages are addressing this issue through extensive research, whereas low-resource languages have been left behind. Bangla is computationally a low-resource language despite being among the ten most spoken languages in the world. To contribute to the field and address the problem of fake news, this work focuses on fake news detection in Bangla, leveraging recent advances in large language models (LLMs) together with cross-lingual prompting techniques to improve the quality of model responses. For accessibility of resources we use open-source models, namely DeepSeek-R1, Llama 3.2 and Qwen 2.5, and extensively analyze the fake news detection capability of each model in Bangla. We find that Qwen 2.5 outperforms the other models on this task, achieving a maximum accuracy of 97.5% while producing no inconclusive responses.

Keywords: fake news, Bangla, large language models, low-resource languages, cross-lingual prompting

The study was performed without external funding.

References

Darvin R. Language and identity in the digital age. In: The Routledge Handbook of Language and Identity. Routledge; 2016. P. 523–540.

Lee N., Li B.Z., Wang S., et al. Language Models as Fact Checkers? In: Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER), 09 July 2020, Seattle, WA, USA. Association for Computational Linguistics; 2020. P. 36–41. https://doi.org/10.18653/v1/2020.fever-1.5

Hoes E., Altay S., Bermeo J. Leveraging ChatGPT for Efficient Fact-Checking. OSF. URL: https://doi.org/10.31234/osf.io/qnjkf [Accessed 19th April 2026].

DeepSeek-AI, Guo D., Yang D., et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv. URL: https://arxiv.org/abs/2501.12948 [Accessed 19th April 2026].

Grattafiori A., Dubey A., Jauhri A., et al. The Llama 3 Herd of Models. arXiv. URL: https://arxiv.org/abs/2407.21783 [Accessed 19th April 2026].

Hossain M.Z., Rahman M.A., Islam M.S., et al. BanFakeNews: A Dataset for Detecting Fake News in Bangla. In: Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC 2020), 11–16 May 2020, Marseille, France. European Language Resources Association; 2020. P. 2862–2871. URL: https://aclanthology.org/2020.lrec-1.349

Kabir A.S.M.H., Kharlamov A.A., Voronkov I.M. Research Methods for Fake News Detection in Bangla Text. In: Advances in Neural Computation, Machine Learning, and Cognitive Research VII: Selected Papers from the XXV International Conference on Neuroinformatics, 23–27 October 2023, Moscow, Russia. Cham: Springer; 2023. P. 54–60. https://doi.org/10.1007/978-3-031-44865-2_6

Shibu H.M., Datta Sh., Miah M.S., et al. From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs. In: Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025 – Workshops, 19–24 January 2025, Abu Dhabi, UAE. Association for Computational Linguistics; 2025. P. 100–107. URL: https://aclanthology.org/2025.indonlp-1.12

Rubin V., Conroy N., Chen Y., et al. Fake News or Truth? Using Satirical Cues to Detect Potentially Misleading News. In: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, 17 June 2016, San Diego, CA, USA. Association for Computational Linguistics; 2016. P. 7–17. https://doi.org/10.18653/v1/W16-0802

Hossain E., Kaysar M.N., Jalal Uddin Joy A.Z.M., et al. A Study Towards Bangla Fake News Detection Using Machine Learning and Deep Learning. In: Sentimental Analysis and Deep Learning: Proceedings of ICSADL 2021, 18–19 June 2021, Songkhla, Thailand. Singapore: Springer; 2021. P. 79–95. https://doi.org/10.1007/978-981-16-5157-1_7

Shu K., Sliva A., Wang S., et al. Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter. 2017;19(1):22–36. https://doi.org/10.1145/3137597.3137600

Vaswani A., Shazeer N., Parmar N., et al. Attention is All You Need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, 04–09 December 2017, Long Beach, CA, USA. 2017. P. 5998–6008.

Devlin J., Chang M.-W., Lee K., et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019): Volume 1, 02–07 June 2019, Minneapolis, MN, USA. Association for Computational Linguistics; 2019. P. 4171–4186. https://doi.org/10.18653/v1/N19-1423

Qin L., Chen Q., Wei F., et al. Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 06–10 December 2023, Singapore. Association for Computational Linguistics; 2023. P. 2695–2709. https://doi.org/10.18653/v1/2023.emnlp-main.163

The authors declare no conflicts of interest.