<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.48.1.021</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1799</article-id>
      <title-group>
        <article-title xml:lang="ru">Метод генерации вопросов закрытого типа с использованием LLM</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>A method for generating closed-type questions using LLMs</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Дагаев</surname>
              <given-names>Александр Евгеньевич</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Dagaev</surname>
              <given-names>Alexander Evgenevich</given-names>
            </name>
          </name-alternatives>
          <email>a.e.dagaev@mospolytech.ru</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Московский политехнический университет</aff>
        <aff xml:lang="en">Moscow Polytechnic University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.48.1.021</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1799"/>
      <abstract xml:lang="ru">
        <p>В исследовании представлен метод генерации вопросов закрытого типа, использующий большие языковые модели (LLM) для повышения качества и релевантности создаваемых вопросов. Предложенная структура объединяет этапы генерации, верификации и корректировки, что позволяет не исключать некачественные вопросы, а улучшать их с использованием обратной связи. Метод был протестирован на трех популярных наборах данных: SQuAD, Natural Questions и RACE. Ключевые метрики оценки ROUGE, BLEU и METEOR стабильно показывали улучшения производительности на всех протестированных моделях. В исследовании использовались четыре варианта LLM: O1, O1-mini, GPT-4o и GPT-4o-mini, при этом O1 достигла наивысших результатов по всем наборам данных и метрикам. Экспертная оценка показала увеличение точности до 14,4 % по сравнению с генерацией без верификации и корректировки. Полученные результаты подчеркивают эффективность метода в обеспечении большей ясности, фактической корректности и контекстуальной релевантности в сгенерированных вопросах. Сочетание автоматизированной верификации и корректировки дополнительно улучшает результаты, демонстрируя потенциал LLM в совершенствовании задач генерации текста. Результаты работы будут полезны исследователям в области обработки естественного языка, образовательных технологий, а также специалистам, работающим над адаптивными системами обучения и программным обеспечением корпоративного обучения.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>This study presents a method for closed-ended question generation leveraging large language models (LLMs) to improve the quality and relevance of generated questions. The proposed framework combines the stages of generation, verification, and refinement, which allows low-quality questions to be improved through feedback rather than simply discarded. The method was tested on three widely recognized datasets: SQuAD, Natural Questions, and RACE. Key evaluation metrics, including ROUGE, BLEU, and METEOR, consistently showed performance gains across all tested models. Four LLM configurations were used: O1, O1-mini, GPT-4o, and GPT-4o-mini, with O1 achieving the highest results across all datasets and metrics. Expert evaluation revealed an accuracy improvement of up to 14.4% compared to generation without verification and refinement. The results highlight the method's effectiveness in ensuring greater clarity, factual correctness, and contextual relevance in generated questions. The combination of automated verification and refinement further enhances outcomes, showcasing the potential of LLMs to refine text generation tasks. These findings will benefit researchers in natural language processing and educational technology, as well as professionals working on adaptive learning systems and corporate training software.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>генерация вопросов</kwd>
        <kwd>большие языковые модели</kwd>
        <kwd>искусственный интеллект</kwd>
        <kwd>обработка естественного языка</kwd>
        <kwd>O1</kwd>
        <kwd>O1-mini</kwd>
        <kwd>GPT-4o</kwd>
        <kwd>GPT-4o-mini</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>question generation</kwd>
        <kwd>large language models</kwd>
        <kwd>artificial intelligence</kwd>
        <kwd>natural language processing</kwd>
        <kwd>O1</kwd>
        <kwd>O1-mini</kwd>
        <kwd>GPT-4o</kwd>
        <kwd>GPT-4o-mini</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Исследование выполнено без спонсорской поддержки.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">Huang J.-H., Zhu H., Shen Yi., et al. Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models [Preprint]. arXiv. URL: https://doi.org/10.48550/arXiv.2411.05706 [Accessed 3rd January 2025].</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Chen Q., Wang Y., Wang F., et al. Decoding text from electroencephalography signals: A novel Hierarchical Gated Recurrent Unit with Masked Residual Attention Mechanism. Engineering Applications of Artificial Intelligence. 2025;139. https://doi.org/10.1016/j.engappai.2024.109615</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Zakareya S., Alsaleem N., Alnaghmaish A., et al. Evaluating the Discrimination Index of AI-Generated vs. Human-Generated Multiple-Choice Questions: Action Research. In: ICERI2024 Proceedings: 17th annual International Conference of Education, Research and Innovation, 11–13 November 2024, Seville, Spain. IATED; 2024. pp. 221–226. https://doi.org/10.21125/iceri.2024.0137</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Shetty N., Li Yo. Detailed Image Captioning and Hashtag Generation. Future Internet. 2024;16(12). https://doi.org/10.3390/fi16120444</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Kwiatkowski T., Palomaki J., Redfield O., et al. Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics. 2019;7:453–466. https://doi.org/10.1162/tacl_a_00276</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Lai G., Xie Q., Liu H., et al. RACE: Large-Scale ReAding Comprehension Dataset from Examinations. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 09–11 September 2017, Copenhagen, Denmark. Association for Computational Linguistics; 2017. pp. 785–794. https://doi.org/10.18653/v1/D17-1082</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Thorne W., Robinson A., Peng B., et al. Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference. In: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, 16 November 2024, Miami, USA. Association for Computational Linguistics; 2024. pp. 450–462. https://doi.org/10.18653/v1/2024.nlp4dh-1.43</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Ribeiro M.T., Singh S., Guestrin C. Semantically Equivalent Adversarial Rules for Debugging NLP Models. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Volume 1: Long Papers, 15–20 July 2018, Melbourne, Australia. Association for Computational Linguistics; 2018. pp. 856–865. https://doi.org/10.18653/v1/P18-1079</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Brown T., Mann B., Ryder N., et al. Language Models Are Few-Shot Learners. In: Advances in Neural Information Processing Systems 33: 34th Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 06–12 December 2020, Vancouver, Canada. 2020. pp. 1877–1901.</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Bian Yu., Huang J., Cai X., et al. On Attention Redundancy: A Comprehensive Study. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, 06–11 June 2021, Online. Association for Computational Linguistics; 2021. pp. 930–945. https://doi.org/10.18653/v1/2021.naacl-main.72</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Jiang N., De Marneffe M.-C. He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics. Transactions of the Association for Computational Linguistics. 2021;9:1081–1097. https://doi.org/10.1162/tacl_a_00414</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Lafkiar S., En Nahnahi N. An End-to-End Transformer-Based Model for Arabic Question Generation. Multimedia Tools and Applications. 2024. https://doi.org/10.1007/s11042-024-19958-3</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Balepur N., Gu F., Ravichander A., et al. Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer? [Preprint]. arXiv. URL: https://doi.org/10.48550/arXiv.2410.15512 [Accessed 3rd January 2025].</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Ye W., Zhang Q., Zhou X., et al. Correcting Factual Errors in LLMs via Inference Paths Based on Knowledge Graph. In: Proceedings of the 2024 International Conference on Computational Linguistics and Natural Language Processing (CLNLP), 19–21 July 2024, Yinchuan, China. IEEE; 2024. pp. 12–16. https://doi.org/10.1109/CLNLP64123.2024.00011</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">Wei X., Chen H., Yu H., et al. Guided Knowledge Generation with Language Models for Commonsense Reasoning. In: Findings of the Association for Computational Linguistics: EMNLP 2024, 12–16 November 2024, Miami, USA. Association for Computational Linguistics; 2024. pp. 1103–1136. https://doi.org/10.18653/v1/2024.findings-emnlp.61</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The author declares that there is no conflict of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>