Keywords: question generation, large language models, artificial intelligence, natural language processing, o1, o1-mini, GPT-4o, GPT-4o-mini
A method for generating closed-type questions using LLMs
UDC 004.89
DOI: 10.26102/2310-6018/2025.48.1.021
This study presents a method for generating closed-ended questions using large language models (LLMs) to improve the quality and relevance of the generated questions. The proposed framework combines generation, verification, and refinement stages, allowing low-quality questions to be improved through feedback rather than simply discarded. The method was tested on three widely used datasets: SQuAD, Natural Questions, and RACE. Key evaluation metrics, including ROUGE, BLEU, and METEOR, showed consistent performance gains across all tested models. Four LLM configurations were evaluated: o1, o1-mini, GPT-4o, and GPT-4o-mini, with o1 achieving the highest results across all datasets and metrics. Expert evaluation revealed an accuracy improvement of up to 14.4% compared with generation without verification and refinement. The results highlight the method's effectiveness in ensuring greater clarity, factual correctness, and contextual relevance of the generated questions. The combination of automated verification and refinement further improves outcomes, showcasing the potential of LLMs in text generation tasks. These findings will benefit researchers in natural language processing and educational technology, as well as professionals working on adaptive learning systems and corporate training software.
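The generate–verify–refine cycle described above can be sketched as a simple control loop. The paper does not publish its prompts or verification criteria, so the `llm` callable, the prompt wording, and the acceptance check below are illustrative stand-ins, not the authors' implementation:

```python
# Hypothetical sketch of the generation -> verification -> refinement pipeline.
# Any chat-capable LLM can be plugged in as the `llm` callable; the prompts and
# the "OK"-prefix acceptance convention are assumptions for illustration only.
from typing import Callable, Tuple

LLM = Callable[[str], str]


def generate_question(llm: LLM, passage: str) -> str:
    """Draft one closed-ended question from a source passage."""
    return llm(f"Write one closed-ended question about: {passage}")


def verify(llm: LLM, passage: str, question: str) -> Tuple[bool, str]:
    """Check the draft for clarity and grounding; return (accepted, feedback)."""
    feedback = llm(f"Critique this question for the passage '{passage}': {question}")
    return feedback.startswith("OK"), feedback


def refine_loop(llm: LLM, passage: str, max_rounds: int = 3) -> str:
    """Improve a low-quality draft via feedback instead of discarding it."""
    question = generate_question(llm, passage)
    for _ in range(max_rounds):
        accepted, feedback = verify(llm, passage, question)
        if accepted:
            break
        # Feed the critique back into the model to rewrite the draft.
        question = llm(f"Rewrite the question using this feedback: {feedback}")
    return question
```

The key design point reflected here is the feedback path: a rejected question re-enters generation together with the verifier's critique, which is what distinguishes the method from a filter-and-discard pipeline.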
For citation: Dagaev A.E. A method for generating closed-type questions using LLMs. Modeling, Optimization and Information Technology. 2025;13(1). URL: https://moitvivt.ru/ru/journal/pdf?id=1799 DOI: 10.26102/2310-6018/2025.48.1.021 (In Russ.).
Received 13.01.2025
Revised 14.02.2025
Accepted 18.02.2025