Keywords: semantic text reduction, automatic summarization, word cloud, library information systems, hybrid text processing methods, neural models, relevance evaluation, Library Relevance Score
UDC 004.852; 021.6
DOI: 10.26102/2310-6018/2026.54.3.013
The relevance of the study stems from the continuous growth of textual information in library information systems and the need to provide fast, meaningful navigation across electronic collections under constrained computational resources. Existing automatic summarization solutions are oriented primarily toward large-scale language models, which limits their practical deployment within local library infrastructures. Against this background, the paper aims to develop a resource-efficient method of semantic text reduction that balances the quality of semantic representation against computational feasibility. The proposed approach rests on a hybrid architecture that sequentially combines lexical reduction based on word clouds with neural summarization performed by compact models. In addition, a context-oriented evaluation metric is introduced that assesses relevance with respect to semantic coherence, structural characteristics, and domain-specific terms important for the library environment. An experimental study on a corpus of 1178 documents shows that the hybrid approach improves relevance indicators while reducing inference time compared with direct neural summarization of the full text. The results confirm the practical applicability of the proposed method in library information systems with limited computational infrastructure and its usefulness for navigation and cataloging tasks.
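The first, lexical stage of the pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frequency-based word-cloud weighting and uses simple weight-driven sentence selection as a stand-in for the downstream compact neural summarizer; all function names, the stopword list, and the `keep_ratio` parameter are illustrative.

```python
import re
from collections import Counter

# Illustrative stopword list; a production system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for",
             "on", "with", "that", "are", "as", "by"}

def word_cloud_weights(text, top_k=20):
    """Stage 1: lexical reduction — frequency weights as in a word cloud."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return dict(counts.most_common(top_k))

def reduce_text(text, weights, keep_ratio=0.5):
    """Keep the sentences carrying the most word-cloud weight.

    In the hybrid scheme, the reduced text would then be passed to a
    compact neural summarizer; here selection alone stands in for it.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = [(sum(weights.get(w, 0) for w in re.findall(r"[a-z]+", s.lower())), i, s)
              for i, s in enumerate(sentences)]
    n_keep = max(1, int(len(sentences) * keep_ratio))
    # Take the highest-scoring sentences, then restore document order.
    kept = sorted(sorted(scored, reverse=True)[:n_keep], key=lambda t: t[1])
    return " ".join(s for _, _, s in kept)
```

The design point being illustrated is the sequential coupling: the cheap lexical stage shrinks the input before any neural model runs, which is what yields the reported reduction in inference time.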
For citation: Rzyankin I.S., Noskov M.V. Hybrid semantic reduction of texts in library information systems. Modeling, Optimization and Information Technology. 2026;14(3). URL: https://moitvivt.ru/ru/journal/pdf?id=2220 DOI: 10.26102/2310-6018/2026.54.3.013 (In Russ.).
Received 11.02.2026
Revised 20.03.2026
Accepted 25.03.2026