The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Development of an information system for analyzing the content of websites with textual documents

Pocebneva I.V., Andreeva K.A., Pugovkina N.A., Gusev P.Y.

UDC 681.3
DOI: 10.26102/2310-6018/2025.51.4.043


This study analyzes how fully the websites of organizations are populated with textual documents, with the aim of supporting decision-making in the management of the educational programs of a higher education institution. The presence of textual documents on an organization's website is one of the key criteria for assessing the website's effectiveness, and these criteria in turn depend on the type of website and on the type of organization that created and maintains it. The paper examines the websites of higher education institutions and their specific features. One such feature is the requirement to publish the work programs of disciplines (course syllabi) as textual documents. Besides being mandatory, these work programs serve as informational material for prospective students, which increases the value of this information. Analyzing the availability and content of work programs can support a range of management tasks, but it requires a dedicated tool that verifies whether the programs are present.

To verify the availability of work programs and analyze their content, an information system was designed and developed. The design stage produced an IDEF0 context diagram, a decomposed IDEF0 diagram and a use case diagram. The context diagram defines the system together with its inputs, outputs, controls and mechanisms. The decomposed diagram comprises the following modules: web parsing, document processing, work program analysis, data integration and data export. The use case diagram identifies the actors (administrator, external website, database and visualization system) and the use cases (website parsing, document processing, work program analysis, data integration, data export and data visualization).
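The module chain of the decomposed IDEF0 diagram can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation: the function names, the set of document extensions, the required-section keywords and the record fields are hypothetical. Page fetching is replaced by an HTML string, document text is assumed to be already extracted (real PDF or DOCX parsing would need a third-party library), and the data integration module is left out.

```python
import csv
import io
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical: link extensions treated as textual documents.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")
# Hypothetical: sections that a work program is assumed to contain.
REQUIRED_SECTIONS = ("goals", "learning outcomes", "assessment")


class DocumentLinkParser(HTMLParser):
    """Web parsing: collects absolute URLs of links that point to documents."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(DOC_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def parse_page(html: str, base_url: str) -> list[str]:
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def process_document(raw_text: str) -> str:
    """Document processing (placeholder): collapses whitespace and lowercases
    text that is assumed to be already extracted from the document file."""
    return " ".join(raw_text.split()).lower()


@dataclass
class ProgramRecord:
    url: str
    discipline: str
    missing_sections: list[str] = field(default_factory=list)


def analyze_program(url: str, discipline: str, raw_text: str) -> ProgramRecord:
    """Work program analysis: lists required sections absent from the text."""
    text = process_document(raw_text)
    missing = [s for s in REQUIRED_SECTIONS if s not in text]
    return ProgramRecord(url, discipline, missing)


def export_csv(records: list[ProgramRecord]) -> str:
    """Data export: serializes analysis results to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "discipline", "complete", "missing_sections"])
    for r in records:
        writer.writerow(
            [r.url, r.discipline, not r.missing_sections, ";".join(r.missing_sections)]
        )
    return buffer.getvalue()
```

For example, `parse_page('<a href="/docs/math.pdf">Math</a>', "https://university.example/")` returns `["https://university.example/docs/math.pdf"]`; the host name is a placeholder. Keeping each module as a separate function mirrors the decomposition, so a module (for instance, a real document extractor) can be replaced without touching the others.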
Implementing the information system made it possible to build consolidated dashboards at the level of the educational organization, its faculties and its departments. The system's output supports managerial decisions based on information about which work programs are available on the websites of educational institutions.
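The abstract names three dashboard levels but not the indicators shown on them. As an assumption rather than the paper's method, one plausible indicator is the share of disciplines whose work program was found on the website, computed per organization, faculty and department. The `Discipline` record and the function names below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Discipline:
    faculty: str
    department: str
    name: str
    has_program: bool  # True if a work program for the discipline was found on the site


def coverage(items: Iterable[Discipline],
             key: Callable[[Discipline], Hashable]) -> dict:
    """Share of disciplines with a published work program in each group."""
    found: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for item in items:
        group = key(item)
        total[group] += 1
        found[group] += int(item.has_program)
    return {group: found[group] / total[group] for group in total}


def dashboard_levels(items: list[Discipline]) -> dict:
    """Indicator values for organization-, faculty- and department-level dashboards."""
    return {
        "organization": coverage(items, lambda d: "all"),
        "faculty": coverage(items, lambda d: d.faculty),
        "department": coverage(items, lambda d: (d.faculty, d.department)),
    }
```

Grouping departments by the `(faculty, department)` pair keeps departments with the same name in different faculties apart, and the same `coverage` function serves all three dashboard levels.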

Pocebneva Irina Valerievna
Candidate of Engineering Sciences, Docent

Voronezh State Technical University

Voronezh, Russian Federation

Andreeva Kristina Alekseevna

Voronezh State Technical University

Voronezh, Russian Federation

Pugovkina Natalia Aleksandrovna

Voronezh State Technical University

Voronezh, Russian Federation

Gusev Pavel Yurievich
Doctor of Engineering Sciences, Docent

Voronezh State Technical University

Voronezh, Russian Federation

Keywords: information system, website analysis, work programs of disciplines, higher education institutions, website parsing, document processing, IDEF0 diagrams, data visualization

For citation: Pocebneva I.V., Andreeva K.A., Pugovkina N.A., Gusev P.Y. Development of an information system for analyzing the content of websites with textual documents. Modeling, Optimization and Information Technology. 2025;13(4). URL: https://moitvivt.ru/ru/journal/pdf?id=2063 DOI: 10.26102/2310-6018/2025.51.4.043 (In Russ.).

Received 02.09.2025

Revised 05.11.2025

Accepted 12.11.2025