Keywords: information system, website analysis, work programs of disciplines, higher education institutions, website parsing, document processing, IDEF0 diagrams, data visualization
Development of an information system for analyzing the content of websites with textual documents
UDC 681.3
DOI: 10.26102/2310-6018/2025.51.4.043
This study analyzes the content of organizational websites that host textual documents, with the aim of supporting decision-making in the management of educational programs at a higher education organization. The presence of textual documents on an organization's website is one of the key criteria for assessing the website's effectiveness. These criteria, in turn, depend on the type of website and on the type of organization that created and maintains it. The paper examines the websites of higher education institutions and their specific characteristics. One such characteristic is the requirement to publish curricula (work programs of disciplines) as textual documents. Besides being mandatory, these curricula serve as informational material for prospective students, which increases their value. Analyzing the availability and content of curricula can help address various management tasks, but it requires a tool that verifies their presence on the website. An information system was designed and developed for this purpose. The design phase involved creating an IDEF0 context diagram, a decomposed IDEF0 diagram, and an action (use case) diagram. The context diagram defines the system together with the inputs, outputs, controls, and mechanisms of the information system. The decomposed diagram includes the following modules: web parsing, document processing, curriculum analysis, data integration, and data export. The action diagram identifies four actors (administrator, external website, database, and visualization system) and six use cases: website parsing, document processing, curriculum analysis, data integration, data export, and data visualization.
Implementing the information system made it possible to build comprehensive dashboards for educational organizations, as well as faculty-level and department-level dashboards. The system's output supports managerial decision-making based on information about the availability of curricula on the websites of educational institutions.
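The web parsing module and the availability check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the list of document extensions, the function names `find_document_links` and `availability_by_department`, and the idea of measuring availability as the share of pages with at least one document link are all assumptions made for illustration. The sketch uses only the Python standard library and works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed document formats; the abstract does not list the formats handled.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")


class DocumentLinkParser(HTMLParser):
    """Collects absolute URLs of <a> links that point to textual documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before checking the path.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(DOC_EXTENSIONS):
            self.links.append(absolute)


def find_document_links(html, base_url):
    """Return document links found in one HTML page (hypothetical helper)."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.links


def availability_by_department(pages):
    """Share of pages per department that contain at least one document link.

    `pages` is an iterable of (department, html, base_url) tuples; this
    structure is an assumption, not the data model used in the paper.
    """
    totals, with_docs = {}, {}
    for department, html, base_url in pages:
        totals[department] = totals.get(department, 0) + 1
        if find_document_links(html, base_url):
            with_docs[department] = with_docs.get(department, 0) + 1
    return {d: with_docs.get(d, 0) / totals[d] for d in totals}
```

A per-department ratio of this kind is one possible input for the department-level dashboards mentioned above; faculty-level and organization-level figures could be obtained by grouping on a different key.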
For citation: Pocebneva I.V., Andreeva K.A., Pugovkina N.A., Gusev P.Y. Development of an information system for analyzing the content of websites with textual documents. Modeling, Optimization and Information Technology. 2025;13(4). URL: https://moitvivt.ru/ru/journal/pdf?id=2063 DOI: 10.26102/2310-6018/2025.51.4.043 (In Russ.).
Received 02.09.2025
Revised 05.11.2025
Accepted 12.11.2025