Scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Thematic Analysis of Text Information Based on Frequency Characteristics

Preobrazhenskiy A.P., Menyaylov D.V., Choporova E.I.

UDC 681.3
DOI: 10.26102/2310-6018/2021.32.1.025

Methods for studying text arrays are under active development. These methods either measure spatial characteristics of the text, such as line lengths and font sizes, or address general linguistic problems, which involve the study of meaning-bearing units such as sentences and phrases. For the second class of problems, frequency analysis is a promising approach. The paper analyzes the approaches that can be used in this case. The authors developed an algorithm for processing natural-language text and implemented it using Python, Jupyter Notebook, wordcloud, and NLTK. During processing, the text array is split into words, after which a list of tokens is formed. Recommendations are given for removing conjunctions, prepositions, and other parts of speech so that the topic can be analyzed fully. The main stages of the text frequency analysis algorithm are shown: the data are loaded, the text arrays undergo primary processing, words are replaced, statistical data are evaluated, unnecessary words are removed, and the result is presented visually. These stages are also illustrated with fragments of the program code.
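
The stages listed in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the paper uses NLTK and wordcloud, while this version relies only on the Python standard library so that it stays self-contained. The stop-word list, the replacement table, the function names, and the sample text are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption): a few English function words.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is", "are"}

# Hypothetical replacement table (assumption): maps word forms to one base form.
REPLACEMENTS = {"texts": "text", "words": "word", "analyses": "analysis"}


def tokenize(text):
    """Primary processing: lowercase the text and split it into letter-only tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS, replacements=REPLACEMENTS):
    """Replace word forms, count all tokens, then drop the unnecessary words."""
    tokens = [replacements.get(token, token) for token in tokenize(text)]
    counts = Counter(tokens)
    return Counter({word: n for word, n in counts.items() if word not in stop_words})


def text_chart(freqs, top_n=5):
    """Visual presentation: a plain-text bar chart of the most frequent words."""
    rows = [f"{word:<12}{'#' * n} {n}" for word, n in freqs.most_common(top_n)]
    return "\n".join(rows)


sample = "The analysis of texts: words in a text and the frequency of words in texts."
freqs = word_frequencies(sample)
print(text_chart(freqs))
```

In a setup closer to the one the abstract describes, the tokenizer and stop-word list would presumably come from NLTK and the chart would be replaced by a word cloud. The counting step in the middle would stay the same.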

Preobrazhenskiy Andrey Petrovich
Doctor of Technical Sciences, assistant professor

Voronezh Institute of High Technologies

Voronezh, Russian Federation

Menyaylov Dmitriy Vladimirovich

Voronezh Institute of High Technologies

Voronezh, Russian Federation

Choporova Ejkaterina Ivanovna
Candidate of Pedagogical Sciences, assistant professor

Voronezh Institute of High Technologies

Voronezh, Russian Federation

Keywords: text information, model, frequency analysis, program, word, language

For citation: Preobrazhenskiy A.P., Menyaylov D.V., Choporova E.I. Thematic Analysis of Text Information Based on Frequency Characteristics. Modeling, Optimization and Information Technology. 2021;9(1). Available from: https://moitvivt.ru/ru/journal/pdf?id=944 DOI: 10.26102/2310-6018/2021.32.1.025 (In Russ.).
