The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Thematic Analysis of Text Information Based on Frequency Characteristics

Preobrazhenskiy A.P., Menyaylov D.V., Choporova E.I.

UDC 681.3
DOI: 10.26102/2310-6018/2021.32.1.025


Methods for studying text arrays are currently being developed. They aim either to measure the spatial characteristics of a text, such as line lengths and font sizes, or to address general linguistic problems that involve studying meaning-bearing units such as sentences and phrases. For the second class of problems, frequency analysis is a promising approach. The paper analyzes the approaches that can be applied in this case. The authors developed an algorithm for processing natural-language text and implemented it in Python using Jupyter Notebook, wordcloud, and NLTK. During processing, the text array is split into words, and a list of tokens is formed. Recommendations are given for removing conjunctions, prepositions, and other parts of speech so that a full analysis of the topic can be carried out. The main stages of the text frequency analysis algorithm are presented: the data are loaded, the text arrays undergo primary processing, words are replaced, statistics are evaluated, unnecessary words are removed, and the results are visualized. These stages are also illustrated with fragments of the program code.
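The processing stages listed in the abstract (tokenization, word replacement, frequency statistics, removal of unnecessary words, visualization) can be sketched roughly as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' code: the stop-word set `STOP_WORDS` and the replacement map `REPLACEMENTS` are hypothetical, a regular-expression tokenizer stands in for NLTK tokenization, and a text bar chart stands in for the word cloud.

```python
import re
from collections import Counter

# Hypothetical stop-word set (conjunctions, prepositions, articles, auxiliaries).
# The paper's actual list is not given; NLTK's stopwords corpus could be used instead.
STOP_WORDS = {"and", "or", "but", "in", "on", "of", "to", "for",
              "with", "the", "a", "an", "is", "are"}

# Hypothetical replacement map for the "word replacement" stage:
# folds variant word forms into one canonical form before counting.
REPLACEMENTS = {"models": "model", "texts": "text", "words": "word"}


def tokenize(text: str) -> list:
    """Primary processing: lowercase the text and split it into letter-only tokens.

    [^\\W\\d_] matches Unicode letters, so Cyrillic text is tokenized as well.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def replace_words(tokens: list, replacements: dict) -> list:
    """Replace each token by its canonical form, if the map has one."""
    return [replacements.get(token, token) for token in tokens]


def word_frequencies(text: str,
                     stop_words: set = STOP_WORDS,
                     replacements: dict = REPLACEMENTS) -> Counter:
    """Tokenize, replace words, count frequencies, then drop stop words."""
    tokens = replace_words(tokenize(text), replacements)
    counts = Counter(tokens)            # frequency statistics
    for word in stop_words:             # remove unnecessary words
        counts.pop(word, None)
    return counts


def show_top(counts: Counter, n: int = 5) -> None:
    """Text stand-in for the word-cloud visualization: one bar per word."""
    for word, count in counts.most_common(n):
        print(f"{word:<10} {'#' * count}")


if __name__ == "__main__":
    sample = ("The model analyzes texts. Frequency analysis of words in a text "
              "and models of text topics are studied with the model.")
    show_top(word_frequencies(sample))
```

With the libraries named in the paper, the stop-word set could instead come from `nltk.corpus.stopwords.words(...)` (after `nltk.download("stopwords")`), and the visualization step could pass the resulting counts to `wordcloud.WordCloud().generate_from_frequencies(...)`. Counting before removing stop words mirrors the order of stages given in the abstract; the final frequencies are the same either way.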

Keywords: text information, model, frequency analysis, program, word, language

For citation: Preobrazhenskiy A.P., Menyaylov D.V., Choporova E.I. Thematic Analysis of Text Information Based on Frequency Characteristics. Modeling, Optimization and Information Technology. 2021;9(1). URL: https://moitvivt.ru/ru/journal/pdf?id=944 DOI: 10.26102/2310-6018/2021.32.1.025 (In Russ).



Published 31.03.2021