APPLICATION OF A BAYESIAN CLASSIFIER FOR DETERMINING THE TOPIC OF A TEXT
Scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

APPLICATION OF THE BAYESOV CLASSIFIER FOR THE DEFINITION OF THE THEMATICS OF THE TEXT

Chupin P.G., Afonin A.Y., Shanov S.V.

UDC 004.912
DOI:


The study is motivated by the growing need for automatic classification of data. This paper considers a Bayesian algorithm as applied to determining the topic of a text. The aim of the work is to develop such a classifier, to identify and solve the problems that arise during its implementation and operation, and to evaluate its effectiveness. Two problems are identified: arithmetic overflow and the appearance of a zero probability in the result. They are resolved by means of Laplace smoothing and the properties of logarithms. Approaches to optimizing the software module and increasing its speed are also presented. As a result, a Bayesian classifier was implemented and studied on sets of articles covering 10 different topics, with the results checked by analytical and test verification. The materials of the article are of practical value to those who intend to apply the algorithm considered, or similar ones, in their research.
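The two fixes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the smoothing constant `alpha=1.0`, and the toy training data are all assumptions made for illustration, and tokenization and lemmatization are left out. Laplace smoothing adds a constant to every word count, so an unseen word never produces a zero probability. Summing logarithms instead of multiplying probabilities avoids the numerical failure on long texts, where the product of many small factors typically underflows to zero in floating point.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes sketch with Laplace smoothing and log-probabilities."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed value)
        self.doc_counts = Counter()              # number of training documents per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocabulary = set()                  # all distinct words seen in training

    def train(self, tokens, topic):
        """Add one tokenized document with a known topic to the model."""
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(tokens)
        self.vocabulary.update(tokens)

    def log_scores(self, tokens):
        """Return log P(topic) + sum of log P(word | topic) for every topic."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            # Smoothed denominator: words in the topic plus alpha per vocabulary entry.
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for token in tokens:
                # Adding alpha keeps the argument of log strictly positive.
                score += math.log((counts[token] + self.alpha) / denom)
            scores[topic] = score
        return scores

    def classify(self, tokens):
        """Return the topic with the highest log-score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy training data (hypothetical, for illustration only).
clf = NaiveBayesTopicClassifier()
clf.train("goal match team score".split(), "sport")
clf.train("team coach match win".split(), "sport")
clf.train("bank market stock price".split(), "finance")
clf.train("stock market trade bank".split(), "finance")

short_topic = clf.classify("team match".split())
unseen_topic = clf.classify("stock price unseenword".split())
long_scores = clf.log_scores(["team"] * 2000)  # long text: the scores stay finite
```

Because only the ordering of the scores matters for choosing a topic, the logarithms never need to be converted back into probabilities.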

1. Text Mining // Knowledge management. – Access mode: https://sites.google.com/site/upravlenieznaniami/tehnologii-upravleniaznaniami/text-mining-web-mining/text-mining – (Accessed: 04.02.2018).

2. S. Eprev. Automatic classification of text documents // Mathematical Structures and Modeling. 2010. Vol. 21. P. 65-81.

3. Naive Bayesian Classifier [Electronic resource]. – Access mode: http://bazhenov.me/blog/2012/06/11/naive-bayes – (Accessed: 04.02.2018).

4. A. Alekseev, A. S. Katasev, A. E. Kirillov, A. P. Kirpichnikov. Classification of Text Documents Based on Text Mining // Bulletin of the Technological University. 2016. Vol. 19, No. 18. P. 116-119.

5. pymorphy2 morphological analyzer [Electronic resource]. – Access mode: https://pymorphy2.readthedocs.io/en/latest/ – (Accessed: 04.02.2018).

Chupin Pavel Georgievich

Email: pavelchupin94@yandex.ru

Penza State University

Penza, Russian Federation

Afonin Alexander Yurievich
Candidate of Technical Sciences
Email: afonin@pnzgu.ru

Penza State University

Penza, Russian Federation

Shanov Sergey Vladimirovich

Email: aesfur@gmail.com

Penza State University

Penza, Russian Federation

Keywords: naive bayesian classifier, text mining, algorithm, bayes theorem, document analysis

For citation: Chupin P.G., Afonin A.Y., Shanov S.V. APPLICATION OF THE BAYESOV CLASSIFIER FOR THE DEFINITION OF THE THEMATICS OF THE TEXT. Modeling, Optimization and Information Technology. 2018;6(1). URL: https://moit.vivt.ru/wp-content/uploads/2018/01/ShanovSoavtori_1_1_18.pdf DOI: (In Russ).



Published 31.03.2018