Keywords: naive Bayes classifier, text mining, algorithm, Bayes' theorem, document analysis
APPLICATION OF THE BAYESIAN CLASSIFIER TO DETERMINING THE TOPIC OF A TEXT
UDC 004.912
DOI:
The relevance of the study stems from modern society's need for automatic data classification. In this paper, we consider a Bayesian algorithm applied to determining the topic of a text. The aim of the work is to develop the classifier, to identify and solve the problems that arise during its implementation and operation, and to evaluate its effectiveness. Two problems were identified: arithmetic overflow (strictly speaking, floating-point underflow caused by multiplying many small probabilities) and the result becoming zero whenever a word never occurred in the training texts of a topic. We propose solving them with Laplace smoothing and the properties of logarithms. Approaches to optimizing the program module and increasing its speed are also presented. As a result, a Bayesian classifier was implemented and studied on sets of articles covering 10 different topics, and its effectiveness was assessed through analytical and experimental verification. The material of the article is of practical value to those who plan to apply the algorithm considered here, or similar ones, in their research.
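The two fixes named in the abstract can be illustrated with a minimal multinomial naive Bayes sketch in Python. This is not the authors' implementation: the class name `NaiveBayesTopicClassifier`, the smoothing parameter `alpha`, and the assumption that documents arrive already tokenized (and, for example, lemmatized) are all hypothetical. The score of a topic c is summed in log space, log P(c) + Σ log P(w|c), instead of multiplying probabilities, and each word likelihood uses Laplace smoothing, P(w|c) = (n(w,c) + α) / (N(c) + α·|V|), where n(w,c) is the count of word w in topic c, N(c) is the total word count of topic c, and |V| is the vocabulary size.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Hypothetical multinomial naive Bayes over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # add-alpha smoothing; 1.0 is classic Laplace smoothing
        self.doc_counts = Counter()               # topic -> number of training documents
        self.word_counts = defaultdict(Counter)   # topic -> word -> number of occurrences
        self.total_words = Counter()              # topic -> total word occurrences N(c)
        self.vocabulary = set()                   # all words seen in training, |V|

    def fit(self, documents, topics):
        # Count words per topic; each document is assumed to be a list of tokens.
        for words, topic in zip(documents, topics):
            self.doc_counts[topic] += 1
            self.word_counts[topic].update(words)
            self.total_words[topic] += len(words)
            self.vocabulary.update(words)
        return self

    def log_score(self, words, topic):
        n_docs = sum(self.doc_counts.values())
        # log P(c): adding logarithms replaces multiplying probabilities,
        # so long documents do not underflow to 0.0.
        score = math.log(self.doc_counts[topic] / n_docs)
        denom = self.total_words[topic] + self.alpha * len(self.vocabulary)
        for w in words:
            # Laplace smoothing: a word never seen with this topic still gets a
            # non-zero probability, so a single unseen word cannot zero the score.
            score += math.log((self.word_counts[topic][w] + self.alpha) / denom)
        return score

    def predict(self, words):
        # Pick the topic with the highest log-posterior (up to a shared constant).
        return max(self.doc_counts, key=lambda t: self.log_score(words, t))


train_docs = [
    ["goal", "match", "team", "score"],
    ["team", "coach", "season", "match"],
    ["vote", "election", "party", "minister"],
    ["party", "parliament", "vote", "law"],
]
train_topics = ["sport", "sport", "politics", "politics"]
clf = NaiveBayesTopicClassifier().fit(train_docs, train_topics)
print(clf.predict(["match", "score", "stadium"]))  # prints "sport"
```

In the toy example, "stadium" never appears in training, yet smoothing keeps both topic scores finite. Multiplying 5,000 raw likelihoods of about 0.15 would underflow a double to zero, whereas the log-space sum stays a finite negative number.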
For citation: Chupin P.G., Afonin A.Y., Shanov S.V. APPLICATION OF THE BAYESIAN CLASSIFIER TO DETERMINING THE TOPIC OF A TEXT. Modeling, Optimization and Information Technology. 2018;6(1). URL: https://moit.vivt.ru/wp-content/uploads/2018/01/ShanovSoavtori_1_1_18.pdf (In Russ.).
Published 31.03.2018