Stream data classification based on Bayesian criteria
The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Stream data classification based on Bayesian criteria

Lomakina L.S., Subbotin A.N.

UDC 004.852
DOI: 10.26102/2310-6018/2020.28.1.034

  • Abstract
  • List of references
  • About authors

The paper addresses the problem of stream data classification. Stream data is defined as a set of objects arriving from different sources at random moments in time. Examples include a stream of measurements from sensors in an ocean coastal area that describe the state of the ecosystem, or a stream of texts extracted from the attachments of incoming emails. The Internet contains vast volumes of unstructured information, and this lack of organization makes the data inconvenient and resource-intensive to work with, so handling it is a relevant problem. Classification makes unstructured information easier to work with. The paper describes an algorithm for stream data classification based on Bayesian criteria. A model of text stream data is proposed that allows natural language text classification algorithms to be applied to stream data. A modification of the naive Bayes classifier is proposed that uses the tf-idf measure to evaluate the proximity of a classified document to a particular class, which improves classification quality. The classifier was trained on the Machine Fund of the Russian Language. Software is presented that extracts a text data stream from the Internet and classifies it in real time using the proposed algorithm.
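The abstract does not give the formulas of the proposed modification, so the following is only a minimal Python sketch of one plausible reading: a multinomial naive Bayes classifier in which raw term counts are replaced by tf-idf weights, applied to documents one at a time as they arrive. The class name `TfidfNaiveBayes`, the helper `classify_stream`, the whitespace tokenizer, the smoothed idf formula, and the Laplace smoothing are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class TfidfNaiveBayes:
    """Hypothetical sketch: naive Bayes with tf-idf mass instead of counts."""

    def fit(self, docs, labels):
        n_docs = len(docs)
        tokenized = [tokenize(d) for d in docs]

        # Document frequency: number of training documents containing a term.
        df = Counter()
        for tokens in tokenized:
            df.update(set(tokens))
        # Smoothed idf (assumed form); always positive.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0
                    for t, c in df.items()}
        self.vocab = set(df)

        # Accumulate tf-idf mass per (class, term) and per class.
        self.term_weight = defaultdict(lambda: defaultdict(float))
        self.class_weight = defaultdict(float)
        for tokens, label in zip(tokenized, labels):
            for term, tf in Counter(tokens).items():
                w = tf * self.idf[term]
                self.term_weight[label][term] += w
                self.class_weight[label] += w

        class_count = Counter(labels)
        self.log_prior = {c: math.log(k / n_docs)
                          for c, k in class_count.items()}
        return self

    def log_scores(self, text):
        # Terms never seen in training are ignored.
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        v = len(self.vocab)
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            for term, tf in counts.items():
                # Laplace-smoothed P(term | class) from tf-idf mass.
                p = ((self.term_weight[c].get(term, 0.0) + 1.0)
                     / (self.class_weight[c] + v))
                score += tf * self.idf[term] * math.log(p)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def classify_stream(model, stream):
    """Label documents lazily, one at a time, from any iterable."""
    for text in stream:
        yield text, model.predict(text)


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    train_docs = [
        "wave height sensor reading salinity",
        "sensor temperature salinity ocean",
        "meeting invoice attached please reply",
        "please find attached invoice report",
    ]
    train_labels = ["sensor", "sensor", "mail", "mail"]
    model = TfidfNaiveBayes().fit(train_docs, train_labels)
    for text, label in classify_stream(model, ["ocean salinity sensor"]):
        print(label, "<-", text)
```

Because `classify_stream` is a generator, it can consume any iterable source (a socket reader, a message queue consumer, a file tail) without holding the whole stream in memory, which is one simple way to match the arrival-at-random-moments setting the abstract describes.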

1. Lomakina L.S., Subbotin A.N., Surkova A.S. Naïve Bayes Modification for Data Streams Classification. Proceedings of the Thirteenth International MEDCOAST Congress on Coastal and Marine Sciences, Engineering, Management and Conservation (MEDCOAST 2017). 2017;2:805-814.

2. Bolshakova E.I., Klishinskii E.S., Lande D.V., Noskov A.A., Peskova O.V., Yagunova E.V. Automatic processing of natural language texts and computational linguistics: educational material. M.: MIEM. 2011 (In Russ).

3. Gaber M.M., Zaslavsky A., Krishnaswamy S. A Survey of Classification Methods in Data Streams. Data Streams. Ed. by Aggarwal C.C. Springer US. 2007.

4. Berry M.W., Kogan J. Text Mining. Applications and Theory. Wiley. 2010.

5. Lomakina L.S., Lomakin D.V., Subbotin A.N. Text streams Bayesian classification. Control systems and information technologies. 2016;4(66):60-64 (In Russ).

6. Subbotin A.N. Algorithm for natural language text information classification. Scientific and Technical Bulletin of the Volga Region. 2020;1:18-21 (In Russ).

7. Lomakina L.S., Lomakin D.V., Subbotin A.N. Program for classifying text data streams based on the Bayesian approach. Certificate of state registration of a computer program № 2017611236, October 31, 2016.

Lomakina Lyubov Sergeevna
Doctor of Technical Sciences, Professor
Email: llomakina@list.ru

Nizhny Novgorod State Technical University n.a. R.E. Alekseev

Nizhny Novgorod, Russian Federation

Subbotin Artem Nikolaevich

Email: turnonmore@yandex.ru

Nizhny Novgorod State Technical University n.a. R.E. Alekseev
«СВТЕКНН», LLC

Nizhny Novgorod, Russian Federation

Keywords: classification, data stream, naive Bayesian classifier, Bayesian criteria

For citation: Lomakina L.S., Subbotin A.N. Stream data classification based on Bayesian criteria. Modeling, Optimization and Information Technology. 2020;8(1). Available from: https://moit.vivt.ru/wp-content/uploads/2020/02/LomakinaSubbotin_1_20_1.pdf DOI: 10.26102/2310-6018/2020.28.1.034 (In Russ).

