Keywords: classification, data stream, naive Bayesian classifier, Bayesian criteria
Stream data classification based on Bayesian criteria
UDC 004.852
DOI: 10.26102/2310-6018/2020.28.1.034
The paper addresses the problem of stream data classification. Stream data is a set of objects arriving from different sources at random moments of time. Examples include a stream of measurements from sensors in an ocean coastal area that describe the condition of the ecosystem, or a stream of texts extracted from incoming email attachments. The Internet contains vast volumes of unstructured information, and this lack of organization makes the data inconvenient and resource-intensive to work with, so handling it is a relevant problem. Classification makes unstructured information easier to work with. The paper describes an algorithm for stream data classification based on Bayesian criteria. A text stream data model is proposed that allows natural language text classification algorithms to be applied to stream data. A modification of the naive Bayes classifier is proposed that uses the tf-idf measure to evaluate the proximity of a classified document to a particular class, which improves classification quality. The classifier was trained on the Machine Fund of the Russian Language. Software is presented that extracts a text data stream from the Internet and classifies it in real time using the proposed algorithm.
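The abstract does not give the exact form of the tf-idf modification, so the following is only a minimal sketch of one common way to combine the two ideas: a multinomial naive Bayes classifier in which per-class term counts are weighted by idf before the class-conditional probabilities are estimated. The class name `TfidfNaiveBayes`, the helper `classify_stream`, the smoothing parameter `alpha`, and the toy training texts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple whitespace tokenizer (an assumption, not the paper's)."""
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial naive Bayes with idf-weighted class term counts (sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # additive smoothing constant
        self.n_docs = 0                        # number of training documents
        self.class_docs = Counter()            # documents seen per class
        self.doc_freq = Counter()              # documents containing each term
        self.class_tf = defaultdict(Counter)   # raw term counts per class

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.n_docs += 1
            self.class_docs[label] += 1
            self.class_tf[label].update(tokens)
            self.doc_freq.update(set(tokens))
        return self

    def _idf(self, term):
        # Smoothed idf; the +1 keeps the weight strictly positive.
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0

    def log_scores(self, text):
        """Return log P(class) + sum of log P(term | class) for each class."""
        vocab = list(self.doc_freq)
        # Terms never seen in training are ignored.
        tf = Counter(t for t in tokenize(text) if t in self.doc_freq)
        scores = {}
        for label in self.class_docs:
            # tf-idf weight of every vocabulary term within this class.
            weights = {t: self.class_tf[label][t] * self._idf(t) for t in vocab}
            total = sum(weights.values()) + self.alpha * len(vocab)
            score = math.log(self.class_docs[label] / self.n_docs)
            for term, count in tf.items():
                score += count * math.log((weights[term] + self.alpha) / total)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def classify_stream(model, stream):
    """Label documents one at a time as they arrive from an iterable."""
    for text in stream:
        yield text, model.predict(text)


# Toy training data (invented for illustration only).
train_docs = [
    "wave height sensor salinity",
    "water temperature sensor reading",
    "invoice payment attached",
    "meeting agenda attached invoice",
]
train_labels = ["sensor", "sensor", "mail", "mail"]
model = TfidfNaiveBayes().fit(train_docs, train_labels)

for text, label in classify_stream(model, ["salinity sensor reading", "payment invoice"]):
    print(label, "<-", text)
```

Because `classify_stream` is a generator, it can consume any iterable source, such as a socket reader or a message queue, without holding the whole stream in memory. The per-class weights are recomputed on every call to keep the sketch short; a real-time system would presumably cache them after training.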
For citation: Lomakina L.S., Subbotin A.N. Stream data classification based on bayesian criteria. Modeling, Optimization and Information Technology. 2020;8(1). URL: https://moit.vivt.ru/wp-content/uploads/2020/02/LomakinaSubbotin_1_20_1.pdf DOI: 10.26102/2310-6018/2020.28.1.034 (In Russ).
Published 31.03.2020