Keywords: intelligent document processing, computer vision, convolutional neural networks, stream data processing, machine learning
Development of a lightweight model for automatic classification of structured and unstructured data in streaming sources to optimize optical character recognition
UDC 004.93
DOI: 10.26102/2310-6018/2025.50.3.006
This article addresses the preliminary assessment of incoming electronic documents using computer vision technologies. The authors synthesized a dataset of images with structured data based on an invoice form, and also collected scans of various documents, ranging from pages of scientific articles and documentation received in the electronic mailbox of a scientific organization to Rosstat reports. Thus, the first part of the dataset consists of structured data with a strict form, while the second part consists of unstructured scans, since information may be presented differently across scanned documents: text only, text with images, or graphs, because different sources impose their own requirements and standards. Primary analysis of data in streaming sources can be performed with computer vision models. The experiments demonstrated the high accuracy of convolutional neural networks: in particular, a network with the Xception architecture achieves an accuracy of more than 99%, an advantage of about 9% over the simpler MobileNetV2 model. The proposed approach enables primary filtering of documents by department without large language models or character recognition models, which increases processing speed and reduces computational costs.
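The classifier described above can be sketched as a transfer-learning model: a pre-trained Xception backbone with a binary head that separates structured forms from unstructured scans before any OCR is run. This is a minimal illustrative sketch, not the authors' implementation; the input size, head layers, and training settings are assumptions.

```python
# Minimal sketch of a structured-vs-unstructured document classifier
# built on an Xception backbone (illustrative; hyperparameters assumed).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_document_classifier(input_shape=(299, 299, 3)):
    # Xception backbone without the ImageNet head; frozen so only the
    # small classification head is trained on the document dataset.
    base = tf.keras.applications.Xception(
        include_top=False, weights=None, input_shape=input_shape)
    base.trainable = False
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        # Sigmoid output: 1 = structured form, 0 = unstructured scan.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

In a streaming pipeline, each incoming scan would first pass through this lightweight model; only documents classified as structured forms would then be routed to the heavier OCR stage.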
For citation: Gavrilov V.S., Korchagin S.A., Dolgov V.I., Andriyanov N.A. Development of a lightweight model for automatic classification of structured and unstructured data in streaming sources to optimize optical character recognition. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1918 DOI: 10.26102/2310-6018/2025.50.3.006 (In Russ.).
Received 26.04.2025
Revised 15.06.2025
Accepted 24.06.2025