Keywords: affective computing, emotion recognition, video data analysis, DeepFace, GPT-4o language model, hybrid analysis system, semantic text analysis, multimodal interaction, neural network interpretability, cognitive technologies
Emotion analysis on video data using on-premise and cloud-based artificial intelligence solutions
UDC 004.89
DOI: 10.26102/2310-6018/2025.50.3.032
Accurate and interpretable emotion recognition from video data is increasingly needed for human-centered technologies in education, medicine, and human-computer interaction. This article compares the local DeepFace solution with the cloud-based GPT-4o model (OpenAI) for analyzing short video clips with emotional expressions, and identifies the differences between them and the application prospects of each. The study is an empirical comparative analysis: a moving average was used to smooth the time series of emotion scores so that stability and cognitive interpretability could be evaluated. The results showed that DeepFace provides stable local processing and high resistance to artifacts, while GPT-4o is capable of complex semantic interpretation and is highly sensitive to context. These findings support a hybrid approach that combines the computational autonomy of local processing with the interpretative flexibility of a cloud model. Combining local and cloud solutions thus opens up prospects for more accurate, adaptive, and scalable affective analysis systems. The article is of practical value to specialists in affective computing, interface design, and cognitive technologies.
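The moving-average smoothing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window size, the averaging variant, or the data layout, so everything here is an assumption: a trailing simple moving average, a hypothetical window of 3 frames, and per-frame scores stored as dictionaries mapping an emotion label to a score in percent. The function names `moving_average` and `smooth_emotions` are illustrative and not taken from the article.

```python
from collections import deque
from typing import Dict, List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Trailing simple moving average.

    For the first frames, where fewer than `window` values exist,
    the average is taken over the values seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)  # holds the most recent `window` values
    total = 0.0
    smoothed = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]     # oldest value is about to be evicted
        buf.append(v)
        total += v
        smoothed.append(total / len(buf))
    return smoothed


def smooth_emotions(frames: List[Dict[str, float]], window: int = 3) -> Dict[str, List[float]]:
    """Smooth each emotion's per-frame score series independently.

    `frames` is an assumed layout: one dict per video frame,
    mapping an emotion label to its score (missing labels count as 0).
    """
    labels = sorted({label for frame in frames for label in frame})
    return {
        label: moving_average([frame.get(label, 0.0) for frame in frames], window)
        for label in labels
    }


if __name__ == "__main__":
    # Hypothetical scores: a one-frame dropout in "happy" at the third frame.
    frames = [
        {"happy": 0.0, "neutral": 90.0},
        {"happy": 90.0, "neutral": 0.0},
        {"happy": 0.0, "neutral": 90.0},
        {"happy": 90.0, "neutral": 0.0},
        {"happy": 90.0, "neutral": 0.0},
    ]
    for label, series in smooth_emotions(frames, window=3).items():
        print(label, series)
```

A larger window suppresses single-frame flicker more strongly but delays the response to a genuine change of expression, so the window would need to be chosen against the clip length and frame rate.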
For citation: Agamirov L.V., Agamirov V.L., Vestyak V., Toutova N.V., Bazunov S., Zelyanik Y. Emotion analysis on video data using on-premise and cloud-based artificial intelligence solutions. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1982 DOI: 10.26102/2310-6018/2025.50.3.032 (In Russ.).
Received 02.06.2025
Revised 29.07.2025
Accepted 06.08.2025