Hybrid agent training system using A2C and evolutionary strategies

Korchagin A.P.

UDC 004.85
DOI: 10.26102/2310-6018/2025.50.3.029

Abstract

This study is motivated by the need to improve the efficiency of agent training under partial observability and limited interaction, conditions typical of many real-world multi-agent tasks. The article develops and analyzes a hybrid approach to agent training that combines the strengths of gradient-based and evolutionary methods. The core method is a modified Advantage Actor-Critic (A2C) algorithm supplemented with elements of evolutionary learning: crossover and mutation of neural network parameters. This combination addresses agent adaptation under limited observation and cooperative interaction. The article reports experiments in an environment where two cooperative agents must extract and deliver resources. The results show that the hybrid method makes agent behavior substantially more effective than purely gradient-based training. The dynamics of the average reward confirm that the method is stable and point to its potential for more complex multi-agent interaction scenarios. The material is of practical value to specialists in reinforcement learning, multi-agent system development, and the design of adaptive cooperative strategies under limited information.
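The abstract names the ingredients of the hybrid scheme but not their details. As a minimal sketch under stated assumptions, the two parts could fit together as follows: A2C advantage estimates that weight the actor's policy gradient, and crossover plus mutation applied to flattened neural network parameter vectors. Uniform crossover, Gaussian mutation, the replace-the-worst selection rule, every function name, and the hyperparameters `gamma`, `rate`, and `sigma` are hypothetical choices for illustration, not taken from the article; the networks themselves and the gradient update are omitted.

```python
import numpy as np


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step discounted returns used as A2C critic targets, bootstrapped from V(s_T)."""
    returns = np.zeros(len(rewards), dtype=float)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a2c_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimate A_t = R_t - V(s_t) that weights the actor's log-probability gradient."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return returns - np.asarray(values, dtype=float)


def uniform_crossover(parent_a, parent_b, rng):
    """Hypothetical operator: the child takes each parameter from parent_a or parent_b with probability 0.5."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)


def gaussian_mutation(params, rng, rate=0.1, sigma=0.02):
    """Hypothetical operator: add N(0, sigma^2) noise to a random fraction `rate` of parameters."""
    mask = rng.random(params.shape) < rate
    noise = rng.normal(0.0, sigma, size=params.shape)
    return params + mask * noise


def evolutionary_step(population, fitness, rng, rate=0.1, sigma=0.02):
    """Replace the worst individual with a mutated child of the two best ones.

    `population` is a list of flattened parameter vectors; `fitness` is, for
    example, each individual's average episode reward. The input list is not
    modified; a new list of copies is returned.
    """
    order = np.argsort(fitness)  # ascending: worst first, best last
    best, second, worst = order[-1], order[-2], order[0]
    child = uniform_crossover(population[best], population[second], rng)
    new_population = [p.copy() for p in population]
    new_population[worst] = gaussian_mutation(child, rng, rate, sigma)
    return new_population
```

In a loop of this kind, the A2C gradient update would run every episode, while `evolutionary_step` would be applied periodically with average reward as fitness, so gradient descent refines each network locally and the evolutionary operators inject larger, non-gradient changes into the parameter space. How often the article interleaves the two is not stated in the abstract.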

Korchagin Aleksei Pavlovich

Voronezh State University

Voronezh, Russian Federation

Keywords: reinforcement learning, evolutionary algorithms, multi-agent system, A2C, LSTM, cooperative learning

For citation: Korchagin A.P. Hybrid agent training system using A2C and evolutionary strategies. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1991 DOI: 10.26102/2310-6018/2025.50.3.029 (In Russ.).

Received 15.06.2025

Revised 18.07.2025

Accepted 30.07.2025

Published 30.09.2025