References

moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2020.28.1.04

732

Алгоритмизация мультиагентного обучения с подкреплением в теоретико-игровых задачах поиска оптимальных стратегий

Algorithmization of multi-agent learning with reinforcement in the game-theoretic problems of finding optimal strategies

Соколова

Елена Сергеевна

Sokolova

Elena Sergeevna

lenoks.sokolova@mail.ru aff-1

Разинкин

Константин Александрович

Razinkin

Konstantin Aleksandrovich

kostyr@mail.ru aff-2

ФГБОУ ВО «Воронежский государственный технический университет» Voronezh State Technical University

ФГБОУ ВО «Воронежский государственный технический университет» Voronezh State Technical University

01 01 2026

1 1

2026

This work is licensed under a Creative Commons Attribution 4.0 International License

The topic is relevant because of the growing interest in multi-agent simulation of dynamic systems of various physical and social natures. The concept of an intelligent agent is currently coming to the fore: a simulation model of how an active element behaves in complex situations and interacts with other active elements and the environment to achieve a goal. Within the general concept of the intelligent agent and of agent technologies for simulating goal-directed interaction of dynamic objects, a method is proposed for structural-parametric modeling of intelligent agents and multi-agent systems, with algorithms for identifying and predicting agent states, together with a software implementation of multi-agent simulation models of production, social and marketing systems. In this context, the relevance of the topic stems from the need to make multi-agent reinforcement learning more effective in game-theoretic problems of finding optimal strategies. The article describes multi-agent reinforcement learning algorithms for game-theoretic problems: minimax-Q, which minimizes the losses the agent cannot prevent when events follow the worst-case scenario for it, and WoLF-PHC (Win or Learn Fast – Policy Hill Climbing), which implements a "win or learn fast" policy. WoLF-PHC is a modification of the PHC algorithm that uses different learning rates depending on whether the agent is winning or losing; the rates are varied to maintain convergence of the algorithm. Its main idea is to learn quickly when losing and slowly when winning. The advantages and disadvantages of these approaches, the principles for modernizing them, and the possibilities for implementing them in simulation environments are shown.
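The "learn quickly when losing, slowly when winning" rule can be made concrete with a minimal sketch. It follows the commonly published form of WoLF-PHC (Bowling and Veloso, 2002) and is not the authors' implementation. It is restricted to a single-state repeated matrix game (matching pennies), and the class name, function name, and all parameter values are illustrative assumptions.

```python
import random


class WoLFPHCAgent:
    """Single-state WoLF-PHC learner (illustrative sketch, hypothetical names)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.0,
                 delta_win=0.01, delta_lose=0.04, rng=None):
        self.n = n_actions
        self.alpha = alpha            # Q-learning rate (assumed value)
        self.gamma = gamma            # discount factor (0 for a one-shot stage game)
        self.delta_win = delta_win    # small policy step used when "winning"
        self.delta_lose = delta_lose  # larger policy step used when "losing"
        self.rng = rng or random.Random()
        self.q = [0.0] * n_actions
        self.policy = [1.0 / n_actions] * n_actions
        self.avg_policy = [1.0 / n_actions] * n_actions
        self.visits = 0

    def act(self):
        # Sample an action from the current mixed policy.
        return self.rng.choices(range(self.n), weights=self.policy)[0]

    def update(self, action, reward):
        # 1. Ordinary Q-learning update for the action that was played.
        target = reward + self.gamma * max(self.q)
        self.q[action] += self.alpha * (target - self.q[action])

        # 2. Incrementally update the running average of past policies.
        self.visits += 1
        for a in range(self.n):
            self.avg_policy[a] += (self.policy[a] - self.avg_policy[a]) / self.visits

        # 3. "Winning" means the current policy has a higher expected value
        #    under Q than the average policy; otherwise the agent is "losing".
        v_current = sum(p * q for p, q in zip(self.policy, self.q))
        v_average = sum(p * q for p, q in zip(self.avg_policy, self.q))
        delta = self.delta_win if v_current > v_average else self.delta_lose

        # 4. Policy hill climbing: shift probability mass toward the greedy
        #    action, never taking more from an action than it currently has.
        best = max(range(self.n), key=self.q.__getitem__)
        for a in range(self.n):
            if a != best:
                step = min(self.policy[a], delta / (self.n - 1))
                self.policy[a] -= step
                self.policy[best] += step


def play_matching_pennies(rounds=20000, seed=0):
    """Self-play: the row player wins (+1) when actions match, else loses (-1)."""
    rng = random.Random(seed)
    row = WoLFPHCAgent(2, rng=rng)
    col = WoLFPHCAgent(2, rng=rng)
    for _ in range(rounds):
        a, b = row.act(), col.act()
        r = 1.0 if a == b else -1.0
        row.update(a, r)
        col.update(b, -r)  # zero-sum game: the column player gets the negation
    return row, col


if __name__ == "__main__":
    row, col = play_matching_pennies()
    print("row policy:", row.policy, "average:", row.avg_policy)
    print("col policy:", col.policy, "average:", col.avg_policy)
```

The only difference from plain PHC is step 3: with a single fixed step size the two cases collapse into one, so the variable learning rate is what distinguishes WoLF-PHC here.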

multi-agent learning; reinforcement learning; stochastic games; equilibrium strategies

The study was performed without external funding.

References

Littman M.L. Markov games as a framework for multi-agent reinforcement learning. Proceedings of the 11th International Conference on Machine Learning (New Brunswick, United States). July 1994:157-163.

Bowling M., Veloso M. Multiagent learning using a variable learning rate. Artificial Intelligence. 2002;136(2):215-250.

Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: John Wiley and Sons, Inc. 1965.

Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: The MIT Press. 1998.

Bowling M. Multiagent Learning in the Presence of Agents with Limitations. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2003.

Sokolova E.S. A multi-agent approach to modeling inter-module interactions in stochastic networked distributed systems. Sistemy upravleniya i informatsionnye tekhnologii. 2020;1(79):67-71. (In Russ.)

Ivashkin Yu.A. Multi-agent modeling in the Simplex3 simulation system. Textbook. Moscow: Laboratoriya znaniy: Laboratoriya Bazovykh Znaniy. 2016:361. (In Russ.)

Lu X. On Multi-Agent Reinforcement Learning in Games. PhD thesis, Carleton University, Ottawa, ON, Canada. 2012.

Littman M.L., Szepesvári C. A generalized reinforcement learning model: Convergence and applications. Proceedings of the 13th International Conference on Machine Learning (Bari, Italy). July 1996:310-318.

Hu J., Wellman M.P. Multiagent reinforcement learning: theoretical framework and an algorithm. Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27. 1998:242-250.

Hu J., Wellman M.P. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research. 2003;4:1039-1069.

Schwartz H.M. Multi-agent machine learning: a reinforcement approach. John Wiley & Sons, Inc. 2014:315.

The authors declare that there are no conflicts of interest present.