The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Algorithmization of multi-agent learning with reinforcement in the game-theoretic problems of finding optimal strategies

Sokolova E.S., Razinkin K.A.

UDC 004.8, 519.83
DOI: 10.26102/2310-6018/2020.28.1.04


The topic is relevant because of the growing interest in multi-agent simulation of dynamic systems of various physical and social nature. The concept of an intelligent agent is coming to the fore: a simulation model of how an active element behaves in complex situations and of its strategies for interacting with other active elements and the environment to achieve a goal. Within the general concept of intelligent agents and agent technologies for simulating the goal-directed interaction of dynamic objects, a method of structural-parametric modeling of intelligent agents and multi-agent systems is being developed, together with algorithms for identifying and predicting agent states and software for multi-agent simulation models of production, social and marketing systems. The relevance of the topic is therefore determined by the need to make multi-agent reinforcement learning more effective in game-theoretic problems of finding optimal strategies. The article describes two multi-agent reinforcement learning algorithms for such problems: minimax-Q, which minimizes the agent's unavoidable losses when events develop according to its worst-case scenario, and WoLF-PHC (Win or Learn Fast – Policy Hill Climbing), which implements a "win or learn fast" policy. WoLF-PHC is a modification of the PHC algorithm that uses different learning rates depending on whether the agent is winning or losing. The learning rates vary in order to maintain the convergence of the algorithm. The main idea is to learn quickly when losing and slowly when winning. The advantages and disadvantages of these approaches, the principles of their modernization and the possibility of implementing them in simulation environments are shown.
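
The WoLF-PHC rule summarized in the abstract can be illustrated with a minimal Python sketch for a single-state repeated matrix game. It follows the published description of the algorithm by Bowling and Veloso (2002), not the authors' own implementation. The class name `WoLFPHCAgent`, the helper `play_matching_pennies`, the constant learning rates and all parameter values are assumptions chosen for illustration; the published algorithm typically decays the rates over time, and the sketch requires at least two actions.

```python
import random


class WoLFPHCAgent:
    """Single-state WoLF-PHC learner (illustrative sketch, hypothetical names)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, rng=None):
        # Assumed values; delta_lose > delta_win gives "learn fast when losing".
        self.n = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.delta_win = delta_win
        self.delta_lose = delta_lose
        self.rng = rng if rng is not None else random.Random(0)
        self.q = [0.0] * n_actions                     # action values
        self.pi = [1.0 / n_actions] * n_actions        # current mixed policy
        self.avg_pi = [1.0 / n_actions] * n_actions    # running average policy
        self.visits = 0

    def act(self):
        # Sample an action from the current mixed policy.
        return self.rng.choices(range(self.n), weights=self.pi)[0]

    def update(self, action, reward):
        # 1. Q-learning step; with one state the next-state value is max(q).
        target = reward + self.gamma * max(self.q)
        self.q[action] += self.alpha * (target - self.q[action])

        # 2. Move the average policy toward the current policy.
        self.visits += 1
        for a in range(self.n):
            self.avg_pi[a] += (self.pi[a] - self.avg_pi[a]) / self.visits

        # 3. "Winning" means the current policy has a higher expected value
        #    than the average policy under the current Q estimates.
        v_pi = sum(p * q for p, q in zip(self.pi, self.q))
        v_avg = sum(p * q for p, q in zip(self.avg_pi, self.q))
        delta = self.delta_win if v_pi > v_avg else self.delta_lose

        # 4. Hill-climbing step: shift probability mass to the greedy action.
        best = max(range(self.n), key=self.q.__getitem__)
        moved = 0.0
        for a in range(self.n):
            if a != best:
                step = min(self.pi[a], delta / (self.n - 1))
                self.pi[a] -= step
                moved += step
        self.pi[best] += moved


def play_matching_pennies(rounds=1000, seed=0):
    """Self-play in matching pennies; returns both final policies."""
    rng = random.Random(seed)
    row = WoLFPHCAgent(2, rng=rng)
    col = WoLFPHCAgent(2, rng=rng)
    for _ in range(rounds):
        a, b = row.act(), col.act()
        payoff = 1.0 if a == b else -1.0   # row player wins on a match
        row.update(a, payoff)
        col.update(b, -payoff)             # zero-sum: column gets the negative
    return row.pi, col.pi
```

Because `delta_lose` is larger than `delta_win`, the policy moves in bigger steps whenever the current policy is valued no higher than the average policy, which is the "learn quickly when losing, slowly when winning" rule.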

1. Littman M.L. Markov games as a framework for multi-agent reinforcement learning. Proceedings of the 11th International Conference on Machine Learning (New Brunswick, United States). July 1994:157-163.

2. Bowling M. and Veloso M. Multiagent learning using a variable learning rate. Artificial Intelligence. 2002;136(2):215-250.

3. Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York, New York: John Wiley and Sons, Inc. 1965.

4. Sutton R.S. and Barto A.G. Reinforcement learning: An Introduction. Cambridge, Massachusetts: The MIT Press, 1998.

5. Bowling M. Multiagent Learning in the Presence of Agents with Limitations. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2003.

6. Sokolova E.S. A multi-agent approach to modeling intermodular interactions in stochastic distributed network systems. Management Systems and Information Technology. 2020;1(79):67-71. (In Russ.).

7. Ivashkin Yu.A. Multi-agent modeling in the Simplex3 simulation system. Tutorial. Moscow: Laboratory of Knowledge: Laboratory of Basic Knowledge. 2016:361. (In Russ.).

8. Lu X. On Multi-Agent Reinforcement Learning in Games. PhD thesis, Carleton University, Ottawa, ON, Canada. 2012.

9. Littman M.L., Szepesvári C. A generalized reinforcement learning model: Convergence and applications. Proceedings of the 13th International Conference on Machine Learning (Bari, Italy). July 1996:310-318.

10. Hu J., Wellman M.P. Multiagent reinforcement learning: theoretical framework and an algorithm. Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27. 1998:242-250.

11. Hu J., Wellman M.P. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research. 2003;4:1039-1069.

12. Schwartz H.M. Multi-agent machine learning: a reinforcement approach. John Wiley & Sons, Inc. 2014:315.

Sokolova Elena Sergeevna

Email: lenoks.sokolova@mail.ru

Voronezh State Technical University

Voronezh, Russian Federation

Razinkin Konstantin Aleksandrovich
Doctor of Technical Sciences, Professor
Email: kostyr@mail.ru

Voronezh State Technical University

Voronezh, Russian Federation

Keywords: multi-agent learning, reinforcement learning, stochastic games, equilibrium strategies

For citation: Sokolova E.S., Razinkin K.A. Algorithmization of multi-agent learning with reinforcement in the game-theoretic problems of finding optimal strategies. Modeling, Optimization and Information Technology. 2020;8(1). Available from: https://moit.vivt.ru/wp-content/uploads/2020/02/SokolovaSoavtori_1_20_1.pdf DOI: 10.26102/2310-6018/2020.28.1.04 (In Russ.).
