Exploring the challenges in multi-agent reinforcement learning
In the earlier chapters of this book, we discussed many challenges in reinforcement learning. In particular, the dynamic programming methods we introduced first do not scale to problems with large and complex observation and action spaces. Deep reinforcement learning approaches, on the other hand, can handle complex problems, but they lack theoretical guarantees and therefore require many tricks to stabilize training and converge. Now that we are dealing with problems in which more than one agent is learning, interacting with the others, and affecting the environment, the challenges and complexities of single-agent RL are multiplied. For this reason, many results in MARL are empirical.
In this section, we discuss what makes MARL specifically complex and challenging.
Non-stationarity
The mathematical framework behind single-agent RL is the Markov decision process (MDP), which establishes that the...