Training policies in multi-agent settings
Many algorithms and approaches have been designed for MARL. They can be classified into two broad categories.
- Independent learning: In this approach, each agent is trained individually and treats the other agents as part of the environment.
- Centralized training and decentralized execution: In this approach, a centralized controller uses information from multiple agents during training. At execution (inference) time, each agent runs its own policy locally, without relying on a central mechanism.
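To make the independent learning idea concrete, here is a minimal sketch under stated assumptions. It is not taken from a specific library or from the algorithms of earlier chapters. It uses a hypothetical two-action coordination game (both agents get a reward of 1 when their actions match, 0 otherwise) and a stateless, bandit-style Q-value update. The function names `coordination_reward`, `epsilon_greedy`, and `train_independent_q_learners` are made up for illustration. The point to notice is that each agent keeps its own value table and updates it only from its own action and reward, so the other agent is simply part of the environment it experiences.

```python
import numpy as np


def coordination_reward(action_a: int, action_b: int) -> float:
    """Assumed toy payoff: 1 if both agents pick the same action, else 0."""
    return 1.0 if action_a == action_b else 0.0


def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def train_independent_q_learners(steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Train two agents independently on the toy coordination game."""
    rng = np.random.default_rng(seed)
    n_actions = 2
    # Each agent owns a separate table indexed by its own actions only.
    q_a = np.zeros(n_actions)
    q_b = np.zeros(n_actions)

    for _ in range(steps):
        # Agents act simultaneously; neither sees the other's table or action.
        a = epsilon_greedy(q_a, epsilon, rng)
        b = epsilon_greedy(q_b, epsilon, rng)
        r = coordination_reward(a, b)

        # Independent updates: each agent moves its estimate for the action
        # it took toward the reward it received.
        q_a[a] += alpha * (r - q_a[a])
        q_b[b] += alpha * (r - q_b[b])

    return q_a, q_b


if __name__ == "__main__":
    q_a, q_b = train_independent_q_learners()
    print("Agent A values:", q_a)
    print("Agent B values:", q_b)
```

From each agent's point of view the reward for an action depends on what the other agent is currently doing, so the environment looks non-stationary while both are learning. That is the main caveat of independent learning, even though it often works well in practice.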
Generally speaking, we can take any of the algorithms we covered in the previous chapters and use it in a multi-agent setting to train policies via independent learning, which, as it turns out, is a very competitive alternative to specialized MARL algorithms. So rather than dumping more theory and notation on you, in this chapter, we will skip discussing the technical...