We have already covered some sophisticated examples and built some fairly intelligent agents. The techniques we have used with RL, and more specifically PPO, are cutting edge, but as we have seen, they still have their limitations. ML researchers continue to push the boundaries in areas such as network architecture and training setup. In the last chapter, we looked at one style of training multiple agents in multiple environments. In this chapter, we will explore the various novel training strategies we can employ with multiple agents and/or brains in an environment, from adversarial and cooperative self-play to imitation and curriculum learning. This will cover most of the remaining Unity examples, and the following is a summary of the main topics we will cover:
- Multi-agent environments
- Adversarial self-play
- Decisions and on-demand decision making...