This has been an exciting chapter in which we explored several variations of training scenarios. We started by extending our training to multi-agent environments that still used a single brain. Next, we looked at a variation of multi-agent training called Adversarial self-play, which allows us to train pairs of agents using a system of inverse rewards. Then, we covered how an agent can be configured to make decisions at a specific frequency or even on demand. After that, we looked at another novel method of training called Imitation Learning. This training scenario allowed us to play tennis and, at the same time, teach an agent to play. Finally, we completed the chapter with another training technique called Curriculum Learning, which allowed us to gradually increase the complexity of an agent's training over time.