The previous example showed both cooperative and competitive self-play, with multiple agents functioning almost symbiotically. While instructive, it still tied the functionality of one brain to another through their reward functions, which is why we observed the agents in an almost rewards-opposite scenario. Instead, we now want to look at an environment that can train a brain with multiple agents using purely adversarial self-play. Of course, ML-Agents provides such an environment, called Banana, which comprises several agents that wander the scene randomly and collect bananas. Each agent also has a laser pointer that can disable an opposing agent for several seconds if the beam hits it. This is the scene we will look at in the next exercise:
- Open the Banana scene from the Assets | ML-Agents | Examples | BananaCollectors...