The battle environment
Besides the tiger-deer environment, MAgent contains several other predefined configurations, which you can find in the magent2.builtin.config and magent2.environment packages. As a final example in this chapter, we’ll take a look at the “battle” configuration, in which two groups of agents fight each other (without eating, thank goodness). Every agent starts with 10 health points and every attack takes away 2 health points, so 5 consecutive attacks are required to defeat an opponent and earn the reward.
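The arithmetic behind the "5 attacks" figure can be checked with a tiny sketch. It is not part of MAgent: the helper attacks_to_defeat and the constants below are hypothetical names introduced only for illustration, and the calculation assumes that the target's health does not change between attacks.

```python
import math

# Values quoted in the text for the battle configuration.
HEALTH_POINTS = 10
ATTACK_DAMAGE = 2


def attacks_to_defeat(health: float, damage: float) -> int:
    """Number of hits needed to bring `health` down to zero or below.

    Hypothetical helper: assumes a fixed damage per hit and no change
    in the target's health between hits.
    """
    if damage <= 0:
        raise ValueError("damage must be positive")
    return math.ceil(health / damage)


if __name__ == "__main__":
    print(attacks_to_defeat(HEALTH_POINTS, ATTACK_DAMAGE))
```

Rounding up matters when the damage does not divide the health evenly: with a damage of 3, for example, the fourth hit is the one that finishes the opponent.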
You can find the code in battle_dqn.py. In this setup, one group behaves randomly and the other uses a DQN to improve its policy. Training took two hours, and the DQN was able to find a decent policy, but toward the end, the training process diverged. Figure 22.9 shows the training and test reward plots:
Figure 22.9: Average reward during training (left) and test (right) in the battle scenario