Training both tigers and deer
In the next example, both tigers and deer are controlled by separate DQN models that are trained simultaneously. Tigers are rewarded for surviving longer, which pushes them to eat more deer, since they lose health points at every step of the simulation. Deer are also rewarded at every timestep.
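To make this reward structure concrete, here is a minimal sketch of the per-step shaping described above. The function names and constants are illustrative assumptions, not the environment's actual values:

```python
HEALTH_LOSS_PER_STEP = 0.1   # assumed constant: tigers bleed health each step
DEER_STEP_REWARD = 0.05      # assumed constant: small per-step reward for deer


def tiger_step(health: float) -> tuple[float, float]:
    # Tigers lose health every step, so simply staying alive is rewarded;
    # to keep health above zero, they have to hunt and eat deer.
    health -= HEALTH_LOSS_PER_STEP
    reward = 1.0 if health > 0 else 0.0
    return health, reward


def deer_step(alive: bool) -> float:
    # Deer receive a small positive reward for every timestep survived.
    return DEER_STEP_REWARD if alive else 0.0
```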
The code is in forest_both_dqn.py and extends the previous example. Each group of agents gets its own DQNAgent instance, which uses its own neural network to convert observations into actions. The experience source is the same, but now we don't filter for the tiger group's experience (the parameter filter_group=None). Because of this, our replay buffer now contains observations from all the agents in the environment, not just from the tigers as in the previous example. During training, we sample a batch and split the deer and tiger examples into two separate batches, one used to train each network.
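The actual agent, experience source, and buffer classes come from the chapter's utility code; the following is a minimal, self-contained sketch of just the batch-splitting step, with the hypothetical names Experience and split_batch standing in for the real ones:

```python
import random
from dataclasses import dataclass
from typing import Any, List, Tuple


# Hypothetical transition record: the real replay buffer stores richer
# entries, but a group tag is all we need to illustrate the split.
@dataclass
class Experience:
    group: str        # "deer" or "tiger"
    obs: Any          # observation at time t
    action: int
    reward: float
    done: bool


def split_batch(batch: List[Experience]) -> Tuple[List[Experience], List[Experience]]:
    """Split a mixed sample from the shared replay buffer by agent group."""
    deer = [e for e in batch if e.group == "deer"]
    tigers = [e for e in batch if e.group == "tiger"]
    return deer, tigers


if __name__ == "__main__":
    # Fill a toy buffer with transitions from both groups, then sample a
    # mixed batch and split it into one sub-batch per network to be trained.
    buffer = [Experience(random.choice(["deer", "tiger"]), None, 0, 0.0, False)
              for _ in range(1000)]
    deer_batch, tiger_batch = split_batch(random.sample(buffer, 32))
    print(f"deer: {len(deer_batch)}, tigers: {len(tiger_batch)}")
```

Each sub-batch then goes through the usual DQN loss computation against its own group's network and target network, so the two models are updated independently from the shared stream of experience.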