Training both tigers and deer
The next example covers the scenario in which both tigers and deer are controlled by separate DQN models trained simultaneously. Tigers are rewarded for living longer (which means eating more deer), as they lose health points at every step of the simulation. Deer are likewise rewarded at every timestep.
The code is in Chapter25/forest_both_dqn.py, and it is a fairly simple extension of the previous example. For each group of agents, we have a separate Agent class instance, which communicates with the environment. As the observations of the two groups differ, we also have two separate networks, replay buffers, and experience sources. On every training step, we sample a batch from each replay buffer and then train both networks independently.
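To make that structure concrete, here is a minimal, self-contained sketch of such a two-group training step in plain PyTorch. It is not the code from Chapter25/forest_both_dqn.py: the network architecture, the per-group observation and action sizes, and the random-transition filler that stands in for real environment interaction are all placeholders, while the real example collects experience by acting in the environment.

```python
import collections
import random

import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.99
BATCH_SIZE = 32

# per-group (obs_size, n_actions); these numbers are hypothetical,
# not the real MAgent forest dimensions
GROUP_SPECS = {
    "tigers": (20, 9),
    "deer": (16, 5),
}


class DQNNet(nn.Module):
    def __init__(self, obs_size: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def dqn_loss(batch, net, tgt_net):
    # standard DQN loss on a batch of (s, a, r, done, s') transitions
    states, actions, rewards, dones, next_states = zip(*batch)
    states_v = torch.as_tensor(np.array(states, dtype=np.float32))
    next_states_v = torch.as_tensor(np.array(next_states, dtype=np.float32))
    actions_v = torch.as_tensor(actions, dtype=torch.int64)
    rewards_v = torch.as_tensor(rewards, dtype=torch.float32)
    done_mask = torch.as_tensor(dones, dtype=torch.bool)

    q_vals = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q = tgt_net(next_states_v).max(1)[0]
        next_q[done_mask] = 0.0
        target = rewards_v + GAMMA * next_q
    return nn.functional.mse_loss(q_vals, target)


# every group gets its own network, target network, optimizer, and replay buffer
groups = {}
for name, (obs_size, n_actions) in GROUP_SPECS.items():
    net = DQNNet(obs_size, n_actions)
    tgt_net = DQNNet(obs_size, n_actions)
    tgt_net.load_state_dict(net.state_dict())
    groups[name] = {
        "net": net,
        "tgt_net": tgt_net,
        "opt": torch.optim.Adam(net.parameters(), lr=1e-4),
        "buffer": collections.deque(maxlen=100_000),
    }

# stand-in for real experience collection: fill both buffers with random transitions
for name, (obs_size, n_actions) in GROUP_SPECS.items():
    buf = groups[name]["buffer"]
    for _ in range(1000):
        s = np.random.randn(obs_size).astype(np.float32)
        s2 = np.random.randn(obs_size).astype(np.float32)
        buf.append((s, random.randrange(n_actions), random.random(), False, s2))

# one training step: sample a batch from each group's buffer and update
# that group's network independently of the other group
for name, g in groups.items():
    batch = random.sample(g["buffer"], BATCH_SIZE)
    g["opt"].zero_grad()
    loss = dqn_loss(batch, g["net"], g["tgt_net"])
    loss.backward()
    g["opt"].step()
    print(f"{name}: loss={loss.item():.3f}")
```

Because the two groups never share parameters or gradients, each update is an ordinary single-agent DQN step; the only coupling between tigers and deer comes from the shared environment that generates their experience.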
I'm not going to reproduce the full code here, as it differs from the previous example only in small details. If you are curious, you can check the examples on GitHub. The following plots show the convergence results.
...