Deep Q-network for tigers
In the previous example, both groups of agents behaved randomly, which is not very interesting. Now we will apply the deep Q-network (DQN) model to the tiger group of agents to check whether they can learn an interesting policy. All of the tigers share a single network, so they follow the same policy; their behavior differs only because their observations differ.
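Parameter sharing means that every tiger's observation is scored by the same set of weights, and inference for the whole group is one batched forward pass. Here is a minimal sketch of that idea with a toy linear Q-function in NumPy (the sizes and the linear model are illustrative placeholders, not the book's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4    # toy number of tigers
OBS_SIZE = 10   # toy flattened observation size
N_ACTIONS = 5   # toy action count

# One shared weight matrix stands in for the Q-network used by ALL agents.
shared_w = rng.normal(size=(OBS_SIZE, N_ACTIONS))

def q_values(obs_batch: np.ndarray) -> np.ndarray:
    """Apply the single shared Q-function to a batch of agent observations."""
    return obs_batch @ shared_w

# Every agent's observation goes through the same weights in one batch,
# and each agent greedily picks its own action from the shared Q-values.
observations = rng.normal(size=(N_AGENTS, OBS_SIZE))
actions = q_values(observations).argmax(axis=1)
```

Batching all agents through one network is also what makes training efficient: the replay buffer can mix transitions from every tiger, since they all come from the same policy.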
The training code is in Chapter25/forest_tigers_dqn.py, and it doesn't differ much from the DQN versions in the previous chapters. To make the MAgent environment work with our classes, a gym.Env wrapper was implemented in Chapter25/lib/data.py in the class MAgentEnv. Let's check it to understand how it fits into the rest of the code.
class MAgentEnv(VectorEnv):
    def __init__(self, env: magent.GridWorld, handle,
                 reset_env_func: Callable[[], None],
                 is_slave: bool = False,
                 steps_limit: Optional[int] = None):
        reset_env_func()
        action_space = self.handle_action_space(env...
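The listing above is truncated, but the contract it implements is worth spelling out: a vectorized environment treats the whole tiger group as a batch, so reset() returns a list of per-agent observations and step() takes a list of per-agent actions. The sketch below illustrates that contract with a dummy stand-in class; all names and shapes here are hypothetical and do not reproduce MAgent's or the book's real API:

```python
from typing import Callable, List, Tuple
import numpy as np

class ToyVectorEnv:
    """Illustrative stand-in for a VectorEnv-style multi-agent wrapper:
    reset() yields one observation per agent, step() consumes one action
    per agent. Names and shapes are invented for this sketch."""

    def __init__(self, n_agents: int, obs_size: int,
                 reset_env_func: Callable[[], None]):
        self.n_agents = n_agents
        self.obs_size = obs_size
        # Called to (re-)populate the grid world, as in MAgentEnv's __init__.
        self.reset_env_func = reset_env_func

    def reset(self) -> List[np.ndarray]:
        self.reset_env_func()
        # One observation vector per agent in the group.
        return [np.zeros(self.obs_size) for _ in range(self.n_agents)]

    def step(self, actions: List[int]) -> Tuple[list, list, list, dict]:
        assert len(actions) == self.n_agents
        obs = [np.zeros(self.obs_size) for _ in range(self.n_agents)]
        rewards = [0.0] * self.n_agents
        dones = [False] * self.n_agents
        return obs, rewards, dones, {}

env = ToyVectorEnv(n_agents=3, obs_size=4, reset_env_func=lambda: None)
first_obs = env.reset()
obs, rewards, dones, info = env.step([0, 1, 2])
```

Presenting the group as a single batched environment is what lets the rest of the DQN machinery from earlier chapters stay unchanged: the experience source just sees lists of observations, actions, and rewards.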