The PTAN CartPole solver
Let's now take the PTAN classes (without Ignite so far) and try to combine everything together to solve our first environment: CartPole. The complete code is in Chapter07/06_cartpole.py. I will show only the important parts of the code related to the material that we have just covered.
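For reference, the Net class used below is not shown in this excerpt; a minimal sketch of a two-layer feed-forward network in the spirit of the earlier CartPole examples might look like the following. The hyperparameter values HIDDEN_SIZE, GAMMA, and REPLAY_SIZE are illustrative assumptions, not necessarily the values used in the book's source file:

import torch.nn as nn

HIDDEN_SIZE = 128   # assumed value, for illustration only
GAMMA = 0.9         # assumed discount factor
REPLAY_SIZE = 1000  # assumed replay buffer capacity

class Net(nn.Module):
    def __init__(self, obs_size, hidden_size, n_actions):
        super(Net, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_actions),
        )

    def forward(self, x):
        # observations arrive as a batch; cast to float for the linear layers
        return self.net(x.float())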
net = Net(obs_size, HIDDEN_SIZE, n_actions)
tgt_net = ptan.agent.TargetNet(net)
selector = ptan.actions.ArgmaxActionSelector()
selector = ptan.actions.EpsilonGreedyActionSelector(
    epsilon=1, selector=selector)
agent = ptan.agent.DQNAgent(net, selector)
exp_source = ptan.experience.ExperienceSourceFirstLast(
    env, agent, gamma=GAMMA)
buffer = ptan.experience.ExperienceReplayBuffer(
    exp_source, buffer_size=REPLAY_SIZE)
In the beginning, we create the NN (the simple two-layer feed-forward NN that we used for CartPole before), the target NN, the epsilon-greedy action selector, and the DQNAgent. Then the experience source and the replay buffer are created.
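To show how these pieces fit together, here is a minimal sketch of a training loop driving them. The constants BATCH_SIZE, TGT_NET_SYNC, and EPS_DECAY, as well as the solve threshold of 150, are assumptions for illustration, and the batch unpacking is written inline rather than factored into a helper:

import numpy as np
import torch
import torch.nn.functional as F

BATCH_SIZE = 16     # assumed batch size, for illustration
TGT_NET_SYNC = 10   # assumed: sync the target net every N steps
EPS_DECAY = 0.99    # assumed epsilon decay factor

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
step = 0
solved = False

while not solved:
    step += 1
    buffer.populate(1)  # the agent takes one step in the environment

    # report finished episodes and their undiscounted rewards
    for reward, steps in exp_source.pop_rewards_steps():
        print("%d: episode done, reward=%.2f" % (step, reward))
        solved = reward > 150  # assumed solve threshold

    if len(buffer) < 2 * BATCH_SIZE:
        continue
    batch = buffer.sample(BATCH_SIZE)

    # unpack the batch of ExperienceFirstLast entries
    states = torch.as_tensor(np.stack([e.state for e in batch]))
    actions = torch.as_tensor([e.action for e in batch])
    rewards = torch.as_tensor([e.reward for e in batch],
                              dtype=torch.float32)
    dones = torch.as_tensor([e.last_state is None for e in batch])
    last_states = torch.as_tensor(np.stack(
        [e.state if e.last_state is None else e.last_state
         for e in batch]))

    # Bellman target computed with the target network
    with torch.no_grad():
        next_q = tgt_net.target_model(last_states).max(dim=1)[0]
        next_q[dones] = 0.0
        tgt_q = rewards + GAMMA * next_q

    optimizer.zero_grad()
    q = net(states).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q, tgt_q)
    loss.backward()
    optimizer.step()

    selector.epsilon *= EPS_DECAY  # decay exploration over time
    if step % TGT_NET_SYNC == 0:
        tgt_net.sync()  # copy the trained weights into the target net

Note how little environment plumbing is left to us: buffer.populate() drives the agent through the environment via the experience source, so the only DQN-specific logic we write ourselves is the Bellman target and the loss.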