Experiment results
In this section, we will take a look at the results of our multistep training process.
The baseline agent
To train the agent, run Chapter22/01_a2c.py with the optional --cuda flag to enable the graphics processing unit (GPU) and the required -n option, which sets the experiment name used in TensorBoard and in the name of the directory where the models are saved.
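Internally, the script most likely defines these options with argparse along the following lines (a minimal sketch; the exact argument names, defaults, and help strings are assumptions, not the code from 01_a2c.py):

import argparse

parser = argparse.ArgumentParser()
# optional flag to move training to the GPU
parser.add_argument("--cuda", default=False, action="store_true",
                    help="enable GPU training")
# required experiment name, reused for TensorBoard and the model save directory
parser.add_argument("-n", "--name", required=True,
                    help="experiment name")
args = parser.parse_args()
device = "cuda" if args.cuda else "cpu"

Starting the script prints the network structure and then the training progress: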
Chapter22$ ./01_a2c.py --cuda -n tt
AtariA2C(
  (conv): Sequential(
    (0): Conv2d(2, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
  )
  (policy): Linear(in_features=512, out_features=4, bias=True)
  (value): Linear(in_features=512, out_features=1, bias=True)
)
4: done 13 episodes, mean_reward=0.00, best_reward=0.00, speed=696.72
9: done 12 episodes, mean_reward...
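The printed model is a shared convolutional feature extractor with separate policy and value heads. A minimal PyTorch sketch consistent with that printout looks like this (the constructor signature and the way the flattened feature size is computed are assumptions, not necessarily the exact code from the chapter):

import torch
import torch.nn as nn

class AtariA2C(nn.Module):
    def __init__(self, input_shape, n_actions):
        super().__init__()
        # shared convolutional body
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        # probe the conv stack once to get the flattened feature size
        conv_out = self.conv(torch.zeros(1, *input_shape)).flatten(1).size(1)
        self.fc = nn.Sequential(nn.Linear(conv_out, 512), nn.ReLU())
        self.policy = nn.Linear(512, n_actions)  # action logits
        self.value = nn.Linear(512, 1)           # state-value estimate

    def forward(self, x):
        feats = self.fc(self.conv(x).flatten(1))
        return self.policy(feats), self.value(feats)

net = AtariA2C((2, 84, 84), n_actions=4)
print(net)  # reproduces the layer sizes shown above

With a 2x84x84 observation, the three convolutions reduce the spatial size to 7x7 with 64 channels, so the flattened output has 64 * 7 * 7 = 3136 features, which matches the in_features of the first Linear layer in the printout.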