Experiment results
In this section, we'll take a look at the results of our multi-step training process.
The baseline agent
To train the agent, run Chapter17/01_a2c.py with the optional --cuda flag to enable the GPU and the required -n option, which provides the experiment name used in TensorBoard and in the directory name where models are saved.
Chapter17$ ./01_a2c.py --cuda -n tt
AtariA2C (
  (conv): Sequential (
    (0): Conv2d(2, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU ()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU ()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU ()
  )
  (fc): Sequential (
    (0): Linear (3136 -> 512)
    (1): ReLU ()
  )
  (policy): Linear (512 -> 4)
  (value): Linear (512 -> 1)
)
4: done 13 episodes, mean_reward=0.00, best_reward=0.00, speed=99.96
9: done 11 episodes, mean_reward=0.00, best_reward=0.00, speed=133.25
10: done 1 episodes, mean_reward=1.00, best_reward=1.00, speed=136.62
13: done 9 episodes, mean_reward...
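The printout above is PyTorch's representation of the network. The following is a minimal sketch of a module that would produce this architecture; it assumes 84x84 input frames (which gives the 3136-unit flattened size shown), and the constructor signature and forward logic are illustrative assumptions rather than the exact code in 01_a2c.py.

import torch
import torch.nn as nn

class AtariA2C(nn.Module):
    def __init__(self, input_shape, n_actions):
        super(AtariA2C, self).__init__()
        # Convolutional feature extractor, matching the layers in the printout
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        conv_out_size = self._get_conv_out(input_shape)  # 3136 for 84x84 frames
        # Shared fully connected body feeding both heads
        self.fc = nn.Sequential(
            nn.Linear(conv_out_size, 512),
            nn.ReLU(),
        )
        # Two output heads: action logits (actor) and state value (critic)
        self.policy = nn.Linear(512, n_actions)
        self.value = nn.Linear(512, 1)

    def _get_conv_out(self, shape):
        # Run a dummy tensor through the conv stack to get the flattened size
        o = self.conv(torch.zeros(1, *shape))
        return int(o.view(1, -1).size(1))

    def forward(self, x):
        conv_out = self.conv(x).view(x.size(0), -1)
        fc_out = self.fc(conv_out)
        # A single forward pass returns both policy logits and the value estimate
        return self.policy(fc_out), self.value(fc_out)

For example, instantiating it with two stacked 84x84 frames and four actions reproduces the shapes in the printout:

net = AtariA2C(input_shape=(2, 84, 84), n_actions=4)
logits, value = net(torch.zeros(1, 2, 84, 84))  # logits: (1, 4), value: (1, 1)

Because both heads sit on top of the same convolutional and fully connected body, the actor and the critic share their feature representation, which is the standard A2C design.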