A2C on Pong results
To start the training, run 02_pong_a2c.py with the --cuda and -n options (the latter provides a name for the run in TensorBoard):
rl_book_samples/Chapter10$ ./02_pong_a2c.py --cuda -n t2
AtariA2C (
  (conv): Sequential (
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU ()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU ()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU ()
  )
  (policy): Sequential (
    (0): Linear (3136 -> 512)
    (1): ReLU ()
    (2): Linear (512 -> 6)
  )
  (value): Sequential (
    (0): Linear (3136 -> 512)
    (1): ReLU ()
    (2): Linear (512 -> 1)
  )
)
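The 3136 input features of the policy and value heads follow from the convolution stack printed above. The short sketch below reproduces that arithmetic in plain Python. It assumes an 84x84 input frame (the usual Atari preprocessing size, not stated in the printout) and no padding, since the printout lists none. The names conv_out, CONV_LAYERS and flattened_features are hypothetical helpers for illustration, not part of 02_pong_a2c.py.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution along one spatial axis."""
    return (size - kernel) // stride + 1


# (out_channels, kernel_size, stride) copied from the printed conv layers
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def flattened_features(side: int = 84) -> int:
    """Size of the flattened conv output for a square input frame.

    side=84 is an assumption (standard Atari preprocessing), not a value
    taken from the printout.
    """
    channels = 0
    for channels, kernel, stride in CONV_LAYERS:
        side = conv_out(side, kernel, stride)  # 84 -> 20 -> 9 -> 7
    return channels * side * side  # 64 * 7 * 7


print(flattened_features())  # 3136
```

Under these assumptions the spatial size shrinks from 84 to 20, then 9, then 7, so the final 64 feature maps flatten to 64 * 7 * 7 = 3136 values, which matches the Linear (3136 -> 512) layers in both heads.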
37799: done 1 games, mean reward -21.000, speed 722.89 f/s
39065: done 2 games, mean reward -21.000, speed 749.92 f/s
39076: done 3 games, mean...