A2C on Pong results
To start the training, run 02_pong_a2c.py with the --cuda option and the -n option (which provides a name for the run in TensorBoard):
rl_book_samples/Chapter10$ ./02_pong_a2c.py --cuda -n t2
AtariA2C (
  (conv): Sequential (
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU ()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU ()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU ()
  )
  (policy): Sequential (
    (0): Linear (3136 -> 512)
    (1): ReLU ()
    (2): Linear (512 -> 6)
  )
  (value): Sequential (
    (0): Linear (3136 -> 512)
    (1): ReLU ()
    (2): Linear (512 -> 1)
  )
)
37799: done 1 games, mean reward -21.000, speed 722.89 f/s
39065: done 2 games, mean reward -21.000, speed 749.92 f/s
39076: done 3 games, mean reward -21.000, speed 755.26 f/s
...
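The 3136 input size of the policy and value heads in the printout follows from the three convolution layers. The sketch below reproduces that arithmetic in plain Python. It assumes an 84x84 input frame, which is a common Atari preprocessing choice but is not stated in the output above. The names conv_out and flattened_features are hypothetical helpers for illustration, not part of the example's source code.

```python
def conv_out(size, kernel, stride):
    """Output side length of a square convolution without padding."""
    return (size - kernel) // stride + 1


# (kernel_size, stride, out_channels) of the three Conv2d layers in the printout
CONV_LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def flattened_features(input_side=84):
    """Number of features left after the conv stack is flattened.

    input_side=84 is an assumption (typical Atari preprocessing),
    not something the printed model shows.
    """
    side = input_side
    channels = 0
    for kernel, stride, out_channels in CONV_LAYERS:
        side = conv_out(side, kernel, stride)
        channels = out_channels
    return channels * side * side


print(flattened_features())  # 64 channels * 7 * 7 = 3136
```

Under this assumption the side length shrinks from 84 to 20, then 9, then 7, so the flattened output is 64 * 7 * 7 = 3136, matching the Linear (3136 -> 512) layers.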
As a word of warning: the training process is lengthy. With the original hyperparameters, it requires more than 8M frames to solve...
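For a rough sense of scale, the frame count can be combined with the speeds from the log above. This is only a back-of-the-envelope sketch: it assumes the speed stays close to the first three logged values for the whole run, and those values depend on the hardware used. Because the text says "more than 8M frames", the result is a lower bound. The helper estimate_hours is a hypothetical name.

```python
FRAMES_TO_SOLVE = 8_000_000               # "more than 8M frames": a lower bound
LOGGED_SPEEDS = [722.89, 749.92, 755.26]  # f/s values from the three log lines


def estimate_hours(frames, speeds):
    """Wall-clock hours needed to process `frames` at the mean logged speed."""
    mean_speed = sum(speeds) / len(speeds)
    return frames / mean_speed / 3600


hours = estimate_hours(FRAMES_TO_SOLVE, LOGGED_SPEEDS)
print(f"at least {hours:.1f} hours")
```

At roughly 740 frames per second, 8M frames take about three hours, and a slower machine lengthens the run proportionally.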