DDPG training and results
To train the policy using our model, we will use the deep deterministic policy gradient (DDPG) method, which we covered in detail in Chapter 17, Continuous Action Space. I won't spend time here showing the code, which is in Chapter18/train_ddpg.py and Chapter18/lib/ddpg.py. For exploration, the Ornstein-Uhlenbeck process was used in the same way as for the Minitaur model.
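To give an idea of what such exploration noise looks like, here is a minimal sketch of an Ornstein-Uhlenbeck process; it is not the exact code from Chapter18/lib/ddpg.py, and the class name OUNoise and the hyperparameters theta=0.15 and sigma=0.2 are illustrative assumptions.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise added to actions."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu          # long-term mean the process reverts to
        self.theta = theta    # strength of mean reversion
        self.sigma = sigma    # scale of the random perturbation
        self.state = np.full(action_dim, mu, dtype=np.float32)

    def reset(self):
        # called at the start of every episode
        self.state[:] = self.mu

    def sample(self):
        # x_{t+1} = x_t + theta * (mu - x_t) + sigma * N(0, 1)
        self.state += self.theta * (self.mu - self.state) + \
            self.sigma * np.random.normal(size=self.state.shape)
        return self.state
```

During training, the sampled noise is simply added to the actor's output action before it is sent to the environment, and the noise state is reset when an episode ends.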
The only thing I'd like to emphasize is the size of the model: the actor part was intentionally made small to meet our hardware limitations. The actor has one hidden layer with 20 neurons, giving just two matrices (not counting the biases) of 28×20 and 20×4. The input dimensionality is 28 due to observation stacking, where four past observations are passed to the model. This small dimensionality leads to very fast training, which can be done without a GPU.
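For illustration, an actor of exactly this size could look like the following PyTorch sketch; the class name DDPGActor, the ReLU activation, and the tanh output are assumptions on my part and are not necessarily identical to the code in Chapter18/lib/ddpg.py.

```python
import torch
import torch.nn as nn

OBS_SIZE = 28   # four stacked past observations
ACT_SIZE = 4    # action vector size
HID_SIZE = 20   # single hidden layer of 20 neurons

class DDPGActor(nn.Module):
    """Tiny deterministic actor: 28 -> 20 -> 4, tanh keeps actions bounded."""
    def __init__(self, obs_size=OBS_SIZE, act_size=ACT_SIZE, hid_size=HID_SIZE):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hid_size),   # 28x20 weight matrix
            nn.ReLU(),
            nn.Linear(hid_size, act_size),   # 20x4 weight matrix
            nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)
```

With only a few hundred parameters, a forward and backward pass through this network is cheap enough that the whole training loop runs comfortably on a CPU.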
To train the model, you should run the train_ddpg.py program, which accepts the following arguments...