The A2C baseline
To establish the baseline results, we will use the A2C method in a very similar way to the previous chapter. The complete source code is in the Chapter16/01_train_a2c.py and Chapter16/lib/model.py files. There are a few differences between this baseline and the version we used before:
- 16 parallel environments are used to gather experience during the training.
- The model structure and the way that we perform exploration are different.
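The idea of gathering experience from several environments at once can be sketched as follows. This is a minimal illustration, not the code from Chapter16/01_train_a2c.py: the `ToyEnv` class, the `gather` function, and the constant policy are hypothetical names introduced only for this example. The only detail taken from the text is the number of environments, 16.

```python
import random

N_ENVS = 16  # number of parallel environments, as stated in the text


class ToyEnv:
    """Hypothetical stand-in for a real environment (an assumption, not the book's code)."""

    def __init__(self, seed, episode_len=5):
        self.rng = random.Random(seed)
        self.episode_len = episode_len
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.rng.random()

    def step(self, action):
        self.steps += 1
        obs = self.rng.random()
        reward = -abs(action - obs)  # toy reward: never positive
        done = self.steps >= self.episode_len
        return obs, reward, done


def gather(envs, policy, n_steps):
    """Step every environment n_steps times in round-robin order.

    Returns a flat list of (env_index, state, action, reward, done) tuples.
    """
    states = [env.reset() for env in envs]
    batch = []
    for _ in range(n_steps):
        for idx, env in enumerate(envs):
            action = policy(states[idx])
            next_state, reward, done = env.step(action)
            batch.append((idx, states[idx], action, reward, done))
            # Restart a finished episode so the environment keeps producing data.
            states[idx] = env.reset() if done else next_state
    return batch


envs = [ToyEnv(seed=i) for i in range(N_ENVS)]
batch = gather(envs, policy=lambda s: 0.5, n_steps=4)
print(len(batch))  # 64 transitions: 16 environments x 4 steps
```

Because each environment is seeded differently, the transitions in one batch come from 16 independent trajectories, which makes the samples in a batch less correlated than the same number of consecutive steps from a single environment.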
Implementation
To illustrate the differences between this baseline and the previously discussed version, let’s look at the model and the agent classes.
The actor and critic are placed in separate networks without sharing weights. They follow the approach used in Chapter 15, with our actor estimating the mean and the variance for the actions. Now, however, the variance is not a separate head of the base network; it is just a single...