PG on CartPole
Nowadays, almost nobody uses the vanilla PG method, as the much more stable actor-critic method exists, which will be the topic of the two following chapters. However, I still want to show the PG implementation, as it establishes very important concepts and metrics for checking the PG method's performance. So, we will start with the much simpler CartPole environment, and in the next section, we will check the method's performance on our favorite Pong environment. The complete code for the following example is available in Chapter09/04_cartpole_pg.py.
GAMMA = 0.99
LEARNING_RATE = 0.001
ENTROPY_BETA = 0.01
BATCH_SIZE = 8
REWARD_STEPS = 10
Besides the already familiar hyperparameters, we have two new ones. The ENTROPY_BETA value is the scale of the entropy bonus. The REWARD_STEPS value specifies how many steps ahead the Bellman equation is unrolled to estimate the discounted total reward of every transition.
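To make the role of REWARD_STEPS concrete, here is a small sketch (not code from the source) of how a discounted reward can be accumulated over an N-step unroll: each reward is folded in from the last step backward, so earlier rewards pick up higher powers of gamma.

```python
GAMMA = 0.99

def discounted_reward(rewards, gamma=GAMMA):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a list of rewards
    covering one N-step unroll (N = len(rewards), e.g. REWARD_STEPS)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With REWARD_STEPS = 10, each transition's reward estimate is this sum over its next 10 rewards, which trades a little bias for much lower variance than summing to the end of the episode.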
class PGN(nn.Module):
    def __init__(self, input_size, n_actions):
        ...
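The body of the network is elided above. As a hedged sketch of what such a policy network typically looks like (the hidden size of 128 is my assumption, not taken from the source), a small two-layer MLP that maps an observation to raw action logits is enough for CartPole; the softmax is applied later, when actions are sampled or log-probabilities are computed.

```python
import torch
import torch.nn as nn

class PGN(nn.Module):
    """Policy network sketch: observation -> raw action logits.
    The 128-unit hidden layer is an illustrative choice."""
    def __init__(self, input_size, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        # Return logits; apply softmax/log_softmax outside the network.
        return self.net(x)
```

Returning logits rather than probabilities lets the training loop use numerically stable functions such as log_softmax when computing the policy gradient loss.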