The cross-entropy method on CartPole
The whole code for this example is in Chapter04/01_cartpole.py. Here, I will show only the most important parts. Our model’s core is a one-hidden-layer NN with a rectified linear unit (ReLU) activation and 128 hidden neurons (a value chosen arbitrarily; you can try increasing or decreasing this constant – we’ve left this as an exercise for you). The other hyperparameters are also set almost at random and aren’t tuned, as the method is robust and converges very quickly. We define the constants at the top of the file:
import typing as tt
import torch
import torch.nn as nn
import torch.optim as optim
HIDDEN_SIZE = 128
BATCH_SIZE = 16
PERCENTILE = 70
As shown in the preceding code, the constants include the count of neurons in the hidden layer, the count of episodes we play on every iteration (16), and the percentile of episodes’ total rewards that we use for elite episode filtering.
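The one-hidden-layer network described above can be sketched as follows. This is a minimal illustration, not the book’s exact listing: the class name `Net`, its constructor arguments, and the CartPole sizes (4 observations, 2 actions) are assumptions for the sake of a runnable example.

```python
import torch
import torch.nn as nn

HIDDEN_SIZE = 128  # count of neurons in the hidden layer, as defined above

class Net(nn.Module):
    """Sketch of a one-hidden-layer NN with ReLU (names are illustrative)."""
    def __init__(self, obs_size: int, hidden_size: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_size),  # observation -> hidden layer
            nn.ReLU(),                         # rectified linear unit
            nn.Linear(hidden_size, n_actions), # hidden layer -> action scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# CartPole has a 4-dimensional observation and 2 discrete actions
net = Net(obs_size=4, hidden_size=HIDDEN_SIZE, n_actions=2)
out = net(torch.zeros(1, 4))
print(out.shape)  # one score per action: torch.Size([1, 2])
```

Note that the final layer outputs raw scores (logits) rather than probabilities; converting them with softmax is typically deferred to the loss function or the action-sampling step.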