In this recipe, we solve the CartPole environment using a double DQN and demonstrate how to fine-tune its hyperparameters to achieve the best performance.
To fine-tune the hyperparameters, we can apply grid search: explore a set of different combinations of values and pick the combination that achieves the best average performance. We can start with a coarse range of values and narrow it down gradually. Remember to fix the seeds of the random number generators behind all of the following to ensure reproducibility:
- The Gym environment random number generator
- The epsilon-greedy random number generator
- The initial weights of the neural network in PyTorch (set by PyTorch's random number generator)
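
The grid search described above can be sketched as follows. This is a minimal illustration, not the recipe's actual implementation: `train_and_evaluate`, `grid_search`, the hyperparameter names `hidden_size` and `lr`, and the candidate values are all hypothetical, and the training function returns fake rewards so that the sketch runs without Gym or PyTorch. In a real run, that function is where the three seeds listed above would be fixed.

```python
import itertools
import random
import statistics


def train_and_evaluate(hidden_size, lr, seed):
    """Hypothetical stand-in for training a double DQN on CartPole.

    A real version would first seed every source of randomness, e.g.
    random.seed(seed) for epsilon-greedy, torch.manual_seed(seed) for the
    network weights, and the Gym environment (env.seed(seed) in older Gym
    releases, env.reset(seed=seed) in newer ones), and then return the
    total reward of each training episode.
    """
    # Assumption: a local generator seeded from the inputs replaces training,
    # so the same (hyperparameters, seed) pair always gives the same rewards.
    rng = random.Random(f"{seed}-{hidden_size}-{lr}")
    return [rng.uniform(0, 200) for _ in range(10)]


def grid_search(grid, seed=0, last_n=5):
    """Try every combination in `grid` and score it by the mean reward
    over the last `last_n` episodes."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rewards = train_and_evaluate(seed=seed, **params)
        results.append((statistics.mean(rewards[-last_n:]), params))
    # Sort by score only, highest average reward first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# A coarse first pass; the values are illustrative only.
coarse_grid = {"hidden_size": [20, 30, 40], "lr": [0.01, 0.003, 0.001]}
results = grid_search(coarse_grid)
best_score, best_params = results[0]
print(best_params, round(best_score, 1))
```

Once the coarse pass identifies a promising region, the same function can be called again with a narrower grid around `best_params`. Because the seed is fixed, repeating a run with the same grid gives the same ranking, which is what makes the comparison between combinations meaningful.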