Recall our discussion of the three major hyperparameters of a Q-learning model:
- Alpha: The learning rate
- Gamma: The discount rate
- Epsilon: The exploration rate
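As a reminder of where each hyperparameter enters, here is a minimal sketch of tabular Q-learning. The function names `choose_action` and `q_update`, and the use of a nested list as the Q-table, are illustrative assumptions rather than the code we will build later.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy selection over one row of the Q-table."""
    # Epsilon: with this probability we explore by picking a random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Otherwise we exploit the action with the highest current Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Gamma: how much the best estimated future value counts toward the target.
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    # Alpha: how far we move the old estimate toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

With alpha set to 0 the table never changes, and with epsilon set to 0 the agent never explores, which is why these values matter so much for the agent's performance.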
What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover them through experimentation once we have constructed our game environment. We can also take advantage of existing research on the taxi problem and start from values that are already known to work well.
A large part of our model-tuning and optimization phase will consist of comparing the performance of different combinations of these three hyperparameters.
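One simple way to compare combinations is an exhaustive grid search, sketched below. The helper `grid_search` and the `evaluate` callback are hypothetical names. `placeholder_evaluate` is a stand-in with an arbitrary formula, not a real measurement, and the candidate values are arbitrary examples rather than recommendations.

```python
from itertools import product

def grid_search(evaluate, alphas, gammas, epsilons):
    """Score every (alpha, gamma, epsilon) combination, best score first."""
    results = []
    for alpha, gamma, epsilon in product(alphas, gammas, epsilons):
        score = evaluate(alpha, gamma, epsilon)
        results.append(((alpha, gamma, epsilon), score))
    # Higher score is assumed to be better (for example, mean reward).
    results.sort(key=lambda item: item[1], reverse=True)
    return results

def placeholder_evaluate(alpha, gamma, epsilon):
    # Placeholder only: an arbitrary formula so the sketch runs on its own.
    # Replace it with code that trains the agent and returns its mean reward.
    return -(abs(alpha - 0.1) + abs(gamma - 0.9) + abs(epsilon - 0.1))

ranked = grid_search(
    placeholder_evaluate,
    alphas=[0.1, 0.5],
    gammas=[0.6, 0.9],
    epsilons=[0.1, 0.3],
)
best_params, best_score = ranked[0]
```

The number of runs is the product of the list lengths (here 2 x 2 x 2 = 8), so the grid grows quickly as we add candidate values.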
One option we have is to decay any or all of these hyperparameters; in other words, to reduce their values as we progress through a game...
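A minimal sketch of one common decay scheme, multiplicative decay with a floor, is shown below. The `decay` helper, the per-episode timing, and the specific rate and minimum are assumptions chosen for illustration. The same helper could be applied to alpha or gamma.

```python
def decay(value, rate, minimum):
    """Shrink value by a multiplicative rate, never dropping below minimum."""
    return max(minimum, value * rate)

epsilon = 1.0   # start fully exploratory (assumed starting value)
schedule = []   # epsilon used in each episode, recorded for inspection

for episode in range(5):
    schedule.append(epsilon)
    # ... one episode would be played here using the current epsilon ...
    epsilon = decay(epsilon, rate=0.5, minimum=0.1)
```

The floor keeps a small amount of exploration alive late in training. Without it, epsilon would keep shrinking toward zero and the agent would eventually stop trying new actions.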