We have now defined our environment, iterated over all possible actions and results from any given state to calculate the quality value of every move, and stored these values in our Q object. At this point, we can begin to tune the options for this model to see how they impact performance.
Recall that reinforcement learning has three parameters: alpha, gamma, and epsilon. The following list describes the role of each parameter and the impact of adjusting its value:
- Alpha: The alpha rate for reinforcement learning plays the same role as the learning rate in many other machine learning models. It is a constant that controls how quickly the stored quality values are updated as the agent explores the rewards for taking certain actions.
- Gamma: Gamma controls how much the model values future rewards...
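
To make the roles of alpha and gamma concrete, here is a minimal sketch of a single value update, assuming the standard tabular Q-learning rule. The function name `q_update` and the numbers used are hypothetical and chosen only for illustration; the model built in this chapter may compute its updates differently.

```python
def q_update(q_old, reward, max_q_next, alpha, gamma):
    """Return the new quality value for one (state, action) pair.

    Assumes the standard tabular Q-learning update:
        Q <- Q + alpha * (reward + gamma * max_q_next - Q)
    """
    # gamma scales the best value reachable from the next state,
    # so it sets how much future rewards count toward the target.
    target = reward + gamma * max_q_next
    # alpha sets how far the old estimate moves toward that target.
    return q_old + alpha * (target - q_old)


# Hypothetical numbers: an immediate reward of 1 and a best next-state value of 10.
slow_learner = q_update(q_old=0.0, reward=1.0, max_q_next=10.0, alpha=0.1, gamma=0.9)
fast_learner = q_update(q_old=0.0, reward=1.0, max_q_next=10.0, alpha=0.5, gamma=0.9)
short_sighted = q_update(q_old=0.0, reward=1.0, max_q_next=10.0, alpha=0.1, gamma=0.0)

print(slow_learner, fast_learner, short_sighted)
```

With these inputs, raising alpha from 0.1 to 0.5 moves the estimate five times as far toward the same target in one step. Setting gamma to 0 removes the next state's value from the target, so only the immediate reward influences the update.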