Tabular Learning and the Bellman Equation
In the previous chapter, you became acquainted with your first reinforcement learning (RL) algorithm, the cross-entropy method, along with its strengths and weaknesses. In this new part of the book, we will look at a far more flexible and powerful family of methods: Q-learning. This chapter establishes the background that those methods share.
We will also revisit the FrozenLake environment to see how the new concepts apply to it and help us deal with its uncertainty.
In this chapter, we will:
- Review the value of a state and the value of an action, and learn how to calculate them in simple cases
- Talk about the Bellman equation and how it establishes the optimal policy if we know the values
- Discuss the value iteration method and try it on the FrozenLake environment
- Do the same for the Q-learning method
Despite the simplicity of the environments in this chapter...