Tabular Learning and the Bellman Equation
In the previous chapter, you became acquainted with your first reinforcement learning (RL) algorithm, the cross-entropy method, along with its strengths and weaknesses. In this new part of the book, we will look at another group of methods that has much more flexibility and power: Q-learning. This chapter will establish the required background shared by those methods.
We will also revisit the FrozenLake environment to see how the new concepts apply to it and help us address the issues caused by its uncertainty.
In this chapter, we will:
- Review the value of the state and the value of the action, and learn how to calculate them in simple cases
- Talk about the Bellman equation and how it establishes the optimal policy if we know the values of states
- Discuss the value iteration method and try it on the FrozenLake environment
- ...