Dynamic programming (DP) was the second major thread, after trial-and-error learning, to influence modern reinforcement learning (RL). In this chapter, we will examine the foundations of DP and how they shaped the field of RL, including the Bellman equation and the concept of optimality. From there, we will build policy iteration and value iteration methods to solve a class of problems well suited to DP. Finally, we will use these concepts to teach an agent to play the FrozenLake environment from OpenAI Gym.
Here are the main topics we will cover in this chapter:
- Introducing DP
- Understanding the Bellman equation
- Building policy iteration
- Building value iteration
- Playing with policy versus value iteration
For this chapter, we look at how to solve...
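As a preview of where we are headed, value iteration works by repeatedly applying the Bellman optimality backup to a table of state values until they stabilize, then reading off a greedy policy. Here is a minimal sketch on a hypothetical two-state, two-action MDP (the transition table and all numbers are illustrative, not from FrozenLake):

```python
# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward)
# transitions, mirroring the layout Gym uses for FrozenLake's model.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9                      # discount factor
V = {s: 0.0 for s in P}          # initialize all state values to zero

# Value iteration: sweep the Bellman optimality backup over all states.
for _ in range(100):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Extract the greedy policy via a one-step lookahead on the final values.
policy = {
    s: max(P[s], key=lambda a, s=s: sum(p * (r + gamma * V[s2])
                                        for p, s2, r in P[s][a]))
    for s in P
}
```

In this toy MDP, action 1 always yields a reward of 1, so both states converge to a value near 1 / (1 - 0.9) = 10 and the greedy policy picks action 1 everywhere. The chapter builds this idea out in full, including policy iteration and the real FrozenLake model.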