Understanding the Bellman equation
Bellman worked on solving finite MDPs with dynamic programming (DP), and it was during these efforts that he derived his famed equation. The beauty of this equation, and of the underlying concept more generally, is that it describes a method of evaluating the value, or quality, of a state. In other words, it describes how we can determine the optimal value of being in a given state, given the actions available to us and the values of the successor states those actions lead to. Before breaking down the equation itself, let's first reconsider the finite MDP in the next section.
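For reference, a standard statement of the equation we will unpack in the coming sections is the Bellman optimality equation for the state-value function, shown below. The notation here (V* for the optimal state-value function, gamma for the discount factor, p(s', r | s, a) for the transition dynamics) follows the common reinforcement learning convention rather than anything defined in the text so far:

```latex
% Bellman optimality equation for the state-value function (standard form;
% the notation is assumed, not yet defined in this section):
%   V^*(s)            -- optimal value of state s
%   \gamma            -- discount factor in [0, 1)
%   p(s', r | s, a)   -- probability of next state s' and reward r,
%                        given current state s and action a
\[
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V^{*}(s') \,\bigr]
\]
```

In words: the optimal value of a state is obtained by choosing the action that maximizes the expected immediate reward plus the discounted value of whichever state follows, which is exactly the idea described in the paragraph above.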
Unraveling the finite MDP
Consider the finite MDP we developed in Chapter 1, Understanding Rewards Learning, that described your morning routine. Don't to...