The Bellman equation of optimality
To explain the Bellman equation, it’s better to go a bit abstract. Don’t be afraid; I’ll provide concrete examples later to support your learning! Let’s start with a deterministic case, in which all our actions have a 100% guaranteed outcome. Imagine that our agent observes state $s_0$ and has $N$ available actions. Every action leads to another state, $s_1, \ldots, s_N$, with a respective reward, $r_1, \ldots, r_N$. Also, assume that we know the values, $V_i$, of all states connected to state $s_0$. What is the best course of action that the agent can take in such a state?
Figure 5.3: An abstract environment with N states reachable from the initial state
If we choose the concrete action $a_i$ and calculate the value associated with this action, then that value will be $V_0(a = a_i) = r_i + V_i$. So, to choose the best possible action, the agent needs to calculate the resulting value for every action and pick the maximum: $V_0 = \max_{i \in 1 \ldots N}(r_i + V_i)$.
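To make this concrete, here is a minimal Python sketch of that calculation. The rewards and state values below are made-up numbers purely for illustration; the point is that the agent evaluates $r_i + V_i$ for every action and takes the maximum:

```python
# A minimal sketch of deterministic action selection.
# The rewards and state values are made-up, illustrative numbers.

# Reward r_i received for taking action i from state s_0.
rewards = [1.0, 2.0, 0.5]

# Known value V_i of the state that action i leads to.
state_values = [3.0, 0.0, 4.0]

# Value of taking action i from s_0: V_0(a = a_i) = r_i + V_i
action_values = [r + v for r, v in zip(rewards, state_values)]

# The value of s_0 is the maximum over all actions, and the best
# action is the one that achieves that maximum.
best_action = max(range(len(action_values)), key=action_values.__getitem__)
v0 = action_values[best_action]

print(f"action values: {action_values}")          # [4.0, 2.0, 4.5]
print(f"best action: {best_action}, V_0 = {v0}")  # best action: 2, V_0 = 4.5
```

Picking the action with the largest $r_i + V_i$ is exactly the max in the equation above.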