The Bellman equation of optimality
To explain the Bellman equation, it's better to go a bit abstract. Don't be afraid; I'll provide concrete examples later to support your learning! Let's start with the deterministic case, when all our actions have a 100% guaranteed outcome. Imagine that our agent observes state s0 and has N available actions. Every action leads to another state, s1 ... sN, with a respective reward, r1 ... rN. Also, assume that we know the values, V1 ... VN, of all states connected to state s0. What is the best course of action that the agent can take in such a state?
Figure 5.3: An abstract environment with N states reachable from the initial state
If we choose the concrete action, ai, and calculate the value of taking this action, then the value will be V0(a=ai) = ri + Vi. So, to choose the best possible action, the agent needs to calculate the resulting value for every action and choose the maximum possible outcome. In other words, V0 = max over a in 1..N of (ra + Va). If we are using the...
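The deterministic one-step choice above can be sketched in a few lines of Python. The rewards and next-state values below are made-up numbers for illustration, not taken from the text:

```python
# Hypothetical setup: N = 3 actions from state s0.
rewards = [1.0, 2.0, 0.5]   # r1 .. rN, reward for each action
values = [5.0, 1.0, 6.0]    # V1 .. VN, value of the state each action leads to

# Value of taking action i from s0 is ri + Vi (deterministic transitions).
action_values = [r + v for r, v in zip(rewards, values)]

# The agent picks the action with the maximum outcome; that maximum is V0.
best_action = max(range(len(action_values)), key=action_values.__getitem__)
v0 = action_values[best_action]
print(best_action, v0)   # action index 2 wins: 0.5 + 6.0 = 6.5
```

Note that the greedy choice here can prefer a small immediate reward (0.5) because it leads to a high-value state, which is exactly the trade-off the Bellman equation captures.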