Chapter 3 – The Bellman Equation and Dynamic Programming
- The Bellman equation states that the value of a state can be obtained as a sum of the immediate reward and the discounted value of the next state. Similar to the Bellman equation of the value function, the Bellman equation of the Q function states that the Q value of a state-action pair can be obtained as a sum of the immediate reward and the discounted Q value of the next state-action pair.
- The Bellman expectation equation gives the value and Q functions of a given policy, whereas the Bellman optimality equation gives the optimal value and Q functions.
- The value function can be derived from the Q function as $V^*(s) = \max_a Q^*(s, a)$.
- The Q function can be derived from the value function as $Q^*(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]$.
- In the value iteration method, we perform the following steps:
- Compute the optimal value function by taking the maximum over the Q function, that is, $V^*(s) = \max_a Q^*(s, a)$
- Extract the optimal policy from the computed...
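
The value iteration steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the array layout `P[s, a, s']` for transition probabilities and `R[s, a, s']` for rewards, the function name `value_iteration`, the stopping tolerance `tol`, and the two-state toy MDP are all hypothetical choices made for the example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Sketch of value iteration (assumed layout: P[s, a, s'], R[s, a, s'])."""
    n_states = P.shape[0]
    V = np.zeros(n_states)

    for _ in range(max_iters):
        # Q(s, a) = sum_{s'} P(s'|s, a) * (R(s, a, s') + gamma * V(s'))
        Q = np.einsum("ijk,ijk->ij", P, R + gamma * V)
        # Step 1: value function as the maximum over the Q function
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Step 2: extract a greedy policy from the computed value function
    Q = np.einsum("ijk,ijk->ij", P, R + gamma * V)
    policy = Q.argmax(axis=1)
    return V, policy


# Hypothetical deterministic toy MDP with 2 states and 2 actions.
P = np.zeros((2, 2, 2))
R = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0                   # state 0, action 0: stay in 0, reward 0
P[0, 1, 1] = 1.0; R[0, 1, 1] = 1.0  # state 0, action 1: move to 1, reward 1
P[1, 0, 1] = 1.0; R[1, 0, 1] = 2.0  # state 1, action 0: stay in 1, reward 2
P[1, 1, 0] = 1.0                   # state 1, action 1: move to 0, reward 0

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy MDP the best behavior is to move to state 1 and stay there, so the greedy policy picks action 1 in state 0 and action 0 in state 1, and the values approach 19 and 20 (since 2 / (1 - 0.9) = 20 and 1 + 0.9 * 20 = 19).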