Chapter 3
- What's a stochastic policy?
- It's a policy that, for each state, defines a probability distribution over actions and selects an action by sampling from that distribution, rather than always returning the same action; a sketch follows below.
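A minimal sketch of what this could look like in code; the states, actions, and probabilities are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng()

# A stochastic policy maps each state to a probability distribution over
# actions; the states, actions, and probabilities here are illustrative.
policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"left": 0.1, "right": 0.9},
}

def sample_action(state):
    """Sample an action from the policy's distribution for this state."""
    dist = policy[state]
    return rng.choice(list(dist.keys()), p=list(dist.values()))

print(sample_action("s0"))  # 'left' about 70% of the time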
- How can a return be defined in terms of the return at the next time step?
- Recursively: the return is the immediate reward plus the discounted return of the next time step, G_t = R_{t+1} + γG_{t+1}, since G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = R_{t+1} + γ(R_{t+2} + γR_{t+3} + …).
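A quick numeric check of the recursion, computing returns backward from the end of an episode; the reward sequence and discount factor are made up:

```python
# Compute each return via G_t = R_{t+1} + gamma * G_{t+1}, sweeping backward.
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 3.0]  # R_1, R_2, R_3, R_4 (illustrative)

returns = [0.0] * (len(rewards) + 1)  # return after the terminal step is 0
for t in reversed(range(len(rewards))):
    returns[t] = rewards[t] + gamma * returns[t + 1]

print(returns[:-1])  # G_0, G_1, G_2, G_3
```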
- Why is the Bellman equation so important?
- Because it decomposes the value of a state into the expected immediate reward plus the discounted value of the subsequent state, giving a recursive formula that iterative algorithms such as dynamic programming can solve.
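One Bellman expectation backup, v(s) = Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ v(s')], might look like this for a single state; the tiny MDP below is a hypothetical example with known dynamics:

```python
gamma = 0.9

# transitions[s][a] -> list of (probability, next_state, reward); made-up values
transitions = {
    "s0": {"a0": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
           "a1": [(1.0, "s0", 0.5)]},
}
policy = {"s0": {"a0": 0.5, "a1": 0.5}}   # π(a|s)
values = {"s0": 0.0, "s1": 2.0}           # current estimates of v(s)

def bellman_backup(s):
    """New estimate of v(s): expected immediate reward plus discounted v(s')."""
    return sum(
        pi_a * sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])
        for a, pi_a in policy[s].items()
    )

print(bellman_backup("s0"))
```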
- What are the limiting factors of DP algorithms?
- Their computational cost explodes with the number of states (the curse of dimensionality), so they are practical only for problems with a limited state space. The other constraint is that the dynamics of the environment have to be fully known.
- What's policy evaluation?
- It's an iterative method that computes the value function of a given policy by repeatedly applying the Bellman expectation equation as an update rule, as sketched below.
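A minimal sketch of iterative policy evaluation; the two-state MDP, the equiprobable policy, and the threshold are illustrative assumptions:

```python
gamma, theta = 0.9, 1e-6

transitions = {  # transitions[s][a] -> list of (probability, next_state, reward)
    "s0": {"a0": [(1.0, "s1", 0.0)], "a1": [(1.0, "s0", 1.0)]},
    "s1": {"a0": [(1.0, "s0", 2.0)], "a1": [(1.0, "s1", 0.0)]},
}
policy = {s: {"a0": 0.5, "a1": 0.5} for s in transitions}  # equiprobable π(a|s)

values = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s in transitions:  # one full sweep of Bellman expectation backups
        new_v = sum(
            pi_a * sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])
            for a, pi_a in policy[s].items()
        )
        delta = max(delta, abs(new_v - values[s]))
        values[s] = new_v
    if delta < theta:  # stop once no state value changes significantly
        break

print(values)
```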
- How do policy iteration and value iteration differ?
- Policy iteration alternates between policy evaluation and policy improvement until the policy is stable; value iteration instead merges the two steps into a single update, applying the Bellman optimality backup (a max over actions) on every sweep and extracting the greedy policy only at the end, as in the sketch below.
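A minimal sketch of value iteration on the same kind of hypothetical two-state MDP used above; note the max over actions replaces the policy-weighted average used in policy evaluation:

```python
gamma, theta = 0.9, 1e-6

transitions = {  # transitions[s][a] -> list of (probability, next_state, reward)
    "s0": {"a0": [(1.0, "s1", 0.0)], "a1": [(1.0, "s0", 1.0)]},
    "s1": {"a0": [(1.0, "s0", 2.0)], "a1": [(1.0, "s1", 0.0)]},
}

values = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s in transitions:
        # Bellman optimality backup: v(s) <- max_a Σ p(s',r|s,a) [r + gamma v(s')]
        new_v = max(
            sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for outcomes in transitions[s].values()
        )
        delta = max(delta, abs(new_v - values[s]))
        values[s] = new_v
    if delta < theta:
        break

# The greedy policy is extracted only once, from the converged values.
greedy = {
    s: max(transitions[s], key=lambda a: sum(p * (r + gamma * values[s2])
                                             for p, s2, r in transitions[s][a]))
    for s in transitions
}
print(values, greedy)
```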