- What's a Markov decision process (MDP)?
- What's a stochastic policy?
- How can the return be defined recursively, in terms of the return at the next time step?
- Why is the Bellman equation so important?
- What are the limiting factors of dynamic programming (DP) algorithms?
- What is policy evaluation?
- How do policy iteration and value iteration differ?