The question list is as follows:
- What is the Markov property?
- Why do we need the Markov Decision Process?
- When do we prefer immediate rewards over future rewards?
- What is the use of the discount factor?
- Why do we use the Bellman equation?
- How would you derive the Bellman equation for a Q function?
- How are the value function and Q function related?
- What is the difference between value iteration and policy iteration?