Chapter 3: The Markov Decision Process and Dynamic Programming
- The Markov property states that the future depends only on the present state and not on the past; a formal statement follows this list.
- An MDP is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations, and almost all RL problems can be modeled as MDPs.
- The discount factor decides how much importance we give to future rewards relative to immediate rewards.
- Refer to the section Discount factor.
- We use the Bellman equation to solve an MDP; its standard forms for the value and Q functions are sketched after this list.
- Refer to the section Deriving the Bellman equation for value and Q functions.
- The value function specifies the goodness of a state, and the Q function specifies the goodness of an action in that state.
- Refer to the sections Value iteration and Policy iteration; a minimal value iteration sketch in Python follows this list.
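
As a formal statement of the Markov property summarized above, the standard notation (assumed here, not quoted from the chapter) conditions the next state only on the current state and action:

```latex
% Markov property for an MDP: the next state depends only on the
% current state and action, not on the full history.
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_0, a_0, s_1, a_1, \dots, s_t, a_t)
```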
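
The discount factor and the Bellman equation fit together as follows; these are the conventional textbook forms for a policy \(\pi\), a sketch in standard notation that may differ cosmetically from the chapter's derivation. The discounted return weights future rewards by powers of \(\gamma \in [0, 1)\), so a \(\gamma\) near 0 favors immediate rewards and a \(\gamma\) near 1 favors future rewards:

```latex
% Discounted return: future rewards are weighted by powers of gamma.
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}

% Bellman equation for the value function under policy pi.
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman equation for the Q function under policy pi.
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)
                \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') \, Q^{\pi}(s', a') \right]
```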
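
To make the value iteration bullet concrete, here is a minimal Python sketch. It assumes a tabular MDP whose transition model is a nested dict P[s][a] of (probability, next_state, reward) triples; this representation and every name in it are assumptions for illustration, not the chapter's code.

```python
# Minimal value iteration for a tabular MDP.
# Assumed representation: P[s][a] = list of (prob, next_state, reward).

def value_iteration(P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in P}  # state-value estimates, initialized to zero
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: value of the best action in s.
            best = max(
                sum(prob * (reward + gamma * V[s2])
                    for prob, s2, reward in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # values have converged
            break
    # Extract the greedy policy from the converged values.
    policy = {
        s: max(P[s], key=lambda a: sum(prob * (reward + gamma * V[s2])
                                       for prob, s2, reward in P[s][a]))
        for s in P
    }
    return V, policy

# Usage on a hypothetical two-state MDP with deterministic transitions.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
V, policy = value_iteration(P)
print(V)       # converges to V[0] = 19.0, V[1] = 20.0
print(policy)  # {0: 'go', 1: 'stay'}
```

Policy iteration solves the same problem by alternating a full policy evaluation step with greedy policy improvement, rather than folding the max over actions into every backup as value iteration does.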