MDP is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as MDP.
MDP is represented by five important elements:
- A set of states the agent can actually be in.
- A set of actions that can be performed by an agent, for moving from one state to another.
- A transition probability (), which is the probability of moving from one state to another state by performing some action .
- A reward probability (), which is the probability of a reward acquired by the agent for moving from one state to another state by performing some action .
- A discount factor (), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.