Markov Decision Process
An MDP is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as MDPs.
An MDP is represented by five important elements:
- A set of states (S) the agent can actually be in.
- A set of actions (A) that the agent can perform to move from one state to another.
- A transition probability (P(s'|s, a)), which is the probability of moving from one state s to another state s' by performing some action a.
- A reward probability (R(s, a, s')), which is the probability of the reward acquired by the agent for moving from one state s to another state s' by performing some action a.
- A discount factor (γ), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
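The sketch below shows one way these five elements could be represented in code. It assumes a small, hypothetical two-state MDP; the state names, actions, transition probabilities, and reward values are made up purely for illustration.

```python
import random

# Hypothetical two-state MDP used only for illustration.
states = ["s0", "s1"]
actions = ["stay", "move"]

# transition_probs[(state, action)] maps each possible next state
# to the probability of reaching it.
transition_probs = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# rewards[(state, action, next_state)] is the reward received for
# that transition; missing entries default to 0.
rewards = {
    ("s0", "move", "s1"): 1.0,
    ("s1", "move", "s0"): -1.0,
}

# Discount factor: values near 0 favor immediate rewards,
# values near 1 give more weight to future rewards.
gamma = 0.9

def step(state, action):
    """Sample the next state and reward for performing `action` in `state`."""
    next_states = transition_probs[(state, action)]
    next_state = random.choices(
        list(next_states), weights=list(next_states.values())
    )[0]
    reward = rewards.get((state, action, next_state), 0.0)
    return next_state, reward

# Example: one transition from s0 by performing "move".
print(step("s0", "move"))
```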
Rewards and returns
As we have learned, in an RL environment, an agent interacts with the environment by performing actions and moving from one state to another. Based on the action it performs...