Bringing the action in: Markov decision process
A Markov reward process (MRP) allowed us to model and study a Markov chain with rewards. Of course, our ultimate goal is to control such a system to maximize the rewards. Now, we incorporate decisions into the MRP.
Definition
A Markov decision process (MDP) is simply a Markov reward process with decisions affecting transition probabilities and potentially the rewards.
Info
An MDP is characterized by a tuple $\langle S, A, P, R, \gamma \rangle$, where we have a finite set of actions, $A$, on top of the MRP.
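To make the definition more concrete, here is a minimal Python sketch of how such a tuple could be held in code. It assumes the usual components (states, actions, action-dependent transition probabilities, rewards, and a discount factor). The class name `MDP`, the `step` helper, and the two-state example with its numbers are all hypothetical and serve only as an illustration; they are not taken from any library.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    states: tuple       # finite set of states
    actions: tuple      # finite set of actions
    transitions: dict   # (state, action) -> {next_state: probability}
    rewards: dict       # (state, action) -> immediate reward (assumed deterministic here)
    gamma: float        # discount factor

    def step(self, state, action, rng=random):
        """Sample a next state for (state, action) and return it with the reward."""
        dist = self.transitions[(state, action)]
        next_state = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        return next_state, self.rewards[(state, action)]


# Hypothetical two-state, two-action example; every number is made up.
toy_mdp = MDP(
    states=("low", "high"),
    actions=("wait", "invest"),
    transitions={
        ("low", "wait"): {"low": 0.9, "high": 0.1},
        ("low", "invest"): {"low": 0.4, "high": 0.6},
        ("high", "wait"): {"low": 0.3, "high": 0.7},
        ("high", "invest"): {"low": 0.1, "high": 0.9},
    },
    rewards={
        ("low", "wait"): 0.0,
        ("low", "invest"): -1.0,
        ("high", "wait"): 2.0,
        ("high", "invest"): 1.0,
    },
    gamma=0.9,
)

# The chosen action changes both the next-state distribution and the reward.
next_state, reward = toy_mdp.step("low", "invest")
```

Notice that the only difference from an MRP-style representation is the extra action in each key: the same state now has one transition distribution and one reward per action, which is exactly what gives the decision-maker something to control.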
The MDP is the mathematical framework behind RL, so this is a good time to recall the RL diagram that we introduced in Chapter 1, Introduction to Reinforcement Learning.
Our goal in an MDP is to find a policy that maximizes the expected cumulative reward. A policy simply tells us which action(s) to take in a given state. In other words, it is a mapping from states to actions. More formally, a policy...