Markov decision processes
Let's complete the definition of the reinforcement learning problem by introducing a mathematical framework called the Markov decision process (MDP).
An MDP is defined by five components:
- A finite set of states
- A finite set of actions
- A finite set of rewards
- A discount rate
- The one-step dynamics of the environment
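To make the five components concrete, here is a minimal Python sketch of an MDP as a plain data container. The class name `MDP`, its field names, the `validate` helper, and the two-state toy example are all hypothetical, chosen only for illustration. The dynamics are assumed to be stored as a dictionary that maps each (state, action) pair to a probability for every possible (next state, reward) outcome.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the five components of a finite MDP."""
    states: frozenset      # finite set of states
    actions: frozenset     # finite set of actions
    rewards: frozenset     # finite set of rewards
    discount: float        # discount rate, assumed to lie in [0, 1]
    # Assumed layout: dynamics[(state, action)] -> {(next_state, reward): probability}
    dynamics: dict

    def validate(self, tol=1e-9):
        """Check that the components are mutually consistent."""
        if not 0.0 <= self.discount <= 1.0:
            raise ValueError("discount rate must be in [0, 1]")
        for (state, action), outcomes in self.dynamics.items():
            if state not in self.states or action not in self.actions:
                raise ValueError(f"unknown state or action: {(state, action)}")
            for (next_state, reward), prob in outcomes.items():
                if next_state not in self.states or reward not in self.rewards:
                    raise ValueError(f"unknown outcome: {(next_state, reward)}")
                if prob < 0.0:
                    raise ValueError("probabilities must be non-negative")
            # The outcome probabilities for each (state, action) pair must sum to 1.
            if abs(sum(outcomes.values()) - 1.0) > tol:
                raise ValueError(f"probabilities for {(state, action)} do not sum to 1")
        return True


# Toy example with made-up numbers (not taken from the text).
toy = MDP(
    states=frozenset({"s0", "s1"}),
    actions=frozenset({"stay", "move"}),
    rewards=frozenset({0.0, 1.0}),
    discount=0.9,
    dynamics={
        ("s0", "stay"): {("s0", 0.0): 1.0},
        ("s0", "move"): {("s1", 1.0): 0.8, ("s0", 0.0): 0.2},
        ("s1", "stay"): {("s1", 0.0): 1.0},
        ("s1", "move"): {("s0", 0.0): 1.0},
    },
)
toy.validate()
```

Storing the dynamics per (state, action) pair keeps the check that each distribution sums to 1 local and easy to read. Other layouts, such as a dense array indexed by state and action, would work equally well.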
We have learned how to specify the states, actions, rewards, and discount rate. Let's find out how to specify the one-step dynamics of the environment.
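In the standard textbook formulation, the one-step dynamics give the probability of each possible next state and reward, given the current state and action. The symbols below (S_t, A_t, and R_{t+1} for the state, action, and reward at each time step) follow a common convention and are an assumption here, since the text has not fixed a notation:

```latex
p(s', r \mid s, a) = \Pr\{S_{t+1} = s',\; R_{t+1} = r \mid S_t = s,\; A_t = a\},
\qquad
\sum_{s'} \sum_{r} p(s', r \mid s, a) = 1 \quad \text{for every } s \text{ and } a
```

The second condition says that, for each state and action, the probabilities over all (next state, reward) pairs form a valid distribution.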
The image that follows describes an MDP for a trash-collecting robot whose goal is to collect trash cans. The robot searches for trash cans and keeps collecting them until the battery runs out, and then comes back to the docking station to recharge. The robot has two states, high and low, representing its battery level. The actions the robot can take are searching for the trash cans, waiting at its own position, and going...