Setting up a Markov Decision Process
The Markov Decision Process (MDP) forms the basis of setting up RL, where the outcome of a decision is semi-controlled; that is, it is partly random and partly under the control of the decision-maker. An MDP is defined using a set of possible states (S), a set of possible actions (A), a real-valued reward function (R), and a set of transition probabilities from one state to another for a given action (T). In addition, the effect of an action performed in a given state depends only on that state and not on the states that preceded it (the Markov property).
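To make the four components concrete, here is a minimal sketch of how they might be represented in Python. The container choices and the names states, actions, reward, and transitions are illustrative assumptions, not the book's own code:

```python
# S: the set of possible states (here, the 16 grid states used later)
states = [f"S{i}" for i in range(1, 17)]

# A: the set of possible actions
actions = ["up", "right", "down", "left"]

# R: a real-valued reward function, R(state, action) -> reward
def reward(state, action):
    return -1.0  # e.g., a constant step cost for every move

# T: transition probabilities, T[(state, action)] -> {next_state: probability}
transitions = {
    ("S1", "right"): {"S2": 1.0},              # fully controlled transition
    ("S1", "down"):  {"S5": 0.8, "S1": 0.2},   # partly random transition
}
```

The second transition entry illustrates the semi-controlled nature of an MDP: the agent chooses the action, but the resulting state is drawn from a probability distribution.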
Getting ready
In this section, let us define an agent travelling across a 4 x 4 grid, as shown in the following figure:
A sample 4 x 4 grid of 16 states
This grid has 16 states (S1, S2, ..., S16). In each state, the agent can perform one of four actions (up, right, down, left). However, the agent is restricted to a subset of these actions based on the following constraints:
- The states along the edges are restricted to actions that point only toward states inside the grid; for example, the agent cannot move up from a state in the top row (see the sketch after this list).
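One way to encode this edge constraint is to derive each state's row and column from its index and keep only the actions that stay inside the grid. The helper name valid_actions and the 1-based indexing are assumptions for illustration:

```python
N = 4  # grid dimension

def valid_actions(state_index):
    """Return the actions allowed in state S{state_index} (1-based index)."""
    row, col = divmod(state_index - 1, N)
    allowed = []
    if row > 0:
        allowed.append("up")     # not in the top row
    if col < N - 1:
        allowed.append("right")  # not in the rightmost column
    if row < N - 1:
        allowed.append("down")   # not in the bottom row
    if col > 0:
        allowed.append("left")   # not in the leftmost column
    return allowed

print(valid_actions(1))   # corner state S1 -> ['right', 'down']
print(valid_actions(6))   # interior state S6 -> all four actions
```

Corner states end up with two valid actions, non-corner edge states with three, and interior states with all four.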