By now, you should be familiar with the RL framework. In this recipe, we will implement an application of the gridworld environment in RL: the cliff walking problem. The problem can be represented as a grid of 4 rows and 12 columns. Each episode starts in the lower-left state, S, and the goal state, G, is at the bottom right of the grid. At any state, the only possible actions are moving left, right, up, or down. The states labeled C in the lower part of the grid are cliffs. Any transition into a cliff state incurs a large negative reward of -100 and instantly sends the agent back to the starting state, S. A transition into the goal state, G, has a reward of 0, and every other transition has a reward of -1.
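The rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the recipe's own implementation: the names `step`, `render`, `START`, `GOAL`, and `CLIFF` are hypothetical, the cliff is assumed to occupy the bottom-row cells between S and G, a move that would leave the grid is assumed to keep the agent in place, and the episode is assumed to end when the goal is reached.

```python
# Grid dimensions: 4 rows by 12 columns; row 3 is the bottom row.
N_ROWS, N_COLS = 4, 12
START = (3, 0)    # lower-left state, S
GOAL = (3, 11)    # lower-right state, G
# Assumption: the cliff fills the bottom row between S and G.
CLIFF = {(3, col) for col in range(1, 11)}

# Each action maps to a (row, column) offset.
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Assumption: a move that would leave the grid keeps the agent in place.
    row = min(max(state[0] + d_row, 0), N_ROWS - 1)
    col = min(max(state[1] + d_col, 0), N_COLS - 1)
    next_state = (row, col)

    if next_state in CLIFF:
        # Falling off the cliff: large penalty and back to the start.
        return START, -100, False
    if next_state == GOAL:
        # Assumption: reaching the goal ends the episode.
        return GOAL, 0, True
    return next_state, -1, False


def render():
    """Return a text picture of the grid using S, G, C, and '.'."""
    lines = []
    for row in range(N_ROWS):
        cells = []
        for col in range(N_COLS):
            if (row, col) == START:
                cells.append("S")
            elif (row, col) == GOAL:
                cells.append("G")
            elif (row, col) in CLIFF:
                cells.append("C")
            else:
                cells.append(".")
        lines.append(" ".join(cells))
    return "\n".join(lines)


print(render())
print(step(START, "up"))     # an ordinary move costs -1
print(step(START, "right"))  # stepping into the cliff costs -100
```

Representing each state as a `(row, column)` tuple keeps the transition logic easy to check by hand; a learning algorithm could equally flatten these tuples into 48 integer state indices.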
The following image shows the navigation matrix for the cliff walking problem:
Let's proceed and solve this navigation problem using RL.