In this chapter, we covered the details of a gridworld environment and the basics of the Markov decision process: states, actions, rewards, the transition model, and the policy. We then used these concepts to compute utilities and the optimal policy through the value iteration and policy iteration approaches.
Apart from this, we gained a basic understanding of what partially observable Markov decision processes (POMDPs) look like and the challenges involved in solving them. Finally, we took our favorite gridworld environment from OpenAI Gym, FrozenLake-v0, and implemented value iteration to teach our agent to navigate it.
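As a refresher, the core of the value iteration approach can be sketched in a few lines. The sketch below is a minimal, self-contained illustration, not the chapter's exact implementation: instead of loading FrozenLake-v0, it hand-codes a tiny three-state transition model in the same `P[state][action] = [(probability, next_state, reward, done), ...]` format that Gym's discrete environments expose, so the same function would also work on an environment's transition table.

```python
import numpy as np

# Tiny deterministic stand-in for a Gym transition model (assumption:
# a 3-state chain where state 2 is the goal, reward 1, episode ends).
# Format matches Gym's P[state][action] = [(prob, next_state, reward, done)].
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

def value_iteration(P, gamma=0.99, theta=1e-8):
    """Iterate the Bellman optimality backup until utilities converge,
    then extract the greedy policy."""
    V = np.zeros(len(P))
    while True:
        delta = 0.0
        for s in P:
            # Q-value of each action under the current utility estimate
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a])
                 for a in P[s]]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # utilities have converged
            break
    # Greedy policy: pick the action maximizing expected return in each state
    policy = {s: max(P[s], key=lambda a: sum(
        p * (r + gamma * V[s2] * (not done))
        for p, s2, r, done in P[s][a])) for s in P}
    return V, policy

V, policy = value_iteration(P)
print(V)       # utilities of states 0, 1, 2
print(policy)  # greedy action per state
```

On this toy model, the agent learns to move right toward the goal: both non-terminal states choose action 1, and the start state's utility is the goal reward discounted once.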
In the next chapter, we will start with policy gradients and move beyond FrozenLake to some other fascinating and complex environments.