Another approach to solving an MDP is the policy iteration algorithm, which we will discuss in this recipe.
A policy iteration algorithm consists of two components: policy evaluation and policy improvement. It starts with an arbitrary policy, and each iteration performs two steps: it first computes the values of the current policy using the Bellman expectation equation, and then extracts an improved, greedy policy from those values using the Bellman optimality equation. The algorithm alternates between evaluating the policy and improving it until the policy no longer changes; for a finite MDP, that fixed point is an optimal policy.
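To make these two steps concrete, here is a minimal NumPy sketch of policy iteration for a tabular Gym environment such as FrozenLake. It assumes the classic Gym API, where the underlying transition model is reachable as env.unwrapped.P[state][action], a list of (probability, next_state, reward, done) tuples; the function names policy_evaluation and policy_improvement are illustrative rather than the recipe's final code:

```python
import numpy as np
import gym

def policy_evaluation(env, policy, gamma=0.99, threshold=1e-4):
    """Repeatedly apply the Bellman expectation equation until the values stabilize."""
    V = np.zeros(env.observation_space.n)
    while True:
        max_delta = 0.0
        for s in range(env.observation_space.n):
            a = policy[s]
            # V(s) = sum over outcomes of p * (r + gamma * V(s'))
            v = sum(p * (r + gamma * V[s_next])
                    for p, s_next, r, _ in env.unwrapped.P[s][a])
            max_delta = max(max_delta, abs(v - V[s]))
            V[s] = v
        if max_delta < threshold:
            return V

def policy_improvement(env, V, gamma=0.99):
    """Extract the greedy policy implied by the Bellman optimality equation."""
    policy = np.zeros(env.observation_space.n, dtype=int)
    for s in range(env.observation_space.n):
        # Compute Q(s, a) for every action, then act greedily
        q = [sum(p * (r + gamma * V[s_next])
                 for p, s_next, r, _ in env.unwrapped.P[s][a])
             for a in range(env.action_space.n)]
        policy[s] = int(np.argmax(q))
    return policy

def policy_iteration(env, gamma=0.99):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = np.zeros(env.observation_space.n, dtype=int)  # arbitrary initial policy
    while True:
        V = policy_evaluation(env, policy, gamma)
        new_policy = policy_improvement(env, V, gamma)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

env = gym.make('FrozenLake-v1')  # use 'FrozenLake-v0' on older Gym versions
V_optimal, optimal_policy = policy_iteration(env)
print(optimal_policy.reshape(4, 4))  # one greedy action per cell of the 4x4 lake
```

Note that policy evaluation is itself an iterative loop, so each outer iteration fully evaluates the current policy before improving it; this is what distinguishes policy iteration from value iteration, which folds a single greedy backup into every sweep.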
Let's develop a policy iteration algorithm and use it to solve the FrozenLake environment. After that, we will explain how it works.