Policy Iteration
Policy iteration alternates between evaluating the current policy and improving it, until the policy stops changing. The algorithm is as follows:
1. Initialize a random policy.
2. Compute the value function of the current policy (policy evaluation).
3. Extract a new policy by acting greedily with respect to the value function obtained in step 2 (policy improvement).
4. If the extracted policy is the same as the policy evaluated in step 2, stop. Otherwise, make the extracted policy the current policy and repeat from step 2.
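
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the 4-state chain environment, the discount factor of 0.9, and all names (`build_model`, `q_value`, `evaluate_policy`, `extract_policy`, `policy_iteration`) are hypothetical and chosen only for this example. Policy evaluation is done here by iterative sweeps, which is one common choice among several.

```python
import random

# Hypothetical 4-state chain MDP (an assumption made for illustration):
# states 0..3, actions 0 = left and 1 = right, state 3 is terminal.
# Entering state 3 yields a reward of 1; every other move yields 0.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
TERMINAL = 3


def build_model():
    """Return P, where P[s][a] is a list of (probability, next_state, reward)."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in range(N_ACTIONS):
            if s == TERMINAL:
                # The terminal state loops onto itself with no reward.
                P[s][a] = [(1.0, s, 0.0)]
            else:
                nxt = max(s - 1, 0) if a == 0 else s + 1
                reward = 1.0 if nxt == TERMINAL else 0.0
                P[s][a] = [(1.0, nxt, reward)]
    return P


def q_value(P, V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, tol=1e-10):
    """Compute the value function of a fixed policy by repeated sweeps."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = q_value(P, V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def extract_policy(P, V):
    """Pick, in every state, the action with the highest one-step lookahead value."""
    return [
        max(range(N_ACTIONS), key=lambda a: q_value(P, V, s, a))
        for s in range(N_STATES)
    ]


def policy_iteration(P, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Initialize a random policy.
    policy = [rng.randrange(N_ACTIONS) for _ in range(N_STATES)]
    for _ in range(max_iters):
        # Compute the value function of the current policy.
        V = evaluate_policy(P, policy)
        # Extract a new policy from that value function.
        new_policy = extract_policy(P, V)
        # Stop when the policy no longer changes; otherwise repeat.
        if new_policy == policy:
            break
        policy = new_policy
    return policy, V


policy, V = policy_iteration(build_model())
print("policy:", policy)
print("values:", [round(v, 3) for v in V])
```

In this toy environment the resulting policy should choose "right" (action 1) in every non-terminal state, since that is the only way to reach the reward. The stopping test compares whole policies rather than value functions, which mirrors the last step of the algorithm as described.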