Solving the FrozenLake environment with dynamic programming
We will focus on the policy-based and value-based dynamic programming algorithms in this section. But first, let's simulate the FrozenLake environment.
Simulating the FrozenLake environment
FrozenLake is a typical OpenAI Gym environment with discrete states. The task is to move the agent from the starting tile to the destination tile in a grid while avoiding traps. The grid is either 4 * 4 (https://gym.openai.com/envs/FrozenLake-v0/) or 8 * 8 (https://gym.openai.com/envs/FrozenLake8x8-v0/). There are four types of tiles in the grid:
- S: The starting tile. This is state 0, and it gives 0 reward.
- G: The goal tile. It is state 15 in the 4 * 4 grid. It gives +1 reward and terminates an episode.
- F: The frozen tile. In the 4 * 4 grid, states 1, 2, 3, 4, 6, 8, 9, 10, 13, and 14 are walkable tiles. It gives 0 reward.
- H: The hole tile. In the 4 * 4 grid, states 5...
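To make the state numbering concrete, here is a minimal, gym-free sketch of the 4 * 4 grid. It assumes that states are numbered row by row from the top-left corner (row-major order, as in Gym's FrozenLake), and it treats every state not listed above as S, F, or G as a hole. The helper names (`tile_type`, `to_row_col`, `reward_and_done`) are hypothetical and are not part of the Gym API. The terminal behavior follows the standard FrozenLake rules: reaching G yields +1 and ends the episode, while falling into a hole ends the episode with 0 reward.

```python
# Hypothetical helpers (not Gym API): rebuild the 4 * 4 FrozenLake layout
# from the state lists in the text. Assumes row-major state numbering.
N_ROWS, N_COLS = 4, 4
START, GOAL = 0, 15
FROZEN = {1, 2, 3, 4, 6, 8, 9, 10, 13, 14}


def tile_type(state):
    """Return the tile letter (S, F, H or G) for a state index."""
    if state == START:
        return "S"
    if state == GOAL:
        return "G"
    if state in FROZEN:
        return "F"
    return "H"  # assumption: any state not listed as S, F or G is a hole


def to_row_col(state):
    """Convert a state index to (row, column), assuming row-major order."""
    return divmod(state, N_COLS)


def reward_and_done(state):
    """Reward for landing on `state`, and whether the episode terminates."""
    tile = tile_type(state)
    if tile == "G":
        return 1.0, True   # reaching the goal gives +1 and ends the episode
    if tile == "H":
        return 0.0, True   # falling into a hole ends the episode, no reward
    return 0.0, False      # S and F tiles give 0 reward and do not terminate


# Render the grid one row per line, and collect the derived hole states.
grid = ["".join(tile_type(r * N_COLS + c) for c in range(N_COLS))
        for r in range(N_ROWS)]
HOLES = [s for s in range(N_ROWS * N_COLS) if tile_type(s) == "H"]
print("\n".join(grid))
print("Hole states:", HOLES)
```

Since the hole states are computed as the complement of the listed S, F, and G states, the sketch stays consistent with the tile lists above rather than hardcoding a map. Only the 4 * 4 case is covered; the 8 * 8 grid uses a different layout.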