Solving the FrozenLake environment with dynamic programming
In this section we will focus on two dynamic programming algorithms: policy iteration and value iteration. But let's start by simulating the FrozenLake environment, a simple grid-world scenario in which an agent navigates a grid of icy terrain, represented as a frozen lake, to reach a goal tile.
Simulating the FrozenLake environment
FrozenLake is a typical OpenAI Gym (now Gymnasium) environment with discrete states. The task is to move the agent from the starting tile to the goal tile while avoiding holes in the ice. The grid is either 4 * 4 (FrozenLake-v1) or 8 * 8 (FrozenLake8x8-v1). There are four types of tiles in the grid (a short code sketch for exploring the environment follows the list):
- The starting tile: This is state 0, and it comes with 0 reward.
- The goal tile: It is state 15 in the 4 * 4 grid. It gives +1 reward and terminates an episode.
- The frozen tile: In the 4 * 4 grid, states 1, 2, 3, 4, 6, 8, 9, 10, 13, and 14 are frozen tiles. They are walkable and come with 0 reward.
- The hole tile: In the 4 * 4 grid, states 5, 7, 11, and 12 are hole tiles. Stepping on one gives 0 reward and terminates the episode.
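Before applying dynamic programming, it helps to create the environment and inspect its state space, action space, and transition model. The following is a minimal sketch assuming the Gymnasium API (`gym.make`, `reset`, `render`); with the older OpenAI Gym, `reset` returns only the observation.

```python
import gymnasium as gym

# Create the 4 * 4 FrozenLake environment; use "FrozenLake8x8-v1" for the 8 * 8 grid.
# By default the ice is slippery, so transitions are stochastic.
env = gym.make("FrozenLake-v1", render_mode="ansi")

print(env.observation_space)  # Discrete(16): one state per tile
print(env.action_space)       # Discrete(4): 0 = left, 1 = down, 2 = right, 3 = up

state, info = env.reset(seed=0)
print(state)                  # 0 -- the agent always starts on the starting tile
print(env.render())           # text rendering of the grid

# The transition model that the dynamic programming algorithms rely on:
# P[state][action] is a list of (probability, next_state, reward, terminated) tuples.
print(env.unwrapped.P[0][2])  # possible outcomes of taking action "right" in state 0
```

The `env.unwrapped.P` dictionary exposes the full transition dynamics of the environment, which is exactly what policy iteration and value iteration need, since dynamic programming assumes the model of the environment is known.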