In the previous recipe, we solved a relatively simple environment in which the optimal policy is easy to obtain. In this recipe, let's simulate a more complex grid environment, Windy Gridworld, in which an external force pushes the agent when it moves from certain tiles. This will prepare us to search for the optimal policy using the TD method in the next recipe.
Windy Gridworld is a grid problem with a 7 × 10 board (7 rows and 10 columns), which is displayed as follows:
At each step, the agent moves up, right, down, or left. Tile 30 is the agent's starting point, and tile 37 is the winning point: the episode ends as soon as the agent reaches it. Each step the agent takes incurs a reward of -1.
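To make the tile numbering and step rules concrete, here is a minimal sketch that ignores the wind for now. It assumes the tiles are numbered row by row starting from 0 (tile = row * 10 + col), which places tile 30 at row 3, column 0 and tile 37 at row 3, column 7. The action encoding (0 = up, 1 = right, 2 = down, 3 = left), the clipping at the board edges, and the names `to_coords`, `to_tile`, and `step_no_wind` are all hypothetical choices for illustration.

```python
# Hypothetical sketch of the windless rules on a 7 x 10 board.
N_ROWS, N_COLS = 7, 10
START, GOAL = 30, 37

# Assumed action encoding: (row change, column change); row 0 is the top row.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def to_coords(tile):
    """Convert a tile index to (row, col), assuming row-major numbering."""
    return divmod(tile, N_COLS)


def to_tile(row, col):
    """Convert (row, col) back to a tile index."""
    return row * N_COLS + col


def step_no_wind(tile, action):
    """Take one step without wind; moves off the board are clipped to the edge.

    Returns (new_tile, reward, done): every step costs -1, and the
    episode ends when the agent lands on the goal tile.
    """
    row, col = to_coords(tile)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    new_tile = to_tile(row, col)
    return new_tile, -1, new_tile == GOAL


print(to_coords(START), to_coords(GOAL))  # (3, 0) (3, 7)
print(step_no_wind(36, 1))                # (37, -1, True)
```

Clipping keeps the agent on the board instead of ending the episode when it walks into an edge; this is an assumption about how boundaries are handled, not something stated above.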
What makes this environment more complex is the extra wind force in columns 4 to 9. When moving from a tile in one of those columns, the agent experiences an extra push upward...
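A minimal sketch of how such a push could be applied is shown below. The wind strengths are not given in the text above, so the `WIND` values are an assumption borrowed from the classic Sutton and Barto version of Windy Gridworld. Counting columns from 1, they are nonzero exactly in columns 4 to 9 (0-based indices 3 to 8). The wind of the column the agent moves *from* is subtracted from the row, pushing the agent toward the top of the board, and the result is clipped to the board. The name `windy_move` and the action encoding are hypothetical.

```python
# Hypothetical sketch of the wind effect on a 7 x 10 board (row 0 is the top).
N_ROWS, N_COLS = 7, 10

# Assumed upward push per column (0-based index); nonzero for columns 4-9 counted from 1.
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]

# Assumed action encoding: up, right, down, left as (row change, column change).
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def windy_move(row, col, action):
    """Apply the chosen move plus the wind of the column the agent starts from."""
    d_row, d_col = MOVES[action]
    new_row = row + d_row - WIND[col]  # wind reduces the row index, i.e. pushes up
    new_col = col + d_col
    # Keep the agent on the board (assumed boundary handling).
    new_row = min(max(new_row, 0), N_ROWS - 1)
    new_col = min(max(new_col, 0), N_COLS - 1)
    return new_row, new_col


print(windy_move(3, 0, 1))  # (3, 1): column 1 has no wind
print(windy_move(3, 6, 1))  # (1, 7): moving right from column 7 pushes up two rows
```

Using the wind of the starting column rather than the destination column matches the description of the push happening when moving *from* a windy tile.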