In this recipe, let's solve a more complicated Cliff Walking environment using the A2C algorithm.
Cliff Walking is a typical Gym environment in which episodes can be long and are not guaranteed to terminate. It is a grid problem on a 4 × 12 board. At each step, the agent moves up, right, down, or left. The bottom-left tile is the agent's starting point, and the bottom-right tile is the goal, where the episode ends once it is reached. The remaining tiles in the bottom row are cliffs: stepping on any of them resets the agent to the starting position, but the episode continues. Each step incurs a reward of -1, except for stepping on a cliff, which incurs a reward of -100.
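To make these rules concrete, here is a minimal stand-alone sketch of the dynamics in plain Python. It is not Gym's own implementation: the `step` helper, the constants, and the `(row, col)` position tuples are hypothetical names introduced for illustration. The action numbering (0 = up, 1 = right, 2 = down, 3 = left) is assumed to follow the order in which the moves are listed above, and moves that would leave the board are assumed to keep the agent in place.

```python
# Hypothetical model of the rules described in the text; not Gym's code.
N_ROWS, N_COLS = 4, 12
START = (N_ROWS - 1, 0)           # bottom-left tile
GOAL = (N_ROWS - 1, N_COLS - 1)   # bottom-right tile

# Assumed action numbering: 0 = up, 1 = right, 2 = down, 3 = left
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def step(position, action):
    """Return (next_position, reward, done) for a single move."""
    row, col = position
    d_row, d_col = MOVES[action]
    # Assumption: a move off the board is clipped to the board edge
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    # Cliff tiles: bottom row, excluding the start and goal tiles
    if row == N_ROWS - 1 and 0 < col < N_COLS - 1:
        return START, -100, False  # reset to start, episode continues
    return (row, col), -1, (row, col) == GOAL


# Walk along the row just above the cliff: up, 11 x right, down
position, total_reward, done = START, 0, False
for action in [0] + [1] * 11 + [2]:
    position, reward, done = step(position, action)
    total_reward += reward
print(position, total_reward, done)
```

The walk at the end takes 13 moves along the row just above the cliff, so its total reward is -13. A single misstep onto a cliff tile costs -100 and sends the agent back to the start without ending the episode, which is why episodes in this environment can become very long.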
The state is an integer from 0 to 47, indicating where the agent is located, as illustrated:
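The mapping between a state index and a grid position can be sketched as follows, assuming the tiles are numbered row by row from the top-left corner, so that the starting tile is state 36 and the goal tile is state 47. The helper names `to_state` and `to_position` are hypothetical and used only for illustration.

```python
N_COLS = 12  # board width from the text (4 rows x 12 columns)


def to_state(row, col):
    """Encode a (row, col) position as a single integer (row-major, assumed)."""
    return row * N_COLS + col


def to_position(state):
    """Decode a state integer back into (row, col)."""
    return divmod(state, N_COLS)


start_state = to_state(3, 0)    # bottom-left tile
goal_state = to_state(3, 11)    # bottom-right tile
print(start_state, goal_state, to_position(goal_state))
```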
Such value does...