Solving the CartPole environment
The CartPole-v1 environment simulates balancing a pole that is hinged at its bottom to a cart, which moves left and right along a track. The pole is kept upright by applying 1 unit of force to the cart, either to the right or to the left, at each time step.
The pole, which acts as a pendulum in this environment, starts nearly upright, at a small random angle from the vertical, as shown in the following rendered output:
Figure 10.6: The CartPole simulation – the starting point
Our goal is to keep the pendulum from falling over to either side for as long as possible – that is, up to 500 time steps. For every time step that the pole remains upright, we get a reward of +1, so the maximum total reward is 500. The episode will end prematurely if one of the following occurs during the run:
- The angle of the pole from the vertical position exceeds 12 degrees
- The cart’s distance from the center exceeds...
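The reward and termination rules above can be sketched as a simple episode loop. This is a minimal illustration rather than a solution: it assumes the environment follows the Gymnasium API, where reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info). The names run_episode, random_policy, and demo are hypothetical helpers introduced here for illustration only.

```python
import random


def run_episode(env, policy, max_steps=500):
    """Run one episode and return the total reward collected.

    Assumes `env` follows the Gymnasium API: reset() returns
    (observation, info) and step(action) returns
    (observation, reward, terminated, truncated, info).
    """
    observation, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward  # +1 for every step the pole stays up
        # terminated: the pole fell too far or the cart left the track
        # truncated: the environment's time limit was reached
        if terminated or truncated:
            break
    return total_reward


def random_policy(observation):
    """Hypothetical baseline: ignore the observation and push
    left (0) or right (1) at random."""
    return random.choice([0, 1])


def demo():
    """Run a single random episode (requires the gymnasium package)."""
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    try:
        print("Total reward:", run_episode(env, random_policy))
    finally:
        env.close()
```

Calling demo() with a random policy will typically end the episode long before 500 steps, which gives a baseline to compare any learned controller against.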