The CartPole-v1 environment simulates balancing a pole that is hinged at its bottom to a cart, which moves left and right along a track. The pole is kept upright by pushing the cart with a force of fixed magnitude, either to the left or to the right, at each time step.
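To make the effect of a single push concrete, here is a minimal sketch of one simulation step. It is a simplified re-implementation, not the library code: it assumes the physical constants and explicit Euler update found in Gymnasium's CartPole source (where each push has magnitude 10.0 in simulation units, action 0 pushes left and action 1 pushes right), and the function name `step` is our own.

```python
from __future__ import annotations

import math

# Constants assumed from Gymnasium's CartPole source; check the version you use.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half of the pole's length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                       # magnitude of every push
TAU = 0.02                             # seconds between state updates


def step(state: tuple[float, float, float, float], action: int) -> tuple[float, float, float, float]:
    """Advance (x, x_dot, theta, theta_dot) by one time step.

    action 0 pushes the cart to the left, action 1 pushes it to the right.
    theta is the pole angle in radians, 0 meaning perfectly upright.
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Equations of motion for the cart-pole system.
    temp = (force + POLE_MASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler integration: positions use the old velocities.
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


# A single push to the right from a perfectly balanced, resting state.
print(step((0.0, 0.0, 0.0, 0.0), action=1))
```

Starting from rest, a push to the right gives the cart a positive velocity and sets the pole rotating toward negative angles; a push to the left produces the mirror image. Repeatedly choosing the push direction is the only control the agent has.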
The pole, which acts as a pendulum in this environment, starts in an upright position, tilted by a small random angle, as shown in the following rendered output:
CartPole simulation—starting point
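The random starting position can be sketched in a few lines. This assumes, as in Gymnasium's CartPole source, that all four state variables (cart position, cart velocity, pole angle in radians, pole angular velocity) are drawn uniformly from [-0.05, 0.05]; `reset_state` is a hypothetical helper, not a library function.

```python
from __future__ import annotations

import math
import random

# Assumed from Gymnasium's CartPole reset: each state variable is drawn
# uniformly from [-INIT_BOUND, INIT_BOUND].
INIT_BOUND = 0.05


def reset_state(rng: random.Random) -> tuple[float, float, float, float]:
    """Return a random starting state (x, x_dot, theta, theta_dot).

    Hypothetical helper for illustration only.
    """
    x, x_dot, theta, theta_dot = (rng.uniform(-INIT_BOUND, INIT_BOUND) for _ in range(4))
    return x, x_dot, theta, theta_dot


rng = random.Random(42)
_, _, theta, _ = reset_state(rng)
print(f"starting tilt: {math.degrees(theta):+.2f} degrees")
# The starting tilt can never exceed INIT_BOUND radians, roughly 2.86 degrees.
print(f"maximum possible tilt: {math.degrees(INIT_BOUND):.2f} degrees")
```

Because every episode begins this close to upright, the starting position alone never ends an episode; what matters is how the pushes chosen afterward keep the pole balanced.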
Our goal is to keep the pendulum from falling over to either side for as long as possible, that is, up to 500 time steps. For every time step that the pole remains upright, we get a reward of +1, so the maximum total reward is 500. The episode will end prematurely if one of the following occurs during the run:
- The angle of the pole from the vertical position exceeds 12 degrees.
- The cart's...