The random CartPole agent
Although the environment is much more complex than our first example in The anatomy of the agent section, the code of the agent is much shorter. This is the power of reusability, abstractions, and third-party libraries!
So, here is the code (you can find it in Chapter02/02_cartpole_random.py):
import gym

if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    total_reward = 0.0
    total_steps = 0
    obs = env.reset()
Here, we create the environment and initialize the step counter and the reward accumulator. On the last line, we reset the environment to obtain the first observation (which we won't use, because our agent picks actions at random and ignores what it observes):
    while True:
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break

    print("Episode done in %d steps, total reward %.2f" % (total_steps, total_reward...
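To make the structure of this loop easier to see and to experiment with, here is a minimal sketch that wraps it in a function and runs it against a tiny stand-in environment, so it works even without Gym installed. The names ToyEnv, ToyActionSpace, and run_random_episode are hypothetical and are not part of Gym or of the book's code. The toy environment only imitates the reset(), step(), and action_space.sample() calls used above; it does not simulate CartPole.

```python
import random


class ToyActionSpace:
    """Hypothetical stand-in for a discrete action space with n actions."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # Pick one of the n actions uniformly at random.
        return self._rng.randrange(self.n)


class ToyEnv:
    """Hypothetical environment exposing reset(), step() and action_space.

    It is not CartPole: every step gives reward 1.0 and the episode
    ends after a fixed number of steps.
    """

    def __init__(self, max_steps=10, seed=None):
        self.action_space = ToyActionSpace(2, seed)
        self.max_steps = max_steps
        self._steps = 0

    def reset(self):
        self._steps = 0
        return 0.0  # dummy observation

    def step(self, action):
        self._steps += 1
        done = self._steps >= self.max_steps
        # Same four-value shape as above: observation, reward, done, info.
        return 0.0, 1.0, done, {}


def run_random_episode(env):
    """Play one episode with random actions; return (total_steps, total_reward)."""
    total_reward = 0.0
    total_steps = 0
    env.reset()  # a random agent ignores the observation
    while True:
        action = env.action_space.sample()
        _, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break
    return total_steps, total_reward


if __name__ == "__main__":
    steps, reward = run_random_episode(ToyEnv(max_steps=10, seed=0))
    print("Episode done in %d steps, total reward %.2f" % (steps, reward))
    # prints: Episode done in 10 steps, total reward 10.00
```

The same function should also accept a real Gym environment, provided its step() method returns the four-value tuple used in this chapter. Releases whose step() returns five values would need the unpacking adjusted.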