The random CartPole agent
Although the environment is much more complex than our first example in section 2.1, the code of the agent is much shorter. This is the power of reusability, abstractions, and third-party libraries!
So, here is the code (you can find it in Chapter02/02_cartpole_random.py):
import gymnasium as gym

if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    total_reward = 0.0
    total_steps = 0
    obs, _ = env.reset()
Here, we create the environment and initialize the step counter and the reward accumulator. On the last line, we reset the environment to obtain the first observation (which we will not use, as our agent picks actions at random):
    while True:
        action = env.action_space.sample()
        obs, reward, is_done, is_trunc, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if is_done:...
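The loop above stops on `is_done`. Gymnasium's `step()` returns two end-of-episode flags: `terminated` (bound here to `is_done`) and `truncated` (bound to `is_trunc`). CartPole-v1 caps an episode at 500 steps and reports that cap through the truncation flag, so a loop that must always finish usually checks both. The sketch below runs the same loop against a hypothetical `ToyCartPoleEnv` class. This class is not part of Gymnasium. It only imitates the `reset()`/`step()` return shapes and uses an assumed per-step `fall_prob` in place of real pole physics, which lets the example run without Gymnasium installed. Its `sample_action()` method stands in for `env.action_space.sample()`.

```python
import random


class ToyCartPoleEnv:
    """Hypothetical stand-in that mimics Gymnasium's reset()/step() return
    shapes. It does NOT simulate CartPole physics."""

    def __init__(self, fall_prob=0.05, max_episode_steps=500, seed=None):
        self.fall_prob = fall_prob                  # assumed chance per step that the "pole falls"
        self.max_episode_steps = max_episode_steps  # CartPole-v1 also uses a 500-step limit
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Gymnasium-style reset: returns (observation, info)
        self.steps = 0
        return [0.0, 0.0, 0.0, 0.0], {}

    def sample_action(self):
        # Stand-in for env.action_space.sample() on a two-action discrete space
        return self.rng.randrange(2)

    def step(self, action):
        # Gymnasium-style step: (obs, reward, terminated, truncated, info).
        # The action is ignored because this toy has no physics.
        self.steps += 1
        reward = 1.0                                          # +1 for every step, as in CartPole
        terminated = self.rng.random() < self.fall_prob       # episode failed
        truncated = self.steps >= self.max_episode_steps      # time limit reached
        return [0.0, 0.0, 0.0, 0.0], reward, terminated, truncated, {}


def run_random_episode(env):
    """Play one episode with random actions and return (steps, total reward)."""
    total_reward = 0.0
    total_steps = 0
    obs, _ = env.reset()
    while True:
        action = env.sample_action()
        obs, reward, is_done, is_trunc, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        # Stop on either failure (terminated) or the step limit (truncated)
        if is_done or is_trunc:
            break
    return total_steps, total_reward


if __name__ == "__main__":
    steps, reward = run_random_episode(ToyCartPoleEnv(seed=42))
    print("Episode done in %d steps, total reward %.2f" % (steps, reward))
```

With `fall_prob=0.0` the toy episode can never "fail", so only the truncation check ends it, after exactly 500 steps. A loop that tested `is_done` alone would never exit in that case.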