In this section, we will start implementing our intelligent agent step by step. We will implement the well-known Q-learning algorithm using the NumPy library and the MountainCar-v0 environment from the OpenAI Gym library.
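As a preview of where we are headed, the following is a minimal sketch of the tabular Q-learning update rule written with NumPy. It is not the agent we will build in this section. The table sizes, the hyperparameter values, and the `q_update` helper are hypothetical and chosen only for illustration. The sketch also assumes that states are already discrete indices, whereas MountainCar-v0 returns continuous observations that would first need to be discretized.

```python
import numpy as np

# Hypothetical sizes and hyperparameters, chosen only for illustration.
NUM_STATES = 5
NUM_ACTIONS = 3
ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor


def q_update(Q, state, action, reward, next_state, done):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Do not bootstrap from the next state once the episode has terminated.
    target = reward if done else reward + GAMMA * np.max(Q[next_state])
    td_error = target - Q[state, action]
    Q[state, action] += ALPHA * td_error
    return td_error


# One update on an all-zero Q-table with a made-up transition.
Q = np.zeros((NUM_STATES, NUM_ACTIONS))
td = q_update(Q, state=0, action=1, reward=1.0, next_state=2, done=False)
print(Q[0, 1])
```

Each update moves the estimate for the visited state-action pair a fraction `ALPHA` of the way toward the observed reward plus the discounted value of the best action in the next state.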
Let's revisit the reinforcement learning Gym boilerplate code we used in Chapter 4, Exploring the Gym and its Features, as follows:
#!/usr/bin/env python
import gym
env = gym.make("Qbert-v0")
MAX_NUM_EPISODES = 10
MAX_STEPS_PER_EPISODE = 500
for episode in range(MAX_NUM_EPISODES):
    obs = env.reset()
    for step in range(MAX_STEPS_PER_EPISODE):
        env.render()
        # Sample a random action. This will be replaced by our agent's action
        # when we start developing the agent algorithms
        action = env.action_space.sample()
        next_state, reward, done, info = env.step(action)  # Send the action to the...