The agent code now plays, or explores, the environment, so it helps to understand how this code runs. Open Chapter_3_3.py again and follow the exercise:
- For this section, we only need to focus on how the agent plays the game. Scroll down to the play_game function, shown here:
def play_game(env, policy, display=True):
    env.reset()
    episode = []
    finished = False
    while not finished:
        # Read the current state from the wrapped environment.
        s = env.env.s
        if display:
            clear_output(True)
            env.render()
            sleep(1)
        timestep = []
        timestep.append(s)
        # Draw a random point along the total weight of the actions in state s.
        n = random.uniform(0, sum(policy[s].values()))
        top_range = 0
        action = 0
        # Walk the cumulative weights; the first one that exceeds n picks the action.
        for prob in policy[s].items():
            top_range += prob[1]
            if n < top_range:
                action = prob[0]
                break
        state, reward, finished, info = env.step(action)
        timestep.append(action)
        timestep.append...
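The action-selection step in the listing above picks an action with probability proportional to its weight in policy[s]. The following is a minimal standalone sketch of that idea, not the book's code: the function name sample_action, the rng parameter, and the example policy_s dictionary are all assumptions made for illustration. One deliberate difference is that the sketch falls back to the last action if the random draw lands exactly on the total weight, whereas the listing falls back to action 0.

```python
import random
from collections import Counter


def sample_action(action_weights, rng=random):
    """Pick an action with probability proportional to its weight.

    action_weights maps action -> non-negative weight, shaped like
    policy[s] in the listing above. The name is hypothetical.
    """
    # Draw a point somewhere along the total weight.
    n = rng.uniform(0, sum(action_weights.values()))
    top_range = 0
    chosen = None
    for action, weight in action_weights.items():
        chosen = action
        top_range += weight
        # The first cumulative weight that exceeds n selects the action.
        if n < top_range:
            break
    # If no cumulative weight exceeded n, the last action is returned.
    return chosen


# Hypothetical policy entry for one state: four equally weighted actions.
policy_s = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
rng = random.Random(0)  # seeded so repeated runs give the same draws
counts = Counter(sample_action(policy_s, rng) for _ in range(10_000))
print(counts)
```

With equal weights, each action should be chosen roughly a quarter of the time. Changing the weights in policy_s shifts the counts accordingly.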