Once the agent has trained at the Gym, we want to measure how well it has learned. To do that, we put the agent through a test, just like in school! The test(agent, env, policy) function takes the agent object, the environment instance, and the agent's policy, runs the agent in the environment for one full episode, and returns the total reward collected in that episode. It is similar to the train(agent, env) function we saw earlier, but it does not let the agent learn or update its Q-value estimates:
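def test(agent, env, policy):
    done = False
    obs = env.reset()
    total_reward = 0.0
    while not done:
        # Look up the action for the discretized observation; no learning happens here
        action = policy[agent.discretize(obs)]
        next_obs, reward, done, info = env.step(action)
        obs = next_obs
        # Accumulate the reward over the whole episode
        total_reward += reward
    return total_reward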
Note that the test(agent, env, policy) function...