Policy iteration and value iteration are closely related and are usually treated as companion methods. To decide which one suits a given problem, we often need to apply both and compare the results. In the next exercise, we will evaluate policy and value iteration side by side in the FrozenLake environment.
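As a rough illustration of what a side-by-side comparison looks like, the sketch below runs both methods on a toy four-state corridor rather than on FrozenLake itself. The corridor, the constants `GAMMA` and `THETA`, and the helpers `build_model`, `q_value`, and `greedy_action` are assumptions made for this example and are not taken from Chapter_2_8.py. The transition table only mimics the `P[state][action]` layout of `(probability, next_state, reward, done)` tuples that Gym's toy-text environments expose.

```python
# Minimal sketch: policy iteration vs. value iteration on a toy corridor.
# States 0..3, state 3 is the goal; actions: 0 = left, 1 = right.
# Everything here is a hypothetical stand-in for the FrozenLake exercise.
GAMMA = 0.9    # discount factor (assumed value)
THETA = 1e-8   # convergence threshold (assumed value)
N_STATES, N_ACTIONS = 4, 2

def build_model():
    """Return P[s][a] -> [(prob, next_state, reward, done)]."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in range(N_ACTIONS):
            if s == N_STATES - 1:
                # Goal state is absorbing and yields no further reward.
                P[s][a] = [(1.0, s, 0.0, True)]
            else:
                ns = max(s - 1, 0) if a == 0 else s + 1
                done = ns == N_STATES - 1
                P[s][a] = [(1.0, ns, 1.0 if done else 0.0, done)]
    return P

def q_value(P, V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + (0.0 if done else GAMMA * V[ns]))
               for p, ns, r, done in P[s][a])

def greedy_action(P, V, s):
    """Best action under V; ties resolve to the lowest action index."""
    return max(range(N_ACTIONS), key=lambda a: q_value(P, V, s, a))

def value_iteration(P):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q_value(P, V, s, a) for a in range(N_ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:
            break
    return [greedy_action(P, V, s) for s in range(N_STATES)], V

def policy_iteration(P):
    policy = [0] * N_STATES
    V = [0.0] * N_STATES
    while True:
        # Policy evaluation: sweep until V is stable for the current policy.
        while True:
            delta = 0.0
            for s in range(N_STATES):
                v = q_value(P, V, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < THETA:
                break
        # Policy improvement: act greedily with respect to V.
        new_policy = [greedy_action(P, V, s) for s in range(N_STATES)]
        if new_policy == policy:
            return policy, V
        policy = new_policy

if __name__ == "__main__":
    model = build_model()
    print("policy iteration:", policy_iteration(model))
    print("value iteration: ", value_iteration(model))
```

On a problem this small both methods settle on the same greedy policy and the same state values, so the interesting differences (number of sweeps, wall-clock time) only show up on larger environments such as FrozenLake.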
- Open the Chapter_2_8.py example. This example builds on the previous code examples, so we will only show the new code:
def play(env, episodes, policy):
    wins = 0
    total_reward = 0
    for episode in range(episodes):
        term = False
        state = env.reset()
        while not term:
            action = np.argmax(policy[state])
            next_state, reward, term, info = env.step(action)
            total_reward += reward
            state = next_state
            if term and reward == 1.0:
                wins...