TD learning over a single step, or TD(0), essentially simplifies to Q-learning. To do a full comparison of this method against DP and MC, we will first revisit the FrozenLake environment from Gym. Open up the example code Chapter_4_4.py.
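To see the connection, consider the one-step TD update written for state-action values (standard notation, not taken from the listing). Bootstrapping the target with a max over the next actions gives exactly the Q-learning update:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$

With that update rule in mind, follow the exercise: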
- The full listing of code is too large to show. Instead, we will review the code in sections starting with the imports:
from os import system, name
from time import sleep
import numpy as np
import gym
import random
from tqdm import tqdm
- We have seen all of these imports before, so there is nothing new here. Next, we initialize the environment and print some of its basic properties:
env = gym.make("FrozenLake-v0")
env.render()
action_size = env.action_space.n
print("Action size ", action_size)
state_size = env.observation_space.n
print("State size ", state_size...