Let's recreate the Taxi-v2 environment. This time we'll also need to import numpy. In this chapter, we'll use the term state instead of observation, for consistency with the terminology we introduced in Chapter 1, Brushing Up on Reinforcement Learning Concepts:
import gym
import numpy as np
env = gym.make('Taxi-v2')
state = env.reset()
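If you'd like to verify the dimensions we're about to use for the Q-table, you can query the environment's spaces directly (a quick optional check; Taxi-v2 has 500 discrete states and 6 discrete actions):
# Inspect the sizes of the state and action spaces
print(env.observation_space.n)  # 500
print(env.action_space.n)       # 6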
Create the Q-table as follows:
Q = np.zeros([env.observation_space.n, env.action_space.n])
The Q-table is initialized as a two-dimensional numpy array of zeroes. The first three rows of the Q-table currently look like this:
State | South (0) | North (1) | East (2) | West (3) | Pickup (4) | Dropoff (5)
------|-----------|-----------|----------|----------|------------|------------
0     | 0         | 0         | 0        | 0        | 0          | 0
1     | 0         | 0         | 0        | 0        | 0          | 0
2     | 0         | 0         | 0        | 0        | 0          | 0
The first column represents the state, and the remaining columns represent the six possible actions. The Q-values of all the state-action pairs are currently zero.
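As a quick sanity check that the table matches this description, you can print its shape and slice out the first three rows (a minimal sketch; the exact formatting of the output depends on your numpy print options):
# Confirm the table's dimensions and inspect the first three rows
print(Q.shape)  # (500, 6)
print(Q[:3])    # three rows of six zeros each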