Solving the Taxi problem with the Q-learning algorithm
Q-learning is also a model-free learning algorithm. It updates the Q-function at every step in an episode. We will demonstrate how Q-learning solves the Taxi environment, a typical environment with relatively long episodes. Let's first simulate the Taxi environment.
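Before moving on, the per-step update mentioned above can be sketched in a few lines. This is a minimal illustration of the standard tabular Q-learning rule, not necessarily the implementation used later in this chapter. The function name q_learning_update, the defaultdict-based table, the values of alpha and gamma, and the toy transition are all assumptions made for the example.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, done,
                      alpha=0.4, gamma=1.0):
    """Apply one tabular Q-learning update in place and return the TD error.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Off-policy target: immediate reward plus the discounted value of the
    # best action in the next state (zero if the episode has ended).
    best_next = 0.0 if done else max(Q[next_state])
    td_target = reward + gamma * best_next
    td_error = td_target - Q[state][action]
    # Move the current estimate a fraction alpha toward the target.
    Q[state][action] += alpha * td_error
    return td_error


# Hypothetical setup: six actions per state, all Q-values starting at zero.
n_action = 6
Q = defaultdict(lambda: [0.0] * n_action)

# One made-up transition: in state 0, action 1 yields reward -1 and leads to state 2.
td_error = q_learning_update(Q, state=0, action=1, reward=-1,
                             next_state=2, done=False)
print(Q[0])  # [0.0, -0.4, 0.0, 0.0, 0.0, 0.0]
```

Because the target uses the maximum Q-value of the next state rather than the value of the action actually taken next, the table is refined after every single step, without waiting for the episode to finish.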
Simulating the Taxi environment
In the Taxi environment (https://gym.openai.com/envs/Taxi-v3/), the agent acts as a taxi driver who picks up a passenger at one location and drops the passenger off at the destination.
Everything takes place on a 5 * 5 grid. Take a look at the following example:
Figure 14.6: Example of the Taxi environment
Tiles in certain colors have the following meanings:
- Yellow: The location of the empty taxi (without the passenger)
- Blue: The passenger's location
- Purple: The passenger's destination
- Green: The location of the taxi with the passenger
The starting...