Building a Q-learning agent
This recipe will show you how to build a Q-learning agent. Q-learning can be applied to model-free RL problems, and because it supports off-policy learning, it is a practical choice for problems where the available experiences were collected under a different policy or by a different agent (even a human).
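To make the off-policy property concrete, here is a minimal sketch of the tabular Q-learning update. The function name and parameters are illustrative, not the recipe's final implementation; note that the target bootstraps from the *greedy* (max) action value, regardless of which policy actually generated the experience:

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """One off-policy TD update of the Q-table.

    Q          : 2-D array of shape (num_states, num_actions)
    alpha      : learning rate
    gamma      : discount factor
    """
    # Target uses max over next-state actions (greedy), which is what
    # makes Q-learning off-policy.
    td_target = reward + gamma * np.max(Q[next_state])
    td_error = td_target - Q[state, action]
    Q[state, action] += alpha * td_error
    return Q
```

Contrast this with SARSA, whose target uses the action actually taken by the behavior policy, making it on-policy.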
Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the Q-learning algorithm:
Getting ready
To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started:
import numpy as np
import random
Now, let's begin.
How to do it…
Let's implement...
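As a preview of where the implementation is headed, here is a minimal sketch of a tabular Q-learning agent. It assumes a discrete environment whose states can be indexed by integers (like a flattened gridworld); the class and method names are illustrative placeholders, not the recipe's exact code:

```python
import random
import numpy as np

class QLearningAgent:
    """Minimal tabular Q-learning agent (illustrative sketch)."""

    def __init__(self, num_states, num_actions,
                 alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = np.zeros((num_states, num_actions))
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        self.num_actions = num_actions

    def get_action(self, state):
        # Epsilon-greedy behavior policy: explore with probability
        # epsilon, otherwise act greedily with respect to Q.
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        return int(np.argmax(self.Q[state]))

    def learn(self, state, action, reward, next_state, done):
        # Off-policy target: greedy value of the next state,
        # zeroed out at episode termination.
        target = reward if done else \
            reward + self.gamma * np.max(self.Q[next_state])
        self.Q[state, action] += self.alpha * (target - self.Q[state, action])
```

In a training loop, you would call `get_action` to act, step the environment, then call `learn` with the resulting transition; because the update is off-policy, transitions gathered by any behavior policy can be used.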