Implementing temporal difference learning
This recipe walks you through implementing the temporal difference (TD) learning algorithm. TD methods learn incrementally from incomplete episodes of agent experience, which makes them suitable for problems that require online learning. They are also useful in model-free RL settings because they do not depend on a model of the MDP's transition dynamics or rewards. To help you visualize the learning progression of the TD algorithm, this recipe also shows you how to implement the GridworldV2 learning environment.
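To preview the core idea before the full recipe, the heart of tabular TD(0) is a one-step bootstrap update of the state-value estimate. The sketch below is illustrative, not the book's exact code: the names `td0_update`, `alpha`, `gamma`, and the 12-state value table are assumptions made for the example.

```python
import numpy as np

def td0_update(V, s, reward, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) step: nudge V[s] toward the bootstrapped
    target reward + gamma * V[s_next]."""
    td_target = reward + gamma * V[s_next]  # one-step lookahead estimate
    td_error = td_target - V[s]             # how far the current estimate is off
    V[s] += alpha * td_error                # incremental (online) update
    return V

# Illustrative usage: a value table for a 3 x 4 grid flattened to 12 states.
V = np.zeros(12)
V = td0_update(V, s=0, reward=-0.04, s_next=1)
print(V[0])  # V(0) has moved toward the TD target
```

Because the update uses the current estimate `V[s_next]` as part of its target, it can be applied after every single transition rather than waiting for an episode to finish, which is what enables the online learning described above.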
Getting ready
To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run `pip install numpy gym`. If the following import statements run without issues, you are ready to get started:
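The recipe's exact import list is not reproduced here; based on the packages installed above, a minimal check might look like this:

```python
import gym
import numpy as np
```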