Introduction to TD Learning
Having studied dynamic programming and Monte Carlo methods in the previous chapters, in this chapter we turn to temporal difference (TD) learning, one of the cornerstones of reinforcement learning. We will start with its simplest formulation, the one-step methods, and build on them toward the most general formulation, which is based on the concept of eligibility traces. We will see how this approach lets us frame TD and MC methods under a single derivation, giving us the ability to compare the two. Throughout this chapter, we will implement several flavors of TD methods and apply them to the FrozenLake-v0 environment under both deterministic and stochastic environment dynamics. Finally, we will solve the stochastic version of FrozenLake-v0 with an off-policy TD method known as Q-learning.
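As a preview of the one-step methods mentioned above, the following minimal sketch shows tabular TD(0) value estimation on FrozenLake-v0. It assumes the classic Gym API (env.step returning observation, reward, done, info), a random behavior policy, and illustrative hyperparameters; it is a sketch of the general idea, not the chapter's final implementation.

import numpy as np
import gym

env = gym.make('FrozenLake-v0')
V = np.zeros(env.observation_space.n)  # tabular state-value estimates
alpha, gamma = 0.1, 0.99               # illustrative learning rate and discount factor

for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()              # random behavior policy
        next_state, reward, done, _ = env.step(action)
        # One-step TD(0) update: move V(s) toward the bootstrapped target
        td_target = reward + gamma * V[next_state] * (not done)
        V[state] += alpha * (td_target - V[state])
        state = next_state

The key line is the update of V[state]: instead of waiting for the end of the episode as Monte Carlo methods do, the estimate is corrected immediately using the observed reward plus the current estimate of the next state's value.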
Temporal difference learning, whose name derives from the fact that it uses differences in state...