Understanding Temporal Difference Learning
Temporal difference (TD) learning is one of the most widely used model-free methods because it combines the advantages of the dynamic programming (DP) and Monte Carlo (MC) methods covered in the previous chapters. Like DP, it bootstraps, updating a value estimate from other estimates without waiting for the episode to end; like MC, it learns from experience without requiring a model of the environment.
We will begin the chapter by examining how TD learning improves on the DP and MC methods. We will then learn how to perform the prediction task using TD learning. After that, we will move on to TD control, covering an on-policy method called SARSA and an off-policy method called Q learning.
We will also learn how to find the optimal policy in the Frozen Lake environment using both SARSA and Q learning. At the end of the chapter, we will compare the DP, MC, and TD methods.
Thus, in this chapter, we will learn about the following topics:
- TD learning
- TD prediction method
- TD...