Learning DDPG, TD3, and SAC
In the previous chapter, we learned about actor-critic methods such as Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C). In this chapter, we will learn about several state-of-the-art actor-critic methods. We will start off by understanding a popular actor-critic method called Deep Deterministic Policy Gradient (DDPG). DDPG applies only to continuous environments, that is, environments with a continuous action space, because its actor learns a deterministic policy that outputs a real-valued action directly. We will understand what DDPG is and how it works in detail, and we will also walk through the DDPG algorithm step by step.
Going forward, we will learn about Twin Delayed Deep Deterministic Policy Gradient (TD3). TD3 is an improvement over the DDPG algorithm and includes several features, such as clipped double Q-learning, delayed policy updates, and target policy smoothing, that address the problems faced in DDPG, most notably the overestimation of Q-values. We will understand the key features of TD3 in detail and also look into the TD3 algorithm step by step.
Finally, we will learn about the Soft Actor-Critic (SAC) method, understand how it works, and look into the SAC algorithm step by step.