Summary
In this chapter, we examined three different methods aimed at improving the stability of the stochastic policy gradient and compared them to the A2C implementation on two continuous control problems. Together with the methods covered in the previous chapter (DDPG and D4PG), these methods form a basic toolkit for working in the continuous control domain. Finally, we looked at SAC, a relatively new off-policy method that extends the DDPG family. We have only scratched the surface of this topic, but this chapter should serve as a good starting point for exploring it in more depth. These methods are widely used in robotics and related areas.
In the next chapter, we will switch to a different family of RL methods that has been gaining popularity recently: black-box, or gradient-free, methods.