Summary
In this chapter, we examined three different methods aimed at improving the stability of the stochastic policy gradient, and compared them to the A2C implementation on two continuous control problems. Together with the methods from the previous chapter (DDPG and D4PG), they form the basic toolset for working with continuous control domains. Finally, we looked at SAC, a relatively new off-policy method that extends DDPG.
In the next chapter, we will switch to a different set of RL methods that have recently been gaining popularity: black-box, or gradient-free, methods.