Summary
In this chapter, we've examined three different methods aimed at improving the stability of the stochastic policy gradient and compared them to the A2C implementation on two continuous control problems. Together with the methods from the previous chapter (DDPG and D4PG), they form the basic toolkit for working in the continuous control domain.
In the next chapter, we'll switch to a different set of RL methods that have recently been gaining popularity: black-box, or gradient-free, methods.