Summary
In this chapter, we covered one of the most widely used methods in deep RL: advantage actor-critic (A2C), which combines the policy gradient (PG) update with an approximation of the state value. We introduced the idea behind A2C by analyzing the effect that a baseline has on the statistics and convergence of the gradients. Then we checked the extension of the baseline idea, A2C itself, in which a separate network head provides the baseline for the current state.
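To make the two-headed architecture concrete, here is a minimal PyTorch sketch, not the book's exact code: a network with a shared body, a policy head producing action logits, and a value head whose output serves as the state-dependent baseline. The class name A2CNet, the hidden size, and the a2c_loss helper (with its states, actions, and returns batch tensors) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class A2CNet(nn.Module):
    """Hypothetical two-headed A2C network: shared body, policy and value heads."""
    def __init__(self, obs_size: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_size, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value V(s), the baseline

    def forward(self, x: torch.Tensor):
        h = self.body(x)
        return self.policy(h), self.value(h)


def a2c_loss(net: A2CNet, states, actions, returns):
    """Illustrative combined loss: PG term weighted by the advantage plus value MSE."""
    logits, values = net(states)
    values = values.squeeze(-1)
    # Advantage = return minus the (detached) value baseline; subtracting the
    # baseline reduces the variance of the gradient estimate.
    advantage = returns - values.detach()
    log_probs = F.log_softmax(logits, dim=1)
    log_prob_actions = log_probs[range(len(actions)), actions]
    policy_loss = -(advantage * log_prob_actions).mean()
    value_loss = F.mse_loss(values, returns)
    return policy_loss + value_loss

The key detail is the detach() on the value estimate: the baseline is treated as a constant in the PG term, so only the value_loss trains the value head.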
In the next chapter, we will look at ways to run the same algorithm in a distributed fashion.