One of the problems we face with PG methods is high variance, that is, too much randomness in the results. Of course, some of that is to be expected when we sample actions from a stochastic, or random, policy. The Deep Deterministic Policy Gradient (DDPG) method was introduced in 2015 by Timothy Lillicrap and colleagues in a paper titled Continuous control with deep reinforcement learning. It was meant to address the problem of controlling actions in continuous action spaces, something we have avoided until now. Remember that a continuous action space differs from a discrete one in that an action may indicate not only a direction but also an amount, a value that expresses the effort applied in that direction. With discrete actions, any action choice is assumed to be at 100% effort.
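To make the distinction concrete, here is a minimal sketch, assuming the gymnasium package (the maintained successor to OpenAI Gym) is installed along with its CartPole-v1 and Pendulum-v1 environments. CartPole exposes a discrete action space, while Pendulum exposes a continuous one:

```python
import gymnasium as gym

# Discrete action space: CartPole accepts one of two choices
# (push left or push right), each applied at full effort.
cartpole = gym.make("CartPole-v1")
print(cartpole.action_space)           # Discrete(2)
print(cartpole.action_space.sample())  # e.g. 0 or 1

# Continuous action space: Pendulum takes a real-valued torque,
# so the action encodes both direction (sign) and effort (magnitude).
pendulum = gym.make("Pendulum-v1")
print(pendulum.action_space)           # Box(-2.0, 2.0, (1,), float32)
print(pendulum.action_space.sample())  # e.g. [0.73], any torque in [-2, 2]
```

Sampling from the Box space returns a floating-point value anywhere in its range, which is exactly the kind of action DDPG is designed to produce.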
So, why does this matter? Well, in our previous chapter exercises, we explored PG methods over discrete...