If you implemented DPG with the deep neural networks that were presented in the previous section, the algorithm would be very unstable and it wouldn't be capable of learning anything. We encountered a similar problem when we extended Q-learning with deep neural networks. Indeed, to combine DNN and Q-learning in the DQN algorithm, we had to employ some other tricks to stabilize learning. The same holds true for DPG algorithms. These methods are off-policy, just like Q-learning, and as we'll soon see, some ingredients that make deterministic policies work with DNN are similar to the ones used in DQN.
DDPG (Continuous Control with Deep Reinforcement Learning by Lillicrap, and others: https://arxiv.org/pdf/1509.02971.pdf) is the first deterministic actor-critic that employs deep neural networks, for learning both the actor and the critic...