Deep Deterministic Policy Gradient (DDPG) is an off-policy, model-free, actor-critic algorithm and is based on the Deterministic Policy Gradient (DPG) theorem (proceedings.mlr.press/v32/silver14.pdf). Unlike the deep Q-learning-based methods, actor-critic policy gradient-based methods are easily applicable to continuous action spaces, in addition to problems/tasks with discrete action spaces.
Deep Deterministic Policy Gradients
Core concepts
In Chapter 8, Implementing an Intelligent Autonomous Car Driving Agent Using the Deep Actor-Critic algorithm, we walked you through the derivation of the policy gradient theorem and reproduced the following for bringing in context:
You may recall that the policy we considered was a stochastic...