Distributed Distributional DDPG
D4PG, which stands for Distributed Distributional Deep Deterministic Policy Gradient, is an interesting policy gradient algorithm, and its name already tells us a lot about how it works. As the name suggests, D4PG combines deep deterministic policy gradient (DDPG) with distributional reinforcement learning, and it runs in a distributed fashion. Confused? Let's go deeper and understand how D4PG works in detail.
To understand how D4PG works, it is highly recommended to review the DDPG algorithm we covered in Chapter 12, Learning DDPG, TD3, and SAC. Recall that DDPG is an actor-critic method: the actor learns the policy, while the critic evaluates the policy produced by the actor using the Q function. The critic uses a deep Q network to estimate the Q function, and the actor uses a policy network to compute the policy. Thus, the actor performs an action while the critic gives...
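The actor-critic interplay recalled above can be sketched in a few lines. This is a minimal, hypothetical illustration of one DDPG update, not the book's implementation. It assumes linear function approximators in place of the deep Q network and the policy network, a one-dimensional state and action, and illustrative names (`Critic`, `Actor`, `ddpg_update`) and hyperparameters. The critic regresses Q(s, a) toward the target r + γQ'(s', μ'(s')) computed with target networks, the actor follows the gradient of Q(s, μ(s)) through the chain rule, and the target parameters are softly updated toward the learned ones.

```python
# A minimal sketch of one DDPG actor-critic update (hypothetical names).
# Assumption: linear approximators stand in for the deep Q network (critic)
# and the policy network (actor), so only the standard library is needed.
from dataclasses import dataclass, field
from typing import List, Tuple

# Transition format assumed here: (state, action, reward, next_state, done)
Transition = Tuple[float, float, float, float, float]


def features(s: float, a: float) -> List[float]:
    # Hand-picked features so that Q(s, a) = v . phi(s, a) is linear in v
    return [s, a, s * a, a * a, 1.0]


@dataclass
class Critic:
    v: List[float] = field(default_factory=lambda: [0.0] * 5)

    def q(self, s: float, a: float) -> float:
        return sum(vi * fi for vi, fi in zip(self.v, features(s, a)))

    def dq_da(self, s: float, a: float) -> float:
        # dQ/da = v1 + v2 * s + 2 * v3 * a (derivative of the features above)
        return self.v[1] + self.v[2] * s + 2.0 * self.v[3] * a


@dataclass
class Actor:
    theta: float = 0.0

    def act(self, s: float) -> float:
        # Deterministic policy mu(s) = theta * s
        return self.theta * s


def ddpg_update(actor: Actor, critic: Critic,
                target_actor: Actor, target_critic: Critic,
                batch: List[Transition], gamma: float = 0.99,
                lr_critic: float = 0.01, lr_actor: float = 0.01,
                tau: float = 0.005) -> None:
    n = len(batch)

    # 1. Critic: minimize 0.5 * (Q(s, a) - y)^2 with y = r + gamma * Q'(s', mu'(s'))
    grad_v = [0.0] * len(critic.v)
    for s, a, r, s_next, done in batch:
        a_next = target_actor.act(s_next)
        y = r + gamma * (1.0 - done) * target_critic.q(s_next, a_next)
        td_error = critic.q(s, a) - y
        for i, f in enumerate(features(s, a)):
            grad_v[i] += td_error * f / n
    critic.v = [vi - lr_critic * gi for vi, gi in zip(critic.v, grad_v)]

    # 2. Actor: gradient ascent on Q(s, mu(s)); dQ/dtheta = dQ/da * da/dtheta
    grad_theta = 0.0
    for s, _, _, _, _ in batch:
        a = actor.act(s)
        grad_theta += critic.dq_da(s, a) * s / n  # da/dtheta = s
    actor.theta += lr_actor * grad_theta

    # 3. Soft-update the target parameters toward the learned ones
    target_critic.v = [tau * v + (1.0 - tau) * tv
                       for v, tv in zip(critic.v, target_critic.v)]
    target_actor.theta = tau * actor.theta + (1.0 - tau) * target_actor.theta
```

Updating the critic first and then letting the actor climb the critic's action gradient mirrors how DDPG is usually structured. D4PG keeps this actor-critic structure but, as its name indicates, replaces the critic's single Q value with a distributional estimate.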