Distributional policy gradients
As the last method of this chapter, we will take a look at the 2018 paper by Barth-Maron et al. called Distributed Distributional Deterministic Policy Gradients [Bar+18].
The full name of the method is Distributed Distributional Deep Deterministic Policy Gradients, or D4PG for short. The authors proposed several improvements to the DDPG method to increase its stability, convergence, and sample efficiency.
First, they adapted the distributional representation of the Q-value proposed by Bellemare et al. in the 2017 paper A distributional perspective on reinforcement learning [BDM17]. We discussed this approach in Chapter 8, when we talked about DQN improvements, so refer to that chapter or to the original Bellemare paper for details. The core idea is to replace the single Q-value from the critic with a probability distribution over returns. The Bellman equation is replaced with the distributional Bellman operator, which transforms this distribution rather than a single scalar: the target distribution becomes the immediate reward plus the discounted value distribution of the next state-action pair, projected back onto the support of the critic's distribution.
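To make this concrete, here is a minimal PyTorch sketch of a critic that outputs a categorical distribution over a fixed support of atoms, together with the categorical projection that maps the Bellman-updated distribution back onto that support. The names DistributionalCritic and project_distribution, the hidden-layer size, and the hyperparameters N_ATOMS, V_MIN, and V_MAX are illustrative choices, not the paper's exact values:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters for the distribution's support; in practice
# the number of atoms and the value range are tuned per task
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)


class DistributionalCritic(nn.Module):
    """Critic returning a categorical distribution over returns instead
    of a single Q-value (a simplified architecture, not the paper's)."""

    def __init__(self, obs_size: int, act_size: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size + act_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_ATOMS),
        )
        # Fixed support: N_ATOMS equally spaced points in [V_MIN, V_MAX]
        self.register_buffer("supports", torch.linspace(V_MIN, V_MAX, N_ATOMS))

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Logits over the atoms; softmax turns them into probabilities
        return self.net(torch.cat([obs, act], dim=1))

    def q_value(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # The scalar Q-value is the expectation of the distribution
        probs = torch.softmax(self(obs, act), dim=1)
        return (probs * self.supports).sum(dim=1, keepdim=True)


def project_distribution(next_probs: torch.Tensor, rewards: torch.Tensor,
                         dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Apply the Bellman operator Tz = r + gamma * z to every atom and
    project the result back onto the fixed support (the categorical
    projection of Bellemare et al. [BDM17])."""
    supports = torch.linspace(V_MIN, V_MAX, N_ATOMS, device=rewards.device)
    # Shift and scale every atom; terminal transitions keep only the reward
    tz = rewards.unsqueeze(1) + gamma * supports.unsqueeze(0) * (1.0 - dones).unsqueeze(1)
    b = (tz.clamp(V_MIN, V_MAX) - V_MIN) / DELTA_Z
    low, up = b.floor().long(), b.ceil().long()
    proj = torch.zeros_like(next_probs)
    # Split each atom's probability mass between the two nearest atoms
    proj.scatter_add_(1, low, next_probs * (up.float() - b))
    proj.scatter_add_(1, up, next_probs * (b - low.float()))
    # If b lands exactly on an atom, both weights above are zero, so the
    # atom receives the full mass here instead
    proj.scatter_add_(1, low, next_probs * (low == up).float())
    return proj
```

Training would then minimize the cross-entropy between the projected target distribution (computed from the target critic's output) and the online critic's predicted distribution, while the actor is still updated through the deterministic policy gradient, with q_value(), the distribution's expectation, taking the place of the usual scalar critic output.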