Distributional policy gradients
As the last method of this chapter, we will take a look at the very recent paper by Gabriel Barth-Maron, Matthew W. Hoffman, and others, called Distributed Distributional Deterministic Policy Gradients, published in 2018 (https://arxiv.org/abs/1804.08617).
The full name of the method is distributed distributional deep deterministic policy gradients, or D4PG for short. The authors proposed several modifications to DDPG to improve its stability, convergence, and sample efficiency.
First of all, they adapted the distributional representation of the Q-value proposed by Marc G. Bellemare and others in the paper A Distributional Perspective on Reinforcement Learning, published in 2017 (https://arxiv.org/abs/1707.06887). We discussed this approach in Chapter 8, DQN Extensions, when we talked about DQN improvements, so refer to that chapter or to the original Bellemare paper for details. The core idea is to replace the single Q-value returned by the critic with a probability distribution over possible return values.
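To make this concrete, here is a minimal sketch of such a distributional critic in PyTorch. The class name, the layer sizes, and the constants N_ATOMS, V_MIN, and V_MAX are illustrative assumptions (the 51-atom support on a fixed value range follows the C51 setup from the Bellemare paper), not the exact architecture from the D4PG paper:

```python
import torch
import torch.nn as nn

# Assumed constants following the C51 setup: 51 atoms on a fixed return range
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0

class DistributionalCritic(nn.Module):
    """Illustrative critic that outputs a categorical distribution over
    returns instead of a single scalar Q(s, a)."""
    def __init__(self, obs_size: int, act_size: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size + act_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_ATOMS),   # one logit per atom
        )
        # Fixed support: evenly spaced return values in [V_MIN, V_MAX]
        self.register_buffer("supports",
                             torch.linspace(V_MIN, V_MAX, N_ATOMS))

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Logits of the value distribution for the given state-action pair
        return self.net(torch.cat([obs, act], dim=1))

    def q_value(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Expected Q-value: probability-weighted sum over the atom supports
        probs = torch.softmax(self.forward(obs, act), dim=1)
        return (probs * self.supports).sum(dim=1, keepdim=True)
```

During training, the predicted distribution is matched against a projected Bellman target with a cross-entropy loss instead of regressing a scalar with MSE, while the actor can still be updated using the expected Q-value recovered by q_value() as the probability-weighted sum over the atoms.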