Distributed Distributional DDPG
The Distributed Distributional Deep Deterministic Policy Gradient (D4PG) algorithm is given as follows:
- Initialize the critic network parameter $\theta$ and the actor network parameter $\phi$
- Initialize the target critic network parameter $\theta'$ and the target actor network parameter $\phi'$ by copying from $\theta$ and $\phi$, respectively
- Initialize the replay buffer $\mathcal{D}$
- Launch $L$ actors
- For $N$ episodes, repeat the following step:
- For each step in the episode, that is, for $t = 0, \ldots, T-1$:
- Randomly sample a minibatch of $K$ transitions $(s_i, a_i, r_i, s'_i)$ from the replay buffer $\mathcal{D}$
- Compute the target value distribution of the critic, that is, $y_i = r_i + \gamma Z_{\theta'}\big(s'_i, \mu_{\phi'}(s'_i)\big)$, where $Z_{\theta'}$ is the target critic's value distribution and $\mu_{\phi'}$ is the target actor (a sketch of this step follows the list)
- Compute the loss of the critic network as the distributional distance $d$ (for example, the cross-entropy) between the target and predicted distributions, and calculate the gradient as $\nabla_\theta J(\theta) = \frac{1}{K} \sum_i \nabla_\theta\, d\big(y_i, Z_\theta(s_i, a_i)\big)$
- After computing the gradient, update the critic network parameter using gradient descent: $\theta = \theta - \alpha \nabla_\theta J(\theta)$
- Compute the gradient of the actor network: $\nabla_\phi J(\phi) = \frac{1}{K} \sum_i \nabla_\phi \mu_\phi(s_i)\, \mathbb{E}\big[\nabla_a Z_\theta(s_i, a)\big]\big|_{a = \mu_\phi(s_i)}$
- Update the actor network parameter by gradient ascent: $\phi = \phi + \beta \nabla_\phi J(\phi)$ (a sketch of the critic and actor updates follows the list)
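To make the target value distribution step concrete, the following is a minimal sketch of the categorical (C51-style) projection that D4PG typically uses when the critic outputs probabilities over a fixed support of atoms. The function name `project_target_distribution` and the tensor shapes are illustrative assumptions, not names from the algorithm above:

```python
# A sketch of the categorical projection for the target distribution
# y_i = r_i + gamma * Z_theta'(s'_i, mu_phi'(s'_i)), assuming the critic
# outputs probabilities over a fixed support of atoms. Names are illustrative.
import torch

def project_target_distribution(rewards, dones, next_probs, atoms, gamma):
    """Project the shifted distribution r + gamma*z back onto the fixed support."""
    batch_size, n_atoms = next_probs.shape
    v_min, v_max = atoms[0].item(), atoms[-1].item()
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Bellman-shifted atoms: Tz_j = r + gamma * z_j (no bootstrap at terminals)
    tz = rewards.unsqueeze(1) + gamma * (1.0 - dones).unsqueeze(1) * atoms.unsqueeze(0)
    tz = tz.clamp(v_min, v_max)

    # Fractional index of each shifted atom on the support
    b = (tz - v_min) / delta_z
    lower = b.floor().long()
    upper = b.ceil().long()

    # Split each atom's probability mass between its two nearest support points
    target = torch.zeros_like(next_probs)
    target.scatter_add_(1, lower, next_probs * (upper.float() - b))
    target.scatter_add_(1, upper, next_probs * (b - lower.float()))
    # If b lands exactly on a support point, both weights above are zero,
    # so restore the full mass at that point
    target.scatter_add_(1, lower, next_probs * (upper == lower).float())
    return target
```

Here, `atoms` would be created once, for instance with `torch.linspace(v_min, v_max, n_atoms)`, and `next_probs` comes from the target critic evaluated at the target actor's action.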
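Similarly, here is a minimal sketch of one critic and actor update on a sampled minibatch, assuming hypothetical `critic(state, action)` and `actor(state)` modules (the critic returning atom probabilities over `atoms`) and using the cross-entropy as the distance $d$; it is an illustration under these assumptions, not a definitive implementation:

```python
# A sketch of one D4PG gradient step, assuming hypothetical torch.nn.Module
# instances critic(s, a) -> atom probabilities and actor(s) -> action, plus
# the project_target_distribution helper sketched above.
import torch

def d4pg_step(critic, target_critic, actor, target_actor,
              critic_opt, actor_opt, batch, atoms, gamma):
    s, a, r, s_next, done = batch

    # Target value distribution: y_i = r_i + gamma * Z_theta'(s'_i, mu_phi'(s'_i))
    with torch.no_grad():
        next_probs = target_critic(s_next, target_actor(s_next))
        y = project_target_distribution(r, done, next_probs, atoms, gamma)

    # Critic loss: cross-entropy between the target and predicted distributions,
    # minimized by gradient descent on theta
    log_probs = torch.log(critic(s, a) + 1e-8)
    critic_loss = -(y * log_probs).sum(dim=1).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor objective: the expected value of the critic's distribution at
    # a = mu_phi(s); minimizing its negation performs gradient ascent on phi
    expected_q = (critic(s, actor(s)) * atoms).sum(dim=1)
    actor_loss = -expected_q.mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

Note that minimizing the negated expected value in the last few lines is equivalent to the gradient-ascent update on $\phi$ given in the final step of the algorithm, since autograd frameworks are set up to minimize losses.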