Deep Deterministic Policy Gradient
The algorithm for Deep Deterministic Policy Gradient (DDPG) is given as follows:
- Initialize the main critic network parameter $\theta$ and the main actor network parameter $\phi$
- Initialize the target critic network parameter $\theta'$ by just copying the main critic network parameter $\theta$
- Initialize the target actor network parameter $\phi'$ by just copying the main actor network parameter $\phi$
- Initialize the replay buffer $\mathcal{D}$
- For $N$ episodes, repeat the following steps:
- Initialize an Ornstein-Uhlenbeck random process $\mathcal{N}$ for action space exploration (a minimal implementation sketch appears after this list)
- For each step in the episode, that is, for $t = 0, \ldots, T-1$:
- Select action $a_t$ based on the policy and exploration noise, that is, $a_t = \mu_\phi(s_t) + \mathcal{N}_t$
- Perform the selected action $a_t$, move to the next state $s_{t+1}$ and get the reward $r_t$, and store this transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer $\mathcal{D}$
- Randomly sample a minibatch of $K$ transitions from the replay buffer $\mathcal{D}$
- Compute the target value of the critic, that is, $y_i = r_i + \gamma Q_{\theta'}(s'_i, \mu_{\phi'}(s'_i))$ (see the code sketch after this list)
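Since the algorithm listing does not spell out the Ornstein-Uhlenbeck process, here is a minimal NumPy sketch of its standard Euler discretization; the hyperparameter values (`theta=0.15`, `sigma=0.2`, `dt=1e-2`) are common defaults assumed for illustration, not values from the text:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise for continuous action space exploration."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu = mu * np.ones(action_dim)  # long-run mean the process reverts to
        self.theta = theta                  # mean-reversion rate
        self.sigma = sigma                  # noise scale
        self.dt = dt                        # discretization step
        self.reset()

    def reset(self):
        # Called at the start of each episode, matching the per-episode
        # initialization in the algorithm above
        self.state = np.copy(self.mu)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt)
              * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state
```

Because consecutive samples are correlated, the noise drifts smoothly rather than jittering independently at each step, which suits exploration in continuous control tasks.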
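The noisy action selection and critic-target computation from the loop can be sketched as follows. This assumes PyTorch-style `actor`, `target_actor`, and `target_critic` modules, a `done` flag that masks the bootstrap term on terminal transitions, and actions bounded in `[-1, 1]`; these names and bounds are illustrative assumptions rather than part of the original text:

```python
import numpy as np
import torch

def select_action(actor, state, noise, low=-1.0, high=1.0):
    # a_t = mu_phi(s_t) + N_t, clipped to the valid action range
    with torch.no_grad():
        action = actor(state).cpu().numpy()
    return np.clip(action + noise.sample(), low, high)

def compute_critic_target(r, s_next, done, target_actor, target_critic,
                          gamma=0.99):
    # y_i = r_i + gamma * Q_theta'(s'_i, mu_phi'(s'_i)); the (1 - done)
    # factor zeroes the bootstrap term when s'_i is terminal
    with torch.no_grad():
        a_next = target_actor(s_next)
        return r + gamma * (1.0 - done) * target_critic(s_next, a_next)
```

The main critic is then trained to regress onto these targets by minimizing the mean squared error between $Q_\theta(s_i, a_i)$ and $y_i$.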