Twin Delayed DDPG
The algorithm for Twin Delayed DDPG (TD3) is given as follows:
- Initialize the two main critic network parameters, $\theta_1$ and $\theta_2$, and the main actor network parameter $\phi$
- Initialize the two target critic network parameters, $\theta_1'$ and $\theta_2'$, by copying the main critic network parameters $\theta_1$ and $\theta_2$, respectively
- Initialize the target actor network parameter $\phi'$ by copying the main actor network parameter $\phi$
- Initialize the replay buffer
- For each of the $N$ episodes, repeat step 6
- For each step in the episode, that is, for $t = 0, \dots, T-1$:
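- Select action $a$ based on the policy $\mu_\phi(s)$ and with exploration noise $\epsilon$, that is, $a = \mu_\phi(s) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma)$
- Perform the selected action $a$, move to the next state $s'$, get the reward $r$, and store the transition information $(s, a, r, s')$ in the replay buffer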
- Randomly sample a minibatch of K transitions from the replay buffer
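- Select the action $\tilde{a}$ for computing the target value from the target actor network with clipped noise, that is, $\tilde{a} = \mu_{\phi'}(s') + \epsilon$, where $\epsilon \sim \text{clip}(\mathcal{N}(0, \sigma), -c, +c)$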
- Compute the target value of the...
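
The steps up to the selection of the target action can be sketched in NumPy as follows. This is only an illustrative sketch, not a full TD3 implementation: the single linear layers stand in for real networks, and the dimensions, the values of $\sigma$ and $c$, the action range, and the made-up environment dynamics are all assumptions introduced here. Clipping the final action to the action range is likewise an added assumption. The sketch stops before the target value is computed.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 3, 1   # hypothetical sizes
SIGMA, NOISE_CLIP = 0.2, 0.5   # assumed values for sigma and c
ACTION_BOUND = 1.0             # assumed action range [-1, 1]

def init_linear(in_dim, out_dim):
    """Hypothetical stand-in for a network: a single linear layer."""
    return {"W": rng.normal(0.0, 0.1, (in_dim, out_dim)), "b": np.zeros(out_dim)}

# Main networks: two critics (theta_1, theta_2) and one actor (phi).
theta_1 = init_linear(STATE_DIM + ACTION_DIM, 1)
theta_2 = init_linear(STATE_DIM + ACTION_DIM, 1)
phi = init_linear(STATE_DIM, ACTION_DIM)

# Target networks start as exact copies of the main networks.
theta_1_target = copy.deepcopy(theta_1)
theta_2_target = copy.deepcopy(theta_2)
phi_target = copy.deepcopy(phi)

replay_buffer = []  # list of (s, a, r, s_next) tuples

def actor(params, s):
    """Deterministic policy mu(s), squashed into the action range."""
    return ACTION_BOUND * np.tanh(s @ params["W"] + params["b"])

def select_action(s):
    """Exploration: a = mu_phi(s) + eps, with eps ~ N(0, sigma)."""
    eps = rng.normal(0.0, SIGMA, ACTION_DIM)
    return np.clip(actor(phi, s) + eps, -ACTION_BOUND, ACTION_BOUND)

def select_target_action(s_next):
    """Target action: mu_phi'(s') + eps, with eps ~ clip(N(0, sigma), -c, +c)."""
    noise_shape = s_next.shape[:-1] + (ACTION_DIM,)
    eps = np.clip(rng.normal(0.0, SIGMA, noise_shape), -NOISE_CLIP, NOISE_CLIP)
    return np.clip(actor(phi_target, s_next) + eps, -ACTION_BOUND, ACTION_BOUND)

def sample_minibatch(k):
    """Uniformly sample K transitions (with replacement) from the buffer."""
    idx = rng.integers(0, len(replay_buffer), size=k)
    s, a, r, s_next = zip(*(replay_buffer[i] for i in idx))
    return np.array(s), np.array(a), np.array(r), np.array(s_next)

# Fill the buffer using made-up dynamics and rewards (not a real environment).
s = rng.normal(size=STATE_DIM)
for _ in range(32):
    a = select_action(s)
    s_next = rng.normal(size=STATE_DIM)
    r = float(-np.sum(a ** 2))
    replay_buffer.append((s, a, r, s_next))
    s = s_next

states, actions, rewards, next_states = sample_minibatch(8)
target_actions = select_target_action(next_states)
```

Note that the exploration noise in `select_action` is unclipped, while the noise in `select_target_action` is clipped to $[-c, +c]$ before it is added to the target actor's output, which keeps the perturbed target action close to the original one.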