Implementing the Deep Deterministic Policy Gradient algorithm and DDPG agent
Deterministic Policy Gradient (DPG) is a type of Actor-Critic RL algorithm that uses two neural networks: one for estimating the action value function, and the other for estimating the optimal target policy. The Deep Deterministic Policy Gradient (DDPG) agent builds upon the idea of DPG and is quite efficient compared to vanilla Actor-Critic agents due to the use of deterministic action policies. By completing this recipe, you will have access to a powerful agent that can be trained efficiently in a variety of RL environments.
Getting ready
To complete this recipe, you will first need to activate the tf2rl-cookbook
Conda Python virtual environment and pip install -r requirements.txt
. If the following import statements run without issues, you are ready to get started!
import argparse import os import random from collections import deque from datetime import datetime import gym import numpy as np import...