Implementing the Asynchronous Advantage Actor-Critic algorithm and A3C agent
The A3C algorithm builds on the Actor-Critic family of algorithms by using deep neural networks to approximate both the actor and the critic: the actor learns the policy function, while the critic estimates the value function. The asynchronous design runs multiple workers in parallel, each exploring a different part of the state space, which speeds up sample collection and convergence. Unlike DQN agents, which rely on an experience replay memory, the A3C agent uses these parallel workers to gather diverse samples for learning. By the end of this recipe, you will have a complete script to train an A3C agent for any continuous-action environment of your choice!
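Before diving into the full implementation, it may help to see the core computation each A3C worker performs after a rollout: form n-step returns by bootstrapping from the critic's value estimate, then subtract the critic's values to get advantages that weight the actor's policy gradient. The sketch below is a minimal NumPy illustration of that idea (the reward and value numbers are made up for demonstration), not the recipe's actual TensorFlow code:

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns, bootstrapping from the critic's
    value estimate of the state that follows the rollout."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical 3-step rollout collected by one worker:
rewards = [1.0, 0.0, 1.0]
values = np.array([0.5, 0.4, 0.6])  # critic's V(s_t) estimates (made up)
returns = n_step_returns(rewards, bootstrap_value=0.0)

# Advantage = n-step return - critic's value estimate. The actor's policy
# gradient is scaled by this advantage; the critic regresses toward returns.
advantages = returns - values
print(returns)      # [1.9801 0.99   1.    ]
print(advantages)   # [1.4801 0.59   0.4   ]
```

In the full agent, each worker computes these quantities from its own rollout and applies the resulting gradients asynchronously to a shared global network.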
Getting ready
To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started.