We now have all the background required to implement the deep n-step advantage actor-critic (A2C) agent. Let's look at an overview of the agent implementation process and then jump right into the hands-on implementation.
The following is the high-level flow of our A2C agent:
1. Initialize the actor's and critic's networks.
2. Use the actor's current policy to gather n-step experiences from the environment and calculate the n-step return.
3. Calculate the actor's and critic's losses.
4. Perform a stochastic gradient descent optimization step to update the actor's and critic's parameters.
5. Repeat from step 2.
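Before diving into the full agent class, it may help to see the core calculation from step 2 in isolation. The following is a minimal NumPy sketch of computing discounted n-step returns, bootstrapped from the critic's value estimate of the final state; the function name, the discount factor, and the example numbers are illustrative choices, not part of the agent's actual code:

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns G_t = r_t + gamma * G_{t+1},
    seeding the recursion with the critic's value estimate of the
    state reached after the last step (the bootstrap value)."""
    returns = np.zeros(len(rewards))
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# Illustrative 3-step trajectory with rewards [1, 0, 1] and a critic
# estimate V(s_{t+3}) = 0.5 for the state after the final step:
rets = n_step_returns([1.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9)

# The advantage for each step is the n-step return minus the critic's
# value estimate of that step's state (values here are made up):
values = np.array([2.0, 1.2, 1.5])
advantages = rets - values

# Step 3 then uses these quantities: the actor's loss weights
# -log pi(a_t | s_t) by the (detached) advantage, and the critic's
# loss is the squared error between the returns and value estimates:
critic_loss = np.mean((rets - values) ** 2)
```

The backward recursion is the key detail: each return reuses the already-computed return of the following step, so the whole n-step batch is computed in a single O(n) pass.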
We will implement the agent in a Python class named DeepActorCriticAgent. You will find the full implementation in this book's code repository under the chapter 8 directory: ch8...