Training Deep RL agents at scale – Multi-GPU PPO agent
RL agents typically require a large number of environment samples and gradient steps to train, depending on the complexity of the state space, the action space, and the problem itself. With Deep RL, the computational cost grows drastically because the deep neural network the agent uses (to represent the Q/value function, the policy, or both) has many more operations to execute and parameters to update. To speed up training, we need the capability to scale Deep RL agent training across the available compute resources, such as GPUs. This recipe will help you leverage multiple GPUs to train a PPO agent with a deep convolutional neural network policy, in a distributed fashion, in one of the procedurally generated RL environments provided by OpenAI’s procgen library.
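As a preview of the core idea, the following is a minimal sketch of the multi-GPU setup, assuming TensorFlow 2.x with tf.distribute.MirroredStrategy for data-parallel replication across GPUs; the environment name (procgen:procgen-coinrun-v0) and the small convolutional policy network shown here are illustrative placeholders rather than the exact architecture used later in this recipe:

import gym
import tensorflow as tf

# Mirror model variables across all visible GPUs; gradients are
# aggregated across the replicas automatically during training.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of GPU replicas in sync: {strategy.num_replicas_in_sync}")

# One of the procedurally generated environments from OpenAI's
# procgen library; coinrun is used here purely as an example.
env = gym.make("procgen:procgen-coinrun-v0")

# Build the (illustrative) convolutional policy network inside the
# strategy's scope so its variables are replicated on every GPU.
with strategy.scope():
    policy = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=env.observation_space.shape),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(env.action_space.n),  # action logits
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)

With this pattern, any training step executed through the strategy (for example, via strategy.run) splits each batch of PPO rollout data across the GPUs and synchronizes the gradient updates, which is what lets the agent consume samples faster as more devices are added.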
Let’s get started!
Getting ready
To complete this recipe, you will first need to activate...