Chapter 8: Distributed Training for Accelerated Development of Deep RL Agents
Training Deep Reinforcement Learning (Deep RL) agents to solve a task can take enormous amounts of wall-clock time due to their high sample complexity. For real-world applications, iterating over agent training and testing cycles quickly plays a crucial role in the market readiness of a Deep RL application. The recipes in this chapter show how to speed up Deep RL agent development through distributed training of deep neural network models, leveraging TensorFlow 2.x's capabilities. Strategies for utilizing multiple CPUs and GPUs, both on a single machine and across a cluster of machines, are discussed. Several recipes for training distributed Deep RL agents using the Ray, Tune, and RLlib frameworks are also provided.
Specifically, the following recipes are included in this chapter:
- Building distributed deep learning models using TensorFlow 2.x – Multi-GPU training ...
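At the heart of the multi-GPU recipe is TensorFlow 2.x's `tf.distribute.MirroredStrategy`, which replicates the model on each available GPU and keeps the copies synchronized by averaging gradients. Below is a minimal sketch of the pattern; the network shape and the synthetic training data are illustrative assumptions, not taken from the recipes:

```python
import tensorflow as tf

# MirroredStrategy replicates model variables across all visible GPUs
# on this machine (it falls back to a single CPU/GPU replica when only
# one device is present) and averages gradients across replicas.
strategy = tf.distribute.MirroredStrategy()
print(f"Replicas in sync: {strategy.num_replicas_in_sync}")

# The model and optimizer must be created inside the strategy scope
# so that each replica receives a mirrored copy of the variables.
with strategy.scope():
    inputs = tf.keras.Input(shape=(4,))           # e.g., a 4-dim observation
    hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(2)(hidden)    # e.g., 2 action values
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")

# Synthetic data stands in for batches of agent experience here;
# model.fit automatically shards each batch across the replicas.
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 2))
model.fit(x, y, epochs=1, batch_size=8, verbose=0)
```

The same training code runs unchanged on one GPU or many; scaling out to a cluster of machines swaps in `tf.distribute.MultiWorkerMirroredStrategy`, as covered later in the chapter.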