Distributed deep Q-learning
Deep learning models are notorious for their hunger for data. When it comes to reinforcement learning, the appetite for data is even greater, which makes parallelizing data collection essential while training RL models. The original DQN model is a single-threaded process. Despite its great success, it has limited scalability. In this section, we present methods to parallelize deep Q-learning across many (possibly thousands of) processes.
The key insight behind distributed Q-learning is its off-policy nature, which effectively decouples training from experience generation. In other words, the specific processes/policies that generate the experience do not matter to the training process (although there are caveats to this statement). Combined with the idea of using a replay buffer, this allows us to parallelize experience generation and store the data in central or distributed replay buffers, as sketched below. In addition, we can parallelize how the data is sampled from these...
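To make the actor/learner decoupling concrete, here is a minimal sketch in which several actor processes generate transitions with their own behavior policies and push them to a central replay buffer, while a single learner samples minibatches from that buffer. The environment (`ToyEnv`), the random behavior policy, and the `actor`/`learner` helpers are hypothetical stand-ins for illustration, not the implementation of any particular distributed DQN system.

```python
# Sketch: actors feed a central replay buffer; the learner samples from it.
import random
import multiprocessing as mp
from collections import deque

class ToyEnv:
    """Hypothetical stand-in for a real environment (e.g., an Atari game)."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        next_state = random.random()
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10
        return next_state, reward, done

def actor(actor_id, queue, num_episodes=5):
    """Each actor runs its own copy of the environment and pushes transitions to
    the central queue; the learner does not care which actor produced them."""
    env = ToyEnv()
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = random.choice([0, 1])  # behavior policy (here: random)
            next_state, reward, done = env.step(action)
            queue.put((state, action, reward, next_state, done))
            state = next_state

def learner(queue, total_transitions, batch_size=32):
    """The learner drains the queue into a replay buffer and samples minibatches.
    Because Q-learning is off-policy, training on these transitions is valid even
    though they were generated by other processes/policies."""
    replay_buffer = deque(maxlen=10_000)
    seen = 0
    while seen < total_transitions:
        replay_buffer.append(queue.get())
        seen += 1
        if len(replay_buffer) >= batch_size:
            batch = random.sample(replay_buffer, batch_size)
            # ... compute TD targets and update the Q-network here ...

if __name__ == "__main__":
    queue = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, queue)) for i in range(4)]
    for p in actors:
        p.start()
    # 4 actors x 5 episodes x 10 steps = 200 transitions in total
    learner(queue, total_transitions=4 * 5 * 10)
    for p in actors:
        p.join()
```

In a real system, the queue and replay buffer would typically live in one or more dedicated replay processes (or a distributed store), and the actors would periodically pull updated Q-network parameters from the learner; the sketch only illustrates the decoupling that off-policy learning makes possible.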