Asynchronous methods
We have seen a lot of interesting methods in this chapter, but they all share one constraint: they are very slow to train. This isn't such a problem for basic control problems, such as the cart-pole task, but for learning Atari games, or the even more complex, human-level tasks we might want to learn in the future, days to weeks of training time are far too long.
A big part of the time constraint, for both policy gradients and actor-critic, is that when learning online we can only ever evaluate one policy at a time. While we can get speed improvements by using more powerful GPUs and faster processors, the speed at which the policy can be evaluated online will always act as a hard limit on performance.
This is the problem that asynchronous methods aim to solve. The idea is to train multiple copies of the same neural network across multiple threads. Each copy trains online against a separate instance of the environment running on...
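To make this idea concrete, here is a minimal, purely illustrative sketch (not the book's implementation): several worker threads each own a separate environment instance and push REINFORCE-style updates into one shared weight vector. ToyEnv, worker, and all parameters are hypothetical stand-ins, and the lock-free shared update is the simplest possible version of asynchronous training; a real implementation would use a deep learning framework and, because of Python's GIL, typically processes rather than threads.

```python
import threading

import numpy as np

# Shared policy weights, updated asynchronously by every worker thread
# (lock-free, Hogwild!-style). Purely illustrative.
global_weights = np.zeros(4)


class ToyEnv:
    """A stand-in environment (not from the original text): the agent is
    rewarded for picking action 1 whenever the observation sums to a
    positive number."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.state = self.rng.normal(size=4)

    def reset(self):
        self.state = self.rng.normal(size=4)
        return self.state

    def step(self, action):
        reward = 1.0 if bool(action) == bool(self.state.sum() > 0) else 0.0
        self.state = self.rng.normal(size=4)
        return self.state, reward


def worker(worker_id, steps=5000, lr=0.01):
    """One asynchronous learner: its own environment instance and its own
    rollouts, with updates applied directly to the shared weights."""
    global global_weights
    env = ToyEnv(seed=worker_id)               # separate environment per thread
    rng = np.random.default_rng(worker_id + 100)
    state = env.reset()
    for _ in range(steps):
        local = global_weights.copy()          # snapshot the shared policy
        prob = 1.0 / (1.0 + np.exp(-local @ state))   # P(action = 1)
        action = rng.random() < prob
        prev_state = state
        state, reward = env.step(action)
        # REINFORCE-style gradient of log pi(action | state), scaled by reward
        grad = (float(action) - prob) * prev_state * reward
        global_weights += lr * grad            # asynchronous in-place update


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("shared weights after asynchronous training:", global_weights)
```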