Parallelizing TensorFlow
Training a model can be very time-consuming. Fortunately, TensorFlow offers several distribution strategies to speed up training, whether we have a very large model or a very large dataset. This recipe will show us how to use the TensorFlow distributed API.
Getting ready
The TensorFlow distributed API allows us to distribute training by replicating the model onto different nodes and training on different subsets of data. Each strategy targets a hardware platform (multiple GPUs, multiple machines, or TPUs) and uses either synchronous or asynchronous training. In synchronous training, each worker trains over a different batch of data and the gradients are aggregated at each step, while in asynchronous training, each worker trains independently over the data and the variables are updated asynchronously. Note that, for the moment, TensorFlow only supports the data parallelism described above and, according to the roadmap, it will soon...
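As a quick illustration of the synchronous case, here is a minimal sketch using tf.distribute.MirroredStrategy, which replicates the model across the GPUs of a single machine and aggregates the gradients at each step. The tiny Keras model and the random x_train/y_train arrays are placeholders chosen only for this example:

import numpy as np
import tensorflow as tf

# Synchronous data parallelism across the GPUs available on this machine.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas in sync:", strategy.num_replicas_in_sync)

# The model and its variables must be created inside the strategy's scope
# so they are mirrored on every replica.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder data for illustration; each global batch is split across replicas.
x_train = np.random.random((256, 10)).astype("float32")
y_train = np.random.random((256, 1)).astype("float32")
model.fit(x_train, y_train, epochs=2, batch_size=32)

Other strategies, such as MultiWorkerMirroredStrategy or TPUStrategy, follow the same pattern: only the strategy object changes, while the model-building code inside the scope stays the same.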