Now, let's explore the concept of distributed deep learning.
Distributed deep learning
Model parallelization versus data parallelization
When we train on large amounts of data, or when the network itself is very large, we usually need to distribute the training across multiple machines or threads so that learning can be performed in parallel. This parallelization may happen within a single machine with several GPUs, or across several machines that are synchronized over a network. The two main strategies for distributing deep learning workloads are data parallelization and model parallelization.
In data parallelization, we run a number of mini-batches in parallel using the same weights (that is, the same model). This implies that each worker computes gradients on its own mini-batch, and those gradients must then be aggregated, typically by averaging, before the shared weights are updated, so that every replica continues to hold an identical copy of the model.
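To make this concrete, here is a minimal sketch of the data-parallel update loop in plain NumPy. The linear model, squared loss, worker count, and learning rate are all hypothetical choices for illustration, and the workers run sequentially here; in a real system each shard would be processed on a separate GPU or machine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model: predict y = X @ w, trained with squared loss.
n_features, n_workers, lr = 4, 2, 0.1
w = np.zeros(n_features)  # shared weights: every worker uses the same model

def gradient(w, X, y):
    """Gradient of the mean squared error for the linear model on one shard."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

for step in range(100):
    # Each worker draws its own mini-batch and computes a local gradient
    # against the same shared weights.
    grads = []
    for _ in range(n_workers):
        X = rng.normal(size=(8, n_features))
        y = X @ np.array([1.0, -2.0, 0.5, 3.0])  # synthetic targets
        grads.append(gradient(w, X, y))

    # Synchronization step: average the per-worker gradients and apply a
    # single update to the shared weights, keeping all replicas identical.
    w -= lr * np.mean(grads, axis=0)

print(w)  # converges toward the true weights [1.0, -2.0, 0.5, 3.0]
```

The key design point is the averaging step: because every worker starts each iteration from the same weights and the averaged gradient is applied once, the parallel run is mathematically equivalent to training on one large combined mini-batch.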