The data parallel training pipeline in a nutshell
In this section, we focus on the All-Reduce-based data parallel architecture and walk through the complete data parallel training pipeline. The overall training workflow is shown in the following diagram:
As we can see, the training pipeline of each worker consists of six steps:
- Input Pre-Processing: Given the raw training input data, we first need to pre-process it. Common input pre-processing techniques include image cropping, image flipping, input data normalization, and so on.
- Input Data Partition: Split the whole input dataset into multiple chunks and assign each chunk to one accelerator for training.
- Data Loading: Load each data partition onto the accelerator it has been assigned to.
- Training: Each accelerator trains its local model replica on its own data partition.
- Model Synchronization: After...
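The steps above can be sketched as a single synchronous training step. The following is a minimal, framework-free illustration, assuming a toy linear model y = w * x with a squared-error loss; names such as `NUM_WORKERS` and `all_reduce_mean` are illustrative and not part of any real library, and `all_reduce_mean` simply averages values, which is the result a Ring All-Reduce would produce:

```python
# Illustrative sketch of one data parallel training step (not a real API).
NUM_WORKERS = 4

def preprocess(raw):
    """Step 1: normalize inputs to zero mean (a stand-in for crop/flip/normalize)."""
    mean = sum(x for x, _ in raw) / len(raw)
    return [(x - mean, y) for x, y in raw]

def partition(dataset, num_workers):
    """Step 2: split the dataset into one contiguous chunk per accelerator."""
    chunk = len(dataset) // num_workers
    return [dataset[i * chunk:(i + 1) * chunk] for i in range(num_workers)]

def local_gradient(w, shard):
    """Steps 3-4: 'load' the shard and compute the local gradient of the
    mean squared error of the prediction y_hat = w * x."""
    grad = 0.0
    for x, y in shard:
        grad += 2 * (w * x - y) * x
    return grad / len(shard)

def all_reduce_mean(values):
    """Step 5: average gradients across workers (what All-Reduce computes)."""
    return sum(values) / len(values)

# One synchronous training step across all workers.
raw_data = [(float(i), 2.0 * float(i)) for i in range(8)]  # targets follow y = 2x
dataset = preprocess(raw_data)
shards = partition(dataset, NUM_WORKERS)

w = 0.0   # every replica starts from the same weight
lr = 0.05
local_grads = [local_gradient(w, shard) for shard in shards]
global_grad = all_reduce_mean(local_grads)
w -= lr * global_grad  # the identical update on every replica keeps models in sync
```

Because every worker applies the same averaged gradient to the same starting weights, all model replicas remain bit-identical after each step, which is the key invariant that model synchronization maintains.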