Wrapping up the whole model parallelism pipeline
In this section, we will discuss the components needed to implement model parallelism, starting with a model parallel training pipeline and then moving on to a model parallel serving pipeline.
A model parallel training overview
Let's look at a simple example of model parallel training, as shown in Figure 7.1:
As shown in Figure 7.1, we have a three-layer DNN model, and we place each layer on its own GPU. For example, we put Layer 1 on GPU1 and Layer 2 on GPU2.
Forward propagation in model parallel training is shown on the left side of Figure 7.1. It works as follows:
- After GPU1 consumes the input training batch, it will calculate the activation values of Layer 1.
- After GPU2 receives the output from GPU1, GPU2 starts its own forward propagation, which calculates the activation values of Layer 2. GPU2 then passes its output to GPU3, which repeats the process to produce the model's final output.
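To make this concrete, here is a minimal sketch of such a forward pass in PyTorch. It assumes a machine with three CUDA devices; the layer sizes, device names, and the `ModelParallelDNN` class are illustrative assumptions rather than details from the text:

```python
import torch
import torch.nn as nn

class ModelParallelDNN(nn.Module):
    """A three-layer DNN with each layer placed on its own GPU."""
    def __init__(self):
        super().__init__()
        # Hypothetical layer sizes; each layer lives on a different device.
        self.layer1 = nn.Linear(784, 512).to("cuda:0")
        self.layer2 = nn.Linear(512, 256).to("cuda:1")
        self.layer3 = nn.Linear(256, 10).to("cuda:2")

    def forward(self, x):
        # GPU1 consumes the input batch and computes Layer 1's activations.
        x = torch.relu(self.layer1(x.to("cuda:0")))
        # The activations are copied to GPU2, which computes Layer 2.
        x = torch.relu(self.layer2(x.to("cuda:1")))
        # GPU3 computes Layer 3 to produce the model's final output.
        return self.layer3(x.to("cuda:2"))

model = ModelParallelDNN()
batch = torch.randn(32, 784)  # input batch starts on the CPU
output = model(batch)         # the forward pass hops across the three GPUs
```

Note that the `.to(...)` calls inside `forward` are what move the activation values between devices; at any given moment only one GPU is busy, which is the idle-time problem that pipeline parallelism (discussed later) is designed to address.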