Layer split
In this section, we will discuss another approach to improving model parallel training efficiency, called intra-layer model parallelism. Generally speaking, the data structure holding each layer's neurons can be represented as a matrix. One of the most common operations during NLP model training and serving is matrix multiplication. Therefore, we can split a layer's matrix so that the resulting partitions can be multiplied in parallel.
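As a minimal sketch of this idea (a NumPy illustration with made-up sizes and names, not code taken from the figure), we can split a layer's weight matrix column-wise so that each partition's matrix multiplication could run on a different device, and concatenating the partial outputs reproduces the unsplit result:

import numpy as np

# Illustrative sizes: a batch of 4 inputs, a layer with 8 inputs and 6 neurons.
batch_size, in_features, out_features = 4, 8, 6

x = np.random.randn(batch_size, in_features)    # layer input
W = np.random.randn(in_features, out_features)  # full weight matrix

# Full (unsplit) forward pass of the layer: one matrix multiplication.
full_output = x @ W

# Intra-layer split: divide the weight matrix column-wise into two halves.
# Each half covers a subset of the layer's neurons and could be placed on a
# different GPU so the two multiplications run in parallel.
W_part1, W_part2 = np.split(W, 2, axis=1)

partial1 = x @ W_part1   # conceptually computed on device 0
partial2 = x @ W_part2   # conceptually computed on device 1

# Concatenating the partial outputs gives the same result as the unsplit layer.
split_output = np.concatenate([partial1, partial2], axis=1)
assert np.allclose(full_output, split_output)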
Let's walk through a simple example, focusing on Layer 1 of a model. It takes the training data as input and, after forward propagation, generates outputs that are passed to the following layers. We can draw Layer 1 as shown in Figure 6.11:
Figure 6.11 illustrates the data structure that represents Layer 1 of an NLP model. Here, each column represents a neuron, and each weight within a column is a neuron weight. Basically, in this...