Why do we need an efficient data pipeline?
We’ll start this chapter by explaining why an efficient data pipeline matters. In the next few subsections, you will learn what a data pipeline is and how it can affect the performance of the training process.
What is a data pipeline?
As you learned in Chapter 1, Deconstructing the Training Process, the training process is composed of four phases: forward, loss calculation, optimization, and backward. The training algorithm iterates over the dataset samples until it completes an epoch. However, there is an additional phase we excluded from that explanation: data loading.
The forward phase invokes data loading to obtain the dataset samples it needs. More specifically, on each iteration, the forward phase calls the data loading process to get the data required to execute the current training step, as shown in Figure 5.1:
Figure 5.1 – Data loading...
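To make this interaction concrete, the following is a minimal, framework-agnostic sketch in plain Python of a training loop that requests data on every iteration. The names `batches`, `train_step`, and `train_one_epoch` are hypothetical and used only for illustration; `train_step` is a placeholder standing in for the four training phases, not a real model update.

```python
def batches(dataset, batch_size):
    """Yield consecutive batches; stands in for the data loading process."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def train_step(batch):
    """Placeholder (assumption) for forward, loss calculation,
    optimization, and backward. It only averages the batch."""
    return sum(batch) / len(batch)


def train_one_epoch(dataset, batch_size):
    """Run one epoch and record the order in which the phases run."""
    events = []
    # Each pass of the for statement pulls the next batch from the
    # generator, so data loading runs before every training step.
    for step, batch in enumerate(batches(dataset, batch_size)):
        events.append(("load", step, len(batch)))
        train_step(batch)
        events.append(("train", step, len(batch)))
    return events


# Ten samples with a batch size of four give three iterations,
# the last one holding the two remaining samples.
events = train_one_epoch(list(range(10)), batch_size=4)
for event in events:
    print(event)
```

Because loading and training alternate strictly in this sketch, any delay in fetching a batch delays the training step that depends on it.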