Neural networks (NNs) are data-hungry models. The larger the datasets they can iterate over during training, the more accurate and robust they become. As we have already observed in our experiments, training a network is therefore a heavy task that can take hours, if not days.
As GPU/TPU hardware architectures become increasingly powerful, the time needed for the forward and backward passes of each training iteration keeps decreasing (at least for those who can afford such devices). Hardware is now fast enough that NNs tend to consume training batches faster than typical input pipelines can produce them. This is especially true in computer vision: image datasets are often too large to be preprocessed entirely in advance, and reading and decoding image files on the fly can cause significant delays (especially when repeated millions of times over a training run).
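The usual mitigation is to parallelize file decoding and overlap data preparation with training so that the accelerator is never left waiting. The following is a minimal sketch using TensorFlow's tf.data API (an assumption here, as are the data/train/*.jpg path and the 224×224 target size): the map call decodes images on several threads, while prefetch keeps batches ready ahead of the training loop.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x and its tf.data API

def parse_image(filepath):
    """Read and decode one image file, then resize it to a fixed shape."""
    raw = tf.io.read_file(filepath)
    image = tf.io.decode_jpeg(raw, channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0  # normalize to [0, 1]
    return image

# Hypothetical dataset of JPEG paths standing in for a real image collection:
image_files = tf.data.Dataset.list_files("data/train/*.jpg")

dataset = (image_files
           .map(parse_image, num_parallel_calls=tf.data.AUTOTUNE)  # decode on several threads
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))  # prepare the next batches while the GPU trains
```

With prefetching, the CPU assembles the next batches while the GPU/TPU processes the current one, which hides much of the file I/O latency described above.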