Being able to efficiently extract and transform data for training complex applications is essential, but this assumes that enough data is available for such tasks in the first place. After all, NNs are data-hungry methods, and even though we are in the big data era, large enough datasets are still tedious to gather and even more difficult to annotate. It can take several minutes to annotate a single image (for instance, to create the ground truth label map for semantic segmentation models), and some annotations may have to be validated or corrected by experts (for instance, when labeling medical images). In some cases, the images themselves may not be easily available. For instance, it would be too time-consuming and expensive to take pictures of every manufactured object and its components when building automation models for industrial plants.
Data scarcity is, therefore, a common problem in computer vision, and much effort has been expended trying...