Data is the lifeblood of deep learning applications. Training data should flow unobstructed into networks, and it should contain all the meaningful information that the methods need to prepare for their tasks. Often, however, datasets have complex structures or are stored on heterogeneous devices, complicating the process of efficiently feeding their content to models. In other cases, relevant training images or annotations may simply be unavailable, depriving models of the information they need to learn.
Thankfully, for the former cases, TensorFlow provides a rich framework for setting up optimized data pipelines: tf.data. For the latter cases, researchers have proposed multiple alternatives when relevant training data is scarce: data augmentation, the generation of synthetic datasets, domain...
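As a brief preview of the first of these tools, the following is a minimal sketch of a tf.data input pipeline. The toy NumPy arrays standing in for images and labels are hypothetical placeholders; a real pipeline would read from files or TFRecords:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for real images and labels:
images = np.random.rand(100, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=100)

# Build a pipeline: shuffle, preprocess in parallel, batch, and
# prefetch so the next batch is prepared while the model trains.
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=100)
           .map(lambda x, y: (x * 2.0 - 1.0, y),   # rescale to [-1, 1]
                num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))

# Iterate over the first batch to check its shape:
for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape)
```

Each method returns a new dataset, so the transformations chain naturally; tf.data then executes them lazily and in parallel as batches are requested.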