Data augmentation is probably the simplest and most common method to deal with overly small training sets. It can virtually multiply the number of training images by providing different-looking versions of each one. These versions are obtained by applying a combination of random transformations, such as scale jittering, random flipping, rotation, and color shifting. Incidentally, data augmentation also helps prevent overfitting, which typically happens when a large model is trained on a small set of images.
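As a minimal sketch of such a pipeline, the following function applies a few of these random transformations with TensorFlow's `tf.image` operations (the choice of library, the RGB/static-shape assumption, and the exact parameter values are illustrative, not prescriptive):

```python
import tensorflow as tf

def augment(image):
    """Return a randomly transformed version of one RGB image."""
    height, width = image.shape[0], image.shape[1]

    # Random horizontal flip.
    image = tf.image.random_flip_left_right(image)

    # Scale jittering: randomly upscale, then crop back to the original size.
    scale = tf.random.uniform([], minval=1.0, maxval=1.3)
    new_size = tf.cast(tf.stack([scale * height, scale * width]), tf.int32)
    image = tf.image.resize(image, new_size)
    image = tf.image.random_crop(image, size=[height, width, 3])

    # Color shift: small random changes in brightness and saturation.
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    return image

# Applied on the fly during training, so each epoch sees slightly
# different versions of every image:
# dataset = dataset.map(lambda img, label: (augment(img), label))
```

Because the transformations are sampled anew at every epoch, the model never sees exactly the same image twice, which is what makes this cheap procedure so effective against overfitting.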
But even when enough training images are available, this procedure should still be considered, as data augmentation has other benefits. Even large datasets can suffer from biases, and data augmentation can compensate for some of them. Let's illustrate this with an example: imagine we want to build a classifier to distinguish pictures of brushes from pictures of pens. However, the pictures for each class were gathered by two different teams that did not agree on a precise...