Simonyan and Zisserman also introduced a data augmentation mechanism that they named scale jittering. At each training iteration, they randomly scale the batched images, resizing their smaller side to a value between 256 and 512 pixels, before cropping them to the proper input size (224 × 224 for their ILSVRC submission). Thanks to this random transformation, the network is confronted with samples at different scales and learns to classify them correctly despite this scale jittering (refer to Figure 4.2). The network becomes more robust as a result, as it is trained on images covering a larger range of realistic transformations.
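The procedure can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the function name, the nearest-neighbour resize, and the random-number handling are all choices made for this example (a real pipeline would typically use bilinear interpolation, for instance via `tf.image.resize` or PIL).

```python
import numpy as np

def scale_jitter_crop(image, min_size=256, max_size=512, crop=224, rng=None):
    """Sketch of VGG-style scale jittering: randomly rescale the image so its
    smaller side lies in [min_size, max_size], then take a random crop."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    # Draw the target size for the smaller side uniformly at random.
    s = int(rng.integers(min_size, max_size + 1))
    scale = s / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index mapping (bilinear would be smoother).
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = image[rows[:, None], cols[None, :]]
    # Random crop x crop patch from the rescaled image.
    top = int(rng.integers(0, new_h - crop + 1))
    left = int(rng.integers(0, new_w - crop + 1))
    return resized[top:top + crop, left:left + crop]
```

Because the smaller side is always rescaled to at least 256 pixels, a 224 × 224 crop always fits, and the crop captures a different portion of the scene at a different apparent scale on every call.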
Data augmentation is the procedure of synthetically increasing the size of training datasets by applying random transformations to their images in order to create different versions. Details and concrete examples are provided in Chapter 7, Training on Complex and Scarce Datasets.
The authors also suggested applying random...