We will use the Keras ImageDataGenerator to generate additional data by applying affine transformations to the image pixel coordinates. The transformations that we will primarily use are rotation, translation, and scaling. If the spatial coordinate of a pixel is defined by $x = [x_1 \; x_2]^T \in \mathbb{R}^2$, then the new coordinate of the pixel is given by the following:
$$x' = Mx + b$$
Here, $M \in \mathbb{R}^{2 \times 2}$ is the matrix of the affine transformation (its linear part, which encodes operations such as rotation and scaling), and $b = [b_1 \; b_2]^T \in \mathbb{R}^2$ is a translation vector.
The term $b_1$ specifies the translation along one of the spatial directions, while $b_2$ provides the translation along the other spatial direction.
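To make the mapping concrete, here is a minimal sketch in plain Python that applies an affine transformation of the form "new coordinate = M times x, plus b" to a single pixel coordinate. The helper names `affine_matrix` and `transform_point` are hypothetical and chosen only for illustration, and the choice of M as a rotation combined with a uniform scale is an assumption; this is not how ImageDataGenerator is implemented internally.

```python
import math

def affine_matrix(angle_deg, scale):
    """Build a 2x2 matrix M for a counterclockwise rotation by angle_deg
    combined with a uniform scaling factor (illustrative assumption)."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s],
            [scale * s,  scale * c]]

def transform_point(x, M, b):
    """Apply M x + b to a 2D point x = (x1, x2) and return the new point."""
    x1, x2 = x
    return (M[0][0] * x1 + M[0][1] * x2 + b[0],
            M[1][0] * x1 + M[1][1] * x2 + b[1])

# Rotate by 90 degrees, scale by 2, then translate by b = (3, -1).
M = affine_matrix(angle_deg=90, scale=2.0)
b = (3.0, -1.0)
new_point = transform_point((1.0, 0.0), M, b)
print(new_point)  # approximately (3.0, 1.0), up to floating-point rounding
```

In this example, the point (1, 0) is rotated to (0, 1), scaled to (0, 2), and then shifted by b to land at roughly (3, 1). In Keras, the ImageDataGenerator arguments `rotation_range`, `width_shift_range`, `height_shift_range`, and `zoom_range` control random transformations of these same three kinds.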
These transformations are needed because neural networks are not, in general, invariant to translation, rotation, or scale. Pooling operations do provide some translational invariance, but it is generally...