Images are difficult to classify accurately once their content has been translated from its original location. The label of an image, however, stays the same when the image is translated, rotated, or scaled. Data augmentation exploits this: it creates additional images from the given set by rotating, translating, or scaling them, and maps each new image to the label of the original image it was derived from.
The intuition is as follows: an image of a person still corresponds to that person, even if the image is rotated slightly or the person is moved from the middle of the image to the far right.
Hence, we can create more training data by rotating and translating the original images, because we already know the label that corresponds to each image.
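
The idea can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a definitive implementation: the helper names `translate`, `rotate`, and `augment`, the shift and angle values, and the assumption of 2-D grayscale images are all hypothetical choices made for the example. Pixels that move in from outside the frame are filled with zeros, and the rotation uses simple nearest-neighbor sampling.

```python
import numpy as np

def translate(image, dx, dy):
    """Shift a 2-D image by dx columns and dy rows, filling gaps with 0."""
    h, w = image.shape
    out = np.zeros_like(image)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted entirely out of the frame
    # Copy the region that overlaps between the source and the shifted frame.
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def rotate(image, angle_deg):
    """Rotate a 2-D image about its center (nearest neighbor, zero fill)."""
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    x0, y0 = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def augment(images, labels, shifts=((2, 0), (-2, 0)), angles=(-10, 10)):
    """Return the originals plus shifted and rotated copies, each copy
    carrying the label of the image it was derived from."""
    aug_images, aug_labels = [], []
    for image, label in zip(images, labels):
        variants = [image]
        variants += [translate(image, dx, dy) for dx, dy in shifts]
        variants += [rotate(image, angle) for angle in angles]
        aug_images.extend(variants)
        aug_labels.extend([label] * len(variants))
    return np.stack(aug_images), np.array(aug_labels)

# Usage: three random 28x28 "images" with made-up labels 0, 1, 2.
rng = np.random.default_rng(0)
images = rng.random((3, 28, 28))
labels = np.array([0, 1, 2])
aug_x, aug_y = augment(images, labels)
print(aug_x.shape, aug_y.shape)  # (15, 28, 28) (15,)
```

Each original image yields five training examples here (itself, two shifts, and two rotations), and all five share the original label, which is exactly why no additional labeling effort is needed.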