Now that you have learned how to process data so that it follows a specific distribution, it is important to know about data augmentation, which is usually associated with missing data or high-dimensional data. Traditional machine learning algorithms may struggle with data in which the number of dimensions exceeds the number of samples available. This problem does not affect every deep learning algorithm, but some have a much harder time learning to model a problem that has more variables to figure out than samples to work with. We have two options to correct that: either we reduce the dimensions or variables (see the following section) or we increase the number of samples in our dataset (this section).
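To make the "more variables than samples" problem concrete, here is a minimal sketch in plain Python. The toy dataset, the hand-picked weight vectors `w_a` and `w_b`, and the helper `predict` are all assumptions made for illustration; they do not come from any particular library or dataset. With two samples and three variables, two very different linear models fit the training data exactly, yet they disagree on a new input.

```python
# Toy dataset (hypothetical): 2 samples, 3 variables,
# so there are more dimensions than samples.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]


def predict(weights, x):
    """Linear model without a bias term: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))


# Two different weight vectors, chosen by hand for illustration.
w_a = [1.0, 2.0, 0.0]
w_b = [0.0, 1.0, 1.0]

# Both models reproduce the training targets exactly.
fits_a = [predict(w_a, x) for x in X]
fits_b = [predict(w_b, x) for x in X]
print("model A on training data:", fits_a)
print("model B on training data:", fits_b)

# On an unseen sample, the two models give different answers.
x_new = [1.0, 1.0, 0.0]
pred_a = predict(w_a, x_new)
pred_b = predict(w_b, x_new)
print("model A on new sample:", pred_a)
print("model B on new sample:", pred_b)
```

Because the training data cannot distinguish between the two models, the learner has no basis for preferring one over the other. Adding samples, or removing variables, narrows down the set of models that fit.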
One of the tools for adding more data is known as data augmentation (Van Dyk, D. A., and Meng, X. L., 2001). In this section, we will use the MNIST dataset to illustrate a few data augmentation techniques that are particular to images but...