Working with big image datasets
Images tend to be big files. In fact, it's likely that you will not be able to fit your entire image dataset into your machine's RAM.
Therefore, we need to load the images from disk "just in time" rather than loading them all in advance. In this section, we will be setting up an image data generator that loads images on the fly.
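To make the "just in time" idea concrete, here is a minimal sketch of a lazy loader written as a plain Python generator. It is not the generator this section goes on to set up; the function name `iter_image_batches`, its parameters, and the list of file extensions are all hypothetical and chosen only for illustration. The sketch keeps just the file paths in memory and reads the raw bytes of each image when its batch is requested. Decoding those bytes into pixel arrays is left out.

```python
import os
from typing import Iterator, List, Tuple


def iter_image_batches(
    root_dir: str,
    batch_size: int = 32,
    extensions: Tuple[str, ...] = (".png", ".jpg", ".jpeg"),  # assumed file types
) -> Iterator[List[Tuple[str, bytes]]]:
    """Yield batches of (path, raw_bytes) pairs, reading files only on demand."""
    # Collect the file paths up front: paths are tiny compared to image data.
    paths = sorted(
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root_dir)
        for name in names
        if name.lower().endswith(extensions)
    )

    # Read one batch at a time, so at most `batch_size` images are loaded
    # by this function at any moment.
    for start in range(0, len(paths), batch_size):
        batch = []
        for path in paths[start:start + batch_size]:
            with open(path, "rb") as f:
                batch.append((path, f.read()))
        yield batch
```

Because this is a generator, calling `iter_image_batches("some_folder")` reads no image data by itself; each batch is read from disk only when the consuming loop asks for the next item.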
We'll be using a dataset of plant seedlings in this case. It was provided by Thomas Giselsson and others in 2017, in their publication A Public Image Database for Benchmark of Plant Seedling Classification Algorithms.
This dataset is available from the following link: https://arxiv.org/abs/1711.05458.
You may be wondering why we're looking at plants; after all, plant classification is not a common problem in the finance sector. The simple answer is that this dataset lends itself to demonstrating many common computer vision techniques and is available under an open domain license; it's therefore a great training dataset...