We've looked at NumPy and Pandas as ways to feed in-memory datasets to CNTK for training. But not every dataset is small enough to fit into memory. This is especially true for datasets that contain images, video samples, or sound samples. When you work with larger datasets, you want to load only small portions of the dataset into memory at a time. Usually, you will load just enough samples to run a single minibatch of training.
CNTK supports working with larger datasets through the MinibatchSource component, which loads data from disk in chunks. It can also automatically randomize the order of the samples it reads from the data source. This helps prevent your neural network from overfitting to a fixed sample order in the training dataset.
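The core idea behind chunked, randomized reading can be sketched in plain NumPy. Note that this is a hypothetical illustration of the concept, not the CNTK API itself; the function name `iter_minibatches` and the toy data are assumptions for the example.

```python
import numpy as np

def iter_minibatches(data, batch_size, randomize=True, seed=0):
    """Yield successive minibatches, shuffling the sample order each sweep.

    Hypothetical helper illustrating what a minibatch source does;
    not part of CNTK. With a np.memmap-backed array, only the slice
    for the current minibatch needs to be materialized in memory.
    """
    indices = np.arange(len(data))
    if randomize:
        # Shuffle indices so the network never sees a fixed order.
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(data), batch_size):
        yield data[indices[start:start + batch_size]]

# Ten toy samples split into minibatches of at most four.
samples = np.arange(10).reshape(10, 1)
batches = list(iter_minibatches(samples, batch_size=4))
# Yields three minibatches: two of size 4 and a final one of size 2.
```

CNTK's MinibatchSource adds on top of this the ability to read the chunks from disk-based formats through deserializers, so the full dataset never has to be loaded at once.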
MinibatchSource also has a built-in transformation pipeline. You can use this pipeline to preprocess samples as they are read from disk, before they are fed to the network.