Preparing image datasets
Input formats are more complex for image datasets than for tabular datasets, and we need to get them exactly right. The CV algorithms in SageMaker support three input formats:
- Image files
- RecordIO files
- Augmented manifests built by SageMaker Ground Truth
In this section, you'll learn how to prepare datasets in these different formats. To the best of my knowledge, this topic has rarely been addressed in such detail. Get ready to learn a lot!
Working with image files
This is the simplest format, and it's supported by all three algorithms. Let's see how to use it with the image classification algorithm.
Converting an image classification dataset to image format
A dataset in image format has to be stored in S3. Images don't need to be sorted in any way, and you could simply store all of them in the same bucket.
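As a minimal sketch of this step, the snippet below walks a local folder and copies every image to the same bucket under a common prefix. It assumes a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method does the actual transfer. The helper names, the bucket name, the prefix, and the set of file extensions are illustrative assumptions, not something prescribed by SageMaker.

```python
from pathlib import Path

# Assumption: the dataset only contains JPEG and PNG files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def plan_uploads(local_dir, prefix):
    """Return (local_path, s3_key) pairs for every image under local_dir."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Keep the relative path so subfolders survive in the S3 key.
            relative = path.relative_to(root).as_posix()
            pairs.append((str(path), f"{prefix.strip('/')}/{relative}"))
    return pairs


def upload_images(s3_client, bucket, local_dir, prefix):
    """Upload each image with s3_client.upload_file and return the keys written."""
    keys = []
    for filename, key in plan_uploads(local_dir, prefix):
        s3_client.upload_file(filename, bucket, key)
        keys.append(key)
    return keys


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   upload_images(boto3.client("s3"), "my-bucket", "images", "dataset/train")
```

Passing the client in as a parameter keeps the path logic separate from the AWS call, so you can check which keys would be written before uploading anything.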
Images are described in a list file, a text file containing one line per image. For image classification...