Handling image data at scale
Handling data and their respective labels is simple if everything can be loaded into the Python runtime's memory. However, when constructing a data pipeline that feeds a model training workflow, we want to ingest or stream data in batches so that we don't rely on runtime memory to hold all the training data. In this case, the one-to-one relationship between each data item (image) and its label has to be preserved. We are going to see how to do this with TFRecord. We have already seen how to convert one image to a TFRecord. With multiple images, the conversion process is exactly the same for each image.
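To make the idea concrete before we get to the refactored code, here is a minimal sketch of writing several image and label pairs into one TFRecord file and streaming them back in batches. It is an illustration under stated assumptions, not the exact code from the previous section: the feature keys "image" and "label", the helper names write_tfrecord and parse_example, and the synthetic all-black JPEG images are hypothetical choices made for this example.

```python
import os
import tempfile

import tensorflow as tf


def _bytes_feature(value):
    """Wrap a bytes value in a tf.train.Feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    """Wrap an integer value in a tf.train.Feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_tfrecord(samples, output_path):
    """Write (encoded_image_bytes, integer_label) pairs to one TFRecord file.

    Returns the number of records written.
    """
    count = 0
    with tf.io.TFRecordWriter(output_path) as writer:
        for image_bytes, label in samples:
            # The image and its label are stored in the same Example, so
            # the one-to-one pairing is kept when records are batched later.
            # The keys "image" and "label" are assumptions for this sketch.
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": _bytes_feature(image_bytes),
                "label": _int64_feature(label),
            }))
            writer.write(example.SerializeToString())
            count += 1
    return count


def parse_example(serialized):
    """Decode one serialized Example back into its image bytes and label."""
    feature_spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    return tf.io.parse_single_example(serialized, feature_spec)


# Demo with synthetic data: three tiny black JPEG images labeled 0, 1, 2.
jpeg_bytes = tf.io.encode_jpeg(tf.zeros([8, 8, 3], tf.uint8)).numpy()
demo_path = os.path.join(tempfile.mkdtemp(), "images.tfrecord")
write_tfrecord([(jpeg_bytes, label) for label in range(3)], demo_path)

# Stream the records back in batches of two instead of loading them all.
dataset = tf.data.TFRecordDataset(demo_path).map(parse_example).batch(2)
for batch in dataset:
    print(batch["label"].numpy())
```

Because each record carries its own label, the pairing no longer depends on keeping two separate in-memory lists aligned, and the dataset can be read batch by batch.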
Let's take a look at how we can reuse and refactor the code from the previous section to apply to a batch of images. Since you have seen how it was done for a single image, you will have little to no problem understanding the code and rationale here.
Typically, when working with images for classification...