Handling image data for input pipelines
While there are many types of unstructured data, images are probably the most frequently encountered type. TensorFlow provides TFRecord, a binary record format that works well for image data. In this section, we are going to learn how to convert image data in Cloud Storage into a TFRecord object for input pipelines.
When working with image data in a TensorFlow pipeline, the raw image is typically converted to a TFRecord object for the same reason as for CSV files or DataFrames. Compared to a raw NumPy array, a TFRecord object is a more efficient and scalable representation of an image collection. Converting raw images to a TFRecord object is not a straightforward process: in TFRecord, the data is stored as binary strings, so each image has to be serialized before it can be written. In this section, we are going to show how to do this step by step.
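To make the idea of binary storage concrete before going through the steps, here is a minimal sketch of one possible round trip, not necessarily the exact procedure used in this section. It assumes TensorFlow 2.x is installed. The helper name `image_bytes_to_example`, the feature keys `image_raw` and `label`, the file name `sample.tfrecord`, and the synthetic 8x8 image are all placeholders chosen for illustration.

```python
import os
import tempfile

import tensorflow as tf


def image_bytes_to_example(image_bytes, label):
    """Wrap encoded image bytes and an integer label in a tf.train.Example."""
    # Hypothetical feature keys; any consistent names would work.
    feature = {
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# A synthetic 8x8 RGB image stands in for an uploaded file, so that the
# sketch does not depend on anything on disk. PNG encoding is lossless.
pixels = tf.zeros([8, 8, 3], dtype=tf.uint8)
image_bytes = tf.io.encode_png(pixels).numpy()

# Serialize the example to a binary string; this is what TFRecord stores.
example = image_bytes_to_example(image_bytes, label=0)
serialized = example.SerializeToString()

# Write the binary string to a TFRecord file in a temporary directory.
record_path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(record_path) as writer:
    writer.write(serialized)

# Read the record back and decode the image from its stored bytes.
feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
for raw_record in tf.data.TFRecordDataset(record_path):
    parsed = tf.io.parse_single_example(raw_record, feature_spec)
    decoded = tf.io.decode_png(parsed["image_raw"])
```

With a real image, the encoded bytes could instead be read directly from the file, for example with `tf.io.read_file`, and passed to the same helper without re-encoding.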
Let's start by converting a raw image to a TFRecord object. Feel free to upload your own images to the JupyterLab instance:
- Upload...