Summary
This chapter explained, with examples, how to handle commonly seen structured and unstructured data. For structured data, we read and formatted a pandas DataFrame or CSV file and converted it to a dataset for an efficient data ingestion pipeline. For unstructured data, we used image files as examples. Image files have to be organized in a hierarchical directory structure so that labels can be easily mapped to each image file. TFRecord is the preferred format for handling image data, as it wraps the image dimensions, label, and raw image bytes together in a format known as tf.Example.
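As a brief reminder of why the hierarchical layout matters, the following is a minimal sketch that derives labels from folder names. It uses only the Python standard library and assumes one subdirectory per class under a root folder. The helper name label_image_files, the folder names cats and dogs, and the file names are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def label_image_files(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted (file_path, label) pairs.

    Assumes a layout of root/<label>/<image file>, so the name of
    each file's parent folder is used as its label.
    """
    root = Path(root)
    pairs = []
    for path in sorted(root.glob("*/*")):
        if path.is_file() and path.suffix.lower() in extensions:
            pairs.append((str(path), path.parent.name))
    return pairs


# Build a small throwaway directory tree to demonstrate the mapping.
with tempfile.TemporaryDirectory() as tmp:
    for label, name in [("cats", "a.jpg"), ("cats", "b.jpg"), ("dogs", "c.jpg")]:
        folder = Path(tmp) / label
        folder.mkdir(exist_ok=True)
        (folder / name).write_bytes(b"")  # empty placeholder, not a real image

    pairs = label_image_files(tmp)
    labels = [label for _, label in pairs]

    # Map each class name to an integer index, in alphabetical order.
    class_names = sorted(set(labels))
    label_to_index = {name: i for i, name in enumerate(class_names)}
```

Because the label comes from the folder name, no separate annotation file is needed. The resulting pairs and integer indices are the kind of information that would then be written alongside the image bytes in each record.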
In the next chapter, we will look at reusable models and patterns that can consume the data structures covered here.