Chapter 3: Data Preparation and Manipulation Techniques
In this chapter, you will learn how to convert two common kinds of data into structures suitable for ingestion pipelines: structured data, such as CSV files or pandas DataFrames, into a dataset, and unstructured data, such as images, into TFRecords.
Along the way, you will pick up tips and utility functions that can be reused in many situations, and you will come to understand the rationale behind the conversion process.
As demonstrated in the previous chapter, TensorFlow Enterprise takes advantage of the flexibility offered by Google Cloud AI Platform to access training data. Once access to the training data is established, our next task is to develop a workflow that lets the model consume the data efficiently. In this chapter, we will learn how to examine and manipulate commonly used data structures.
While TensorFlow can consume Python data structures such as pandas DataFrames or NumPy arrays directly, for resource throughput and ingestion efficiency, TensorFlow...