This chapter gives an overview of how to build complex input data pipelines for ingesting large training/inference datasets in the most common formats, such as CSV files, images, and text, using the tf.data API, which includes the TFRecord format and the tf.data.Dataset class. You will also get a general idea of protocol buffers and protocol messages, and how they are implemented using TFRecords and tf.Example in TensorFlow 2.0 (TF 2.0). The chapter then explains best practices for shuffling, batching, and prefetching data with tf.data.Dataset, along with recommendations specific to TF 2.0. Finally, we will look at the newly added built-in TensorFlow Datasets, which are extremely useful for quickly building a prototype model training pipeline.
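As a quick preview of the pattern covered later in the chapter, the following minimal sketch (not taken from the chapter itself, and using toy in-memory data rather than CSV files or TFRecords) shows a tf.data.Dataset pipeline that applies shuffling, batching, and prefetching:

```python
import tensorflow as tf

# Toy in-memory data; real pipelines would read CSV files, images, or TFRecords.
features = tf.random.uniform((1000, 10))
labels = tf.random.uniform((1000,), maxval=2, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)                 # randomize example order each epoch
    .batch(32)                                 # group examples into mini-batches
    .prefetch(tf.data.experimental.AUTOTUNE)   # overlap input preparation with training
)

# Inspect one batch to confirm the pipeline's output shapes.
for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)  # (32, 10) (32,)
```

The order of these transformations, and the buffer sizes used, are exactly the kinds of choices the best-practice discussion in this chapter addresses.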
The following topics will be...