Summary
In this chapter, you learned how to load several forms of data and preprocess each of them. You began with tabular data in the form of a CSV file. Since the dataset consisted of a single CSV file, you used the pandas library to load the file into memory.
You then preprocessed the data by converting all the fields into numerical data types and scaling them. This is important because TensorFlow models can only be trained on numerical data, and training tends to be faster and more accurate when all the fields are on the same scale.
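The two preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the chapter's own code: it uses only the Python standard library instead of pandas so that the arithmetic is explicit, it assumes min-max scaling as the scaling method, and the CSV contents, column names, and helper functions (`load_rows`, `to_numeric`, `min_max_scale`) are all hypothetical.

```python
import csv
import io

# Hypothetical CSV contents, invented for illustration only.
RAW_CSV = """age,income,member
25,40000,yes
35,60000,no
45,100000,yes
"""

def load_rows(text):
    """Parse CSV text into a list of dicts (every value starts as a string)."""
    return list(csv.DictReader(io.StringIO(text)))

def to_numeric(rows, binary_map):
    """Convert every field to float, mapping known strings via binary_map."""
    converted = []
    for row in rows:
        converted.append({
            key: float(binary_map[value]) if value in binary_map else float(value)
            for key, value in row.items()
        })
    return converted

def min_max_scale(rows):
    """Rescale each column to the range [0, 1] (assumed scaling method)."""
    columns = list(rows[0].keys())
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}
    scaled = []
    for row in rows:
        scaled.append({
            # Guard against constant columns, where high == low.
            c: (row[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in columns
        })
    return scaled

rows = to_numeric(load_rows(RAW_CSV), {"yes": 1, "no": 0})
scaled_rows = min_max_scale(rows)
```

After these two steps, every field is a float between 0 and 1, so a column such as `income` no longer dominates a column such as `age` simply because its raw values are larger.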
Next, you explored how to load image data. You loaded the data in batches so that the entire dataset did not have to be held in memory at once, and you augmented the images as they were loaded. Image augmentation is useful because it increases the effective number of training examples and can help make a model more robust.
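The idea of batching combined with augmentation can be sketched with a plain Python generator. This is an illustrative sketch under stated assumptions, not the chapter's TensorFlow code: images are represented as nested lists of pixel values, a random horizontal flip stands in for augmentation in general, and the names `horizontal_flip` and `augmented_batches` are hypothetical. The example also keeps its tiny dataset in a list for brevity, whereas a real pipeline would read each batch from disk.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def augmented_batches(images, batch_size, flip_probability=0.5, seed=0):
    """Yield batches lazily, randomly flipping each image as it is drawn."""
    rng = random.Random(seed)
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        yield [
            horizontal_flip(img) if rng.random() < flip_probability else img
            for img in batch
        ]

# Five tiny 2x2 "images", invented for illustration.
images = [[[i, i + 1], [i + 2, i + 3]] for i in range(5)]
batches = list(augmented_batches(images, batch_size=2))
```

Because the generator yields one batch at a time, only `batch_size` images need to be processed at any moment, and because the flip is random, the model may see a different variant of the same image on each pass through the data.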
You then learned how to load text data and took advantage of pretrained models to embed the text into vectors that retain contextual information about it. Because TensorFlow models require numerical tensors as inputs, these embeddings are what allow text data to be fed into a model.
The final section covered how to load and process audio data and demonstrated some advanced signal processing techniques, including generating MFCCs, which produce informationally dense numerical tensors that can be input into TensorFlow models.
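To give a sense of why MFCCs are compact, the sketch below shows only the last step of a typical MFCC computation: taking the logarithm of mel-band energies, applying a discrete cosine transform, and keeping the first few coefficients. It is a simplified illustration using the standard library, not the chapter's TensorFlow signal processing code. The earlier steps (framing, the Fourier transform, and the mel filterbank) are omitted, the DCT is left unnormalized, and the function names and input values are hypothetical.

```python
import math

def dct_ii(values):
    """Unnormalized type-II discrete cosine transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def mfcc_from_mel_energies(mel_energies, num_coefficients):
    """Log-compress mel-band energies, apply the DCT, keep the first coefficients."""
    log_energies = [math.log(e) for e in mel_energies]
    return dct_ii(log_energies)[:num_coefficients]

# A perfectly flat (invented) spectrum: all information lands in coefficient 0.
flat_spectrum_mfcc = mfcc_from_mel_energies([2.0, 2.0, 2.0, 2.0], num_coefficients=3)
```

Keeping only the leading coefficients discards fine spectral detail while retaining the overall spectral shape, which is what makes the resulting tensors small relative to the raw audio.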
Loading and preprocessing data so that it can be input into machine learning models is a necessary first step in training any machine learning model. In the next chapter, you will explore the many resources that TensorFlow provides to aid model development.