Summary
In this chapter, we discussed how we can use the TFRecord format and the Petastorm library to simplify loading large amounts of data when training our distributed deep learning models. Along the way, we learned how these records are structured, how to handle expensive operations such as automated schema inference, how to prepare records for consumption, and how to use them not only with deep learning frameworks but also in pure Python applications.
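As a brief reminder of that workflow, the following is a minimal sketch of reading a Petastorm dataset both as a plain Python iterator and as a TensorFlow dataset. The dataset path is hypothetical, and it assumes the data was previously written with Petastorm:

from petastorm import make_reader
from petastorm.tf_utils import make_petastorm_dataset
import tensorflow as tf  # required by make_petastorm_dataset

DATASET_URL = "file:///tmp/example_petastorm_dataset"  # hypothetical path

# Pure Python: iterate over rows without any deep learning framework.
with make_reader(DATASET_URL) as reader:
    for row in reader:
        print(row)  # each row is a named tuple built from the stored schema
        break

# TensorFlow: expose the same reader as a tf.data.Dataset for training.
with make_reader(DATASET_URL) as reader:
    dataset = make_petastorm_dataset(reader)
    # dataset can now be passed to model.fit(...) or iterated like any tf.data.Dataset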
We finished the chapter with an example of how we can leverage pre-trained models to extract new features based on domain knowledge, which can later be used to train a new model.
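As an illustration of that idea (not necessarily the model used in the chapter), the following hedged sketch uses a pre-trained ResNet50 without its classification head to turn images into feature vectors; the input array is a placeholder for real data resized to 224x224:

import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Load ImageNet weights without the classification head, pooling to a single feature vector per image.
base_model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

images = np.random.rand(4, 224, 224, 3).astype("float32")  # placeholder batch of images
features = base_model.predict(preprocess_input(images))

# `features` has shape (4, 2048) and can be used as input to train a new, smaller model.
print(features.shape)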
In the next chapter, we will learn how to fine-tune the parameters of our deep learning models in Azure Databricks to improve their performance.