Summary
By this point in the chapter, you should be familiar with some of the most common data source types used in the data extraction phase of an ETL pipeline. The GitHub scripts associated with this chapter provided examples of data import functions for each of the most common data sources, as well as how to construct a basic data extraction function for all data types within a command-line runnable Python script. The chapter closed by walking you through logging tags for each activity and the importance of logging successful imports as well as capturing any exception raised when an import fails.
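To recap the logging pattern in miniature, the following is a minimal sketch of an extraction function that logs success and captures failures. It assumes pandas is installed; the `extract_csv` name, the logger tag, and the log format are illustrative rather than the chapter's exact code:

```python
import logging
import sys

import pandas as pd

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
# A logger tag identifying the extraction activity (illustrative name)
logger = logging.getLogger("extract.csv")


def extract_csv(path: str) -> pd.DataFrame:
    """Load a CSV source, logging success or failure."""
    try:
        df = pd.read_csv(path)
        # Log the successful import, including a simple row count
        logger.info("Import succeeded: %s (%d rows)", path, len(df))
        return df
    except Exception:
        # logger.exception records the full traceback alongside the message
        logger.exception("Import failed: %s", path)
        raise


if __name__ == "__main__":
    # Run from the command line: python extract_csv.py <path-to-csv>
    extract_csv(sys.argv[1])
```

The same try/except-with-logging wrapper applies to any of the import functions covered in this chapter; only the read call and the logger tag change per data source.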
In the next chapter, we begin to dive into the true beauty of the ETL process: data transformation, the step where data from one or many sources is molded into clean, statistically meaningful output, which is the entire purpose of a data pipeline.