Chapter 3: Creating ETL Operations with Azure Databricks
In this chapter, we will learn how to set up connections to external data sources such as Amazon Simple Storage Service (S3), set up our Azure Storage account, and use Azure Databricks notebooks to create extract, transform, and load (ETL) operations that clean and transform data. We will leverage Azure Data Factory (ADF), and finally, we will look at an example of designing an event-driven ETL operation. By working through the sections in this chapter, you will gain a high-level understanding of how data can be loaded from external sources and then transformed in data pipelines constructed and orchestrated using Azure Databricks. Let's start with a brief overview of Azure Data Lake Storage Gen2 (ADLS Gen2) and how to use it in Azure Databricks, a pattern previewed in the sketch that follows.
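As a preview of the pattern the chapter builds on, the following is a minimal sketch of reading a file from ADLS Gen2 inside an Azure Databricks notebook. The storage account name, container, secret scope, and file path are hypothetical placeholders; the `spark` and `dbutils` objects are pre-defined in every Databricks notebook:

```python
# Minimal sketch: read a CSV file from ADLS Gen2 into a Spark DataFrame.
# All names below (account, container, secret scope, path) are hypothetical.
storage_account = "mystorageaccount"   # hypothetical storage account name
container = "raw-data"                 # hypothetical container name

# Authenticate against the storage account with an access key kept in a
# Databricks secret scope (never hardcode keys in notebooks).
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="adls-scope", key="storage-account-key"),
)

# Read the raw data over the ABFS driver into a DataFrame for transformation.
df = (
    spark.read
         .option("header", "true")
         .csv(f"abfss://{container}@{storage_account}"
              f".dfs.core.windows.net/sales/raw.csv")
)
df.show(5)
```

Later sections refine this pattern into full ETL operations, swapping the source (S3, Blob storage) and adding transformation and load steps.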
In this chapter, we will look into the following topics:
- Using ADLS Gen2
- Using S3 with Azure Databricks
- Using Azure Blob storage with...