Using ADLS Gen2
To persist data in Azure Databricks, we need a data lake. We will use Azure Data Lake Storage (ADLS) Gen2, so our first step is to set up a storage account. This gives us permanent storage for the data we use to run ETL pipelines, produce analytics, or build machine learning (ML) models.
Setting up a basic ADLS Gen2 data lake
To set up an ADLS Gen2 storage account, we first need to create a new resource in the Azure portal. To do this, follow these steps:
- Search for Storage accounts and select Create a new Storage account.
- Attach it to a resource group, give it a name, set the Account kind field to StorageV2 (general-purpose v2), and, finally, set Replication to Locally-redundant storage (LRS), as illustrated in the following screenshot:
- Before finalizing, in the Advanced tab, set the Hierarchical namespace option to Enabled so that we can use ADLS Gen2 from our notebooks. The following screenshot illustrates this...
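The portal choices above can also be written down as an Azure Resource Manager (ARM) template, which is handy when we want to recreate the account later. The following is a minimal sketch rather than the procedure used in this chapter: the function name build_adls_gen2_template, the account name examplelake001, the westeurope location, and the apiVersion value are all assumptions made for illustration. The kind, sku, and isHnsEnabled values correspond to the StorageV2, LRS, and Hierarchical namespace settings from the steps.

```python
import json
import re

# Storage account names must be 3-24 characters long and contain
# only lowercase letters and digits.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def build_adls_gen2_template(account_name: str, location: str) -> dict:
    """Return an ARM template (as a dict) for an ADLS Gen2 storage account.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    if not _ACCOUNT_NAME_RE.fullmatch(account_name):
        raise ValueError(
            "account_name must be 3-24 lowercase letters or digits"
        )
    return {
        "$schema": "https://schema.management.azure.com/schemas/"
                   "2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",  # assumed API version
                "name": account_name,
                "location": location,
                # Account kind: StorageV2 (general-purpose v2)
                "kind": "StorageV2",
                # Replication: Locally-redundant storage (LRS)
                "sku": {"name": "Standard_LRS"},
                # Advanced tab: Hierarchical namespace set to Enabled
                "properties": {"isHnsEnabled": True},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own name and region.
    template = build_adls_gen2_template("examplelake001", "westeurope")
    print(json.dumps(template, indent=2))
```

Saving the printed JSON to a file would let us deploy it into an existing resource group, for example with az deployment group create. The resource group itself is not part of this sketch and would need to exist beforehand.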