Transforming data using Scala
In this recipe, we'll mount the Azure Data Lake Storage Gen2 filesystem on DBFS. We'll then read the orders data from Data Lake and the customer data from an Azure Synapse SQL pool. We'll apply transformations using Scala, analyze the data using SQL, and then insert the aggregated data into an Azure Synapse SQL pool.
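To make the shape of the transformation and analysis steps concrete before any provisioning, here is a minimal sketch that uses small in-memory DataFrames in place of the two real sources. The column names (orderid, customerid, amount, customername), the sample rows, the salesByCustomer helper, and the sales_by_customer view name are all hypothetical and not taken from the recipe's datasets.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// In a Databricks notebook a session already exists and getOrCreate() reuses it;
// outside Databricks this starts a local session instead.
val session = SparkSession.builder()
  .appName("orders-sketch")
  .master("local[*]")
  .getOrCreate()
import session.implicits._

// Hypothetical stand-in for the orders data read from the mounted Data Lake path.
val orders = Seq((1, 101, 250.0), (2, 102, 75.5), (3, 101, 120.0))
  .toDF("orderid", "customerid", "amount")

// Hypothetical stand-in for the customer data read from the Synapse SQL pool.
val customers = Seq((101, "Contoso"), (102, "Fabrikam"))
  .toDF("customerid", "customername")

// Transformation in Scala: join orders to customers, then total the amount per customer.
def salesByCustomer(orders: DataFrame, customers: DataFrame): DataFrame =
  orders
    .join(customers, Seq("customerid"))
    .groupBy("customername")
    .agg(sum("amount").as("totalsales"))

val aggregated = salesByCustomer(orders, customers)

// Analysis in SQL: expose the aggregate as a temporary view and query it.
aggregated.createOrReplaceTempView("sales_by_customer")
val top = session.sql(
  "SELECT customername, totalsales FROM sales_by_customer ORDER BY totalsales DESC")
top.show()
```

In the recipe itself, the two inputs come from the mounted Data Lake filesystem and the Synapse SQL pool rather than from in-memory sequences, and the aggregated result is written back to the SQL pool instead of only being displayed.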
Getting ready
To get started, follow these steps:
- Log in to https://portal.azure.com using your Azure credentials.
- You will need an existing Azure Databricks workspace and at least one Databricks cluster. You can create these by following the Configuring an Azure Databricks environment recipe.
How to do it…
Let's start by provisioning the source and destination data stores. We'll begin by creating an Azure Data Lake Storage Gen2 account and uploading files to it:
- Execute the following command to create an Azure Data Lake Storage Gen2 account and upload the necessary files...