Transforming data using Python
Data transformation at scale is one of the most important uses of Azure Databricks. In this recipe, we'll read product orders from an Azure storage account and customer information from an Azure SQL database, join the two datasets, apply transformations to filter the data and aggregate total orders by country and customer, and then write the output back to an Azure SQL database.
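Before walking through the steps, here is a minimal PySpark sketch of the overall flow described above, as it might run in a Databricks notebook. The storage account, container, JDBC connection details, and the table and column names are placeholder assumptions, not the exact values used later in this recipe:

```python
# Minimal sketch of the flow: read orders from Blob Storage, read customers
# from Azure SQL Database, join, aggregate, and write the result back.
# All names below (<storageaccount>, <server>, <db>, column names) are
# assumptions for illustration only.
from pyspark.sql import functions as F

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"
jdbc_opts = {"url": jdbc_url, "user": "<user>", "password": "<password>"}

# Read the orders files from the storage account (assumes the storage key is
# already configured on the cluster or via spark.conf).
orders = (spark.read
          .option("header", "true")
          .csv("wasbs://orders@<storageaccount>.blob.core.windows.net/orders/data/"))

# Read customer information from Azure SQL Database over JDBC.
customers = (spark.read.format("jdbc")
             .options(**jdbc_opts)
             .option("dbtable", "dbo.Customer")
             .load())

# Join, filter, and aggregate the total order amount by country and customer.
result = (orders.join(customers, on="CustomerID")
          .filter(F.col("Quantity") > 0)
          .groupBy("Country", "CustomerID")
          .agg(F.sum("OrderAmount").alias("TotalOrderAmount")))

# Write the aggregated output back to Azure SQL Database.
(result.write.format("jdbc")
 .options(**jdbc_opts)
 .option("dbtable", "dbo.TotalOrdersByCountry")
 .mode("append")
 .save())
```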
Getting ready
To get started, follow these steps:
- Log into https://portal.azure.com using your Azure credentials.
- You will need an existing Azure Databricks workspace and at least one Databricks cluster. You can create these by following the Configuring an Azure Databricks environment recipe.
How to do it…
Let's get started by creating an Azure storage account and an Azure SQL database:
- Execute the following command to create an Azure Storage account and upload the orders files to the orders/data path in the container (one possible way to do the upload is sketched after this step): …
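The exact command is not reproduced here. As a hedged illustration only, one way to stage the files with the azure-storage-blob Python SDK could look like the following; the connection string, the orders container name, and the local folder are assumptions, and the storage account itself is assumed to have been created already (for example, via the Azure portal):

```python
# Hypothetical sketch: upload local orders files to the orders/data path of a
# blob container using the azure-storage-blob SDK. Values are placeholders,
# not the recipe's actual names.
import os
from azure.storage.blob import BlobServiceClient

conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # assumed to be set
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("orders")  # assumed container name

local_dir = "./orders"  # hypothetical local folder holding the orders files
for file_name in os.listdir(local_dir):
    blob_path = f"orders/data/{file_name}"
    with open(os.path.join(local_dir, file_name), "rb") as data:
        # overwrite=True replaces any existing blob with the same name
        container.upload_blob(name=blob_path, data=data, overwrite=True)
```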