Implementing incremental data loading with a mapping data flow
A mapping data flow provides a code-free data transformation environment. We use the UI to design the ETL logic and run the pipeline. When the pipeline runs, a Spark cluster is provisioned, the data flow is translated into Spark code, and the code is executed on the cluster.
In this recipe, we'll look at one approach to implementing incremental data loading using a mapping data flow.
Getting ready
To get started, do the following:
- Log in to https://portal.azure.com using your Azure credentials.
- Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
Connect-AzAccount
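If your account has access to multiple subscriptions, you may also need to select the one to work in. A minimal sketch using the Az module (the subscription ID is a placeholder, not a value from this recipe):
# Select the subscription that all subsequent Az cmdlets should target
Set-AzContext -Subscription "<your-subscription-id>"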
- You will need an existing Data Factory account. If you don't have one, create one by executing the following PowerShell script:
~\azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1
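If you prefer to create the data factory manually, the following is a rough sketch of what such a script does, assuming the Az.DataFactory module is installed; the resource group name, factory name, and location are placeholders, not the values used by the book's script:
# Create a resource group to hold the data factory (names are placeholders)
New-AzResourceGroup -Name "<resource-group>" -Location "EastUS"
# Create an empty V2 data factory in that resource group
New-AzDataFactoryV2 -ResourceGroupName "<resource-group>" -Name "<data-factory-name>" -Location "EastUS"
Note that data factory names must be globally unique, so pick a name that isn't already taken.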
- Create an Azure storage account and upload the files from the ~/Chapter06/Data folder to orders/datain...
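As a rough sketch of this step, creating the storage account and uploading a local file with the Az.Storage module could look like the following; all names and the sample file path are placeholder assumptions, not values from this recipe:
# Create a general-purpose v2 storage account (names are placeholders)
New-AzStorageAccount -ResourceGroupName "<resource-group>" -Name "<storageaccount>" -Location "EastUS" -SkuName Standard_LRS -Kind StorageV2
# Get the account's storage context for data-plane (blob) operations
$ctx = (Get-AzStorageAccount -ResourceGroupName "<resource-group>" -Name "<storageaccount>").Context
# Create the orders container and upload a sample file under the datain/ prefix
New-AzStorageContainer -Name "orders" -Context $ctx
Set-AzStorageBlobContent -File ".\Chapter06\Data\<file>.csv" -Container "orders" -Blob "datain/<file>.csv" -Context $ctx
To upload every file in the folder, you can loop with Get-ChildItem and call Set-AzStorageBlobContent once per file.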