Transferring data between Hadoop and Azure
Now that we have some data created by Hadoop Hive on-premises, we're going to transfer this data to cloud storage on Azure. Then, we'll apply several transformations to it using Hadoop Pig Latin. Once that's done, we'll transfer the data to an on-premises table in the staging schema of our AdventureWorksLTDW2016 database.
In this recipe, we're going to copy the data processed by the local Hortonworks cluster to Azure Blob storage. Once the data is copied over, we can transform it using Azure compute resources, as we'll see in the following recipes.
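Under the hood, this copy step amounts to uploading the on-premises Hive output files into a blob container. The SSIS tasks we configure below handle this for us, but as a point of reference, here is a minimal Python sketch of the same operation using the azure-storage-blob package; the connection string, container name, blob name, and local file path are hypothetical placeholders, not values from this recipe.

```python
# A minimal sketch of the blob upload that the SSIS package performs.
# The connection string, container name, blob name, and file path are
# hypothetical placeholders; substitute your own values.
from azure.storage.blob import BlobServiceClient

# Connection string from the Azure portal (storage account, Access keys blade).
conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=<account>;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)

# Upload one Hive output file into a container; both names are illustrative.
blob_client = service.get_blob_client(container="import", blob="aggregatesales/000000_0")
with open("/tmp/aggregatesales/000000_0", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)
```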
Getting ready
This recipe assumes that you have created an Azure storage account as described in the previous recipe.
How to do it...
- Open the ETL.Staging SSIS project and add a new package to it. Rename it StgAggregateSalesFromCloud.dtsx.
- Add a Hadoop connection manager called cmgr_Hadoop_Sandbox like we did in the previous recipe.
- Add another connection manager, which will connect to the Azure storage like the...
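Once the Azure storage connection manager is in place, it's worth confirming that the blob container is actually reachable with the credentials you entered. That is what the connection manager's Test Connection button does; the following Python sketch mirrors the same check outside of SSIS, again with a hypothetical connection string and container name.

```python
# Sanity check that the blob container referenced by the connection
# manager is reachable. The connection string and container name are
# hypothetical placeholders.
from azure.storage.blob import ContainerClient

conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=<account>;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)
container = ContainerClient.from_connection_string(conn_str, container_name="import")

# Listing blobs both verifies the credentials and shows what's already there.
for blob in container.list_blobs():
    print(blob.name, blob.size)
```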