Transforming data with Hive
Now that we have some data copied to Azure, we'll transform it using Apache Hive. Hive is a data warehouse system built on top of Hadoop, and it is widely used in the big data community for querying and transforming large datasets. Its query language, HiveQL, is essentially SQL, which makes it a natural fit for transforming data stored in the cloud.
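To give a feel for how SQL-like Hive is, here is a minimal HiveQL sketch of a transformation. It is not the script used later in this recipe. The table names (sales_raw, sales_by_customer), the column names, and the storage location are all hypothetical placeholders chosen for illustration.

```sql
-- Hypothetical external table over delimited files in a blob container.
-- <container> and <account> are placeholders, not values from this recipe.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
    order_id      INT,
    customer_name STRING,
    order_date    STRING,
    amount        DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasbs://<container>@<account>.blob.core.windows.net/sales/';

-- The transformation itself: aggregate the raw rows into a new table.
CREATE TABLE sales_by_customer
STORED AS ORC
AS
SELECT customer_name,
       SUM(amount) AS total_amount
FROM sales_raw
GROUP BY customer_name;
```

The external table only describes files that already exist in storage, so dropping it leaves the data in place. The second statement reads like ordinary SQL, but Hive runs it as a distributed job on the cluster.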
Getting ready
This recipe requires that you have the following:
- Visual Studio 2019 with the Integration Services extension installed.
- Azure Feature Pack installed.
- Java Runtime Environment (JRE) installed.
- Access to an Azure subscription.
- An on-demand HDInsight cluster task in your package. Make sure you've completed the Creating an on-demand Azure HDInsight cluster recipe in this chapter.
How to do it…
Now, it's time to copy some data into a container in our storage account. We'll then use Apache Hive to transform this data. Let's get started:
- From the SSIS toolbox, drag and drop a Data Flow Task. Rename it DFT_Sales. ...