Managing data with Pig Latin
Pig Latin is the scripting language of Apache Pig, one of the tools available on big data clusters. Pig scripts can accept any type of data. "Pigs eat anything," as the creators' mantra states.
This recipe only shows you how to call a simple Pig script; no transformations are done. The point is to show how we can use an Azure Pig task with SSIS.
Getting ready
This recipe assumes that you have successfully created an HDInsight cluster.
How to do it...
- In the StgAggregatedSales.dtsx SSIS package, drag and drop an Azure Pig Task onto the control flow. Rename it apt_AggregateData.
- Double-click on it to open the Azure HDInsight Pig Task Editor and set the properties as shown in the following screenshot:
- In the script property, insert the following code:
SalesExtractsSource = LOAD 'wasbs:///Import/FactOrdersAggregated.txt';
rmf wasbs:///Export/;
STORE SalesExtractsSource INTO 'wasbs:///Export/' USING PigStorage('|');
- The first...
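To make the effect of the three statements in the script concrete, here is a minimal local sketch in Python. It is an emulation, not something that runs on the cluster. It assumes that a LOAD without a USING clause falls back to Pig's default tab-delimited loader, and that STORE fails when the output directory already exists, which is why rmf comes first. The helper names (load, rmf, store, run_demo), the sample rows, and the part file name are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def load(path, delimiter="\t"):
    """Like LOAD with no USING clause: one row per line, fields split on tabs."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n").split(delimiter) for line in handle]


def rmf(path):
    """Like rmf: remove the directory tree and stay silent if it does not exist."""
    shutil.rmtree(path, ignore_errors=True)


def store(rows, out_dir, delimiter="|"):
    """Like STORE ... USING PigStorage('|'): write the rows to a part file in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True)  # raises if out_dir already exists, hence rmf first
    part_file = out_path / "part-m-00000"  # illustrative name; real names vary by engine
    with open(part_file, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write(delimiter.join(row) + "\n")
    return part_file


def run_demo():
    """Run the three steps against made-up sample rows and return the exported text."""
    with tempfile.TemporaryDirectory() as root:
        source = Path(root) / "Import" / "FactOrdersAggregated.txt"
        source.parent.mkdir()
        # Hypothetical tab-delimited rows; the real file layout is not shown in the recipe.
        source.write_text("2016\tBikes\t100\n2017\tAccessories\t250\n", encoding="utf-8")
        export_dir = Path(root) / "Export"

        rows = load(source)                  # SalesExtractsSource = LOAD '...';
        rmf(export_dir)                      # rmf wasbs:///Export/;
        part_file = store(rows, export_dir)  # STORE ... USING PigStorage('|');
        return part_file.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(run_demo())
```

Under these assumptions, the only visible change between the imported and exported data is the field delimiter, which goes from a tab to a pipe character.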