Transforming data with Pig
This recipe is almost identical to the previous one. The only differences are that we're going to use Pig to do the transformations and that we'll use different containers in our storage.
Pig is a scripting language that can eat anything, meaning that it can consume almost any type of file easily. However, there is one restriction on the file types we can use with the HDInsight version of Pig: Parquet files cannot be used, because a required library is missing from the out-of-the-box HDInsight cluster. For this reason, we will use a regular text file format in this recipe.
Many of the steps in this recipe are similar to those shown in the previous recipe. If you completed all the recipes provided in the previous sections, some of these steps can be skipped. We'll indicate when this is the case.
Getting ready
This recipe requires that you have the following:
- Visual Studio 2019 with the Integration Services extension installed ...