Rewriting an SSIS package using ADF
In the last recipe, one package did not run: HiveSSIS.dtsx. It failed because a component was missing from the basic SSIS integration runtime setup: the Java Runtime Environment (JRE). We could have tried to install it, but since the package is quite simple, we will rewrite it in the data factory.
We have several options:
- We can still use Hive in HDInsight to transform the data. This would be fast to implement and would be the right choice if the transformation logic were complex and we had a tight deadline. ADF has a Hive activity as well as an HDInsight cluster compute connector, so this solution could be a valid choice. It has drawbacks, though: it relies on Hadoop technology, which can be much slower than the new kid on the block, Spark. It is also harder to debug, as HDInsight error messages can sometimes be complex to analyze.
- Since the Hive logic is simple, we can rewrite it using an ADF mapping data...