Introduction
Data warehouse architects are facing the need to integrate many types of data. Cloud data integration can be a real challenge for on-premises data warehouses for the following reasons:
- The data sources are not stored on-premises, and cloud data stores differ significantly from the stores that ETL tools such as SSIS were designed for. As we saw earlier, the out-of-the-box SSIS toolbox provides sources, destinations, and transformations that deal with on-premises data only.
- The data transformation toolset is also quite different in the cloud. There, we don't necessarily use SSIS to transform data; cloud developers use specific data transformation languages such as Hive and Pig. This is because data volumes can be huge, and these languages run on clusters, whereas SSIS runs on a single machine.
While there are many cloud-based solutions on the market, the recipes in this chapter will talk about the Microsoft Azure...