Summary
In this chapter, you have learned how to build enterprise-grade ETL pipelines and data transformations in Azure Machine Learning, as well as how to manage datasets.
You have learned how to load data into the cloud using blob storage and how to extract it from various other data formats. If you model your data as abstract data stores and datasets, your users don't need to know where the data is located, how it is encoded, or which protocol and permissions are required to access it. This is an essential part of an ETL pipeline. It also helps to see your dataset definitions as contracts describing what users can expect from the data, much like an API. Therefore, it makes sense to follow a specific life cycle: creating datasets, then updating and versioning them, before deprecating and archiving them once they are no longer used.
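This dataset-as-contract life cycle can be sketched in plain Python. Note that this is an illustrative model only, not the Azure Machine Learning SDK API; the class and method names (`DatasetDefinition`, `add_version`, `deprecate`, `archive`) are hypothetical and exist purely to show the idea that consumers reference a named, versioned contract rather than raw storage paths:

```python
class DatasetDefinition:
    """A dataset definition treated as a contract: consumers rely on
    its name and version, never on storage location or encoding."""

    def __init__(self, name, location):
        self.name = name
        self.versions = []          # list of (version_number, location)
        self.state = "active"       # active -> deprecated -> archived
        self.replaced_by = None
        self.add_version(location)

    def add_version(self, location):
        """Register a new version; older versions stay resolvable."""
        version = len(self.versions) + 1
        self.versions.append((version, location))
        return version

    def latest(self):
        """Consumers ask for the latest version by name only."""
        return self.versions[-1]

    def deprecate(self, replaced_by=None):
        """Signal to consumers that they should migrate."""
        self.state = "deprecated"
        self.replaced_by = replaced_by

    def archive(self):
        """Final state once no consumers remain."""
        self.state = "archived"


# Usage: the caller never touches blob paths directly after creation.
titanic = DatasetDefinition("titanic", "https://myaccount.blob.core.windows.net/data/titanic_v1.csv")
titanic.add_version("https://myaccount.blob.core.windows.net/data/titanic_v2.csv")
version, _ = titanic.latest()   # resolves to version 2
titanic.deprecate(replaced_by="passengers")
titanic.archive()
```

The key design choice mirrored here is that versioning is additive: publishing a new version never breaks consumers pinned to an older one, just as a well-managed API keeps old versions available during a deprecation window.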
Using the Azure DataPrep SDK, you acquired the skills to build scalable data transformation pipelines using dataflows. We looked into how to create...