Data orchestration
Data orchestration can be defined as the process of combining data from various sources, covering the steps needed to import, transform, and load it into the destination data store. Its fundamental principle is the ability to automate all of the data preparation steps in a repeatable and reusable form, which can then be integrated with the overall ML pipelines. While data orchestration can be used in a wider context that also includes resource provisioning, scaling, and monitoring, its core is creating and automating data workflows, and this is where we will focus for the remainder of the book. The other heavy-lifting tasks of provisioning, scaling, and monitoring are taken care of by AWS.

SageMaker Data Wrangler uses a data flow to connect datasets and to perform transformation and analysis steps. This data flow can be used to define your data pipeline and consists of all the steps that are involved in data preparation...
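To make the idea of a repeatable, reusable data workflow concrete, the following is a minimal sketch using the SageMaker Python SDK, wiring a single data preparation step into a SageMaker Pipeline. The S3 locations, the pipeline and step names, and the preprocess.py script are hypothetical placeholders, not artifacts from this book's examples:

```python
# A minimal sketch: one automated data preparation step wired into a
# SageMaker Pipeline so it can be re-run as part of an ML workflow.
# S3 URIs, names, and preprocess.py are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.workflow.pipeline import Pipeline

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

# Processor that runs the transformation script on managed infrastructure;
# AWS handles the provisioning, scaling, and monitoring of these instances
processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# One step of the workflow: import raw data, transform it, write the result
prep_step = ProcessingStep(
    name="PrepareData",
    processor=processor,
    inputs=[ProcessingInput(
        source="s3://my-bucket/raw/",            # hypothetical input location
        destination="/opt/ml/processing/input",
    )],
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://my-bucket/prepared/",  # hypothetical output location
    )],
    code="preprocess.py",  # hypothetical script holding the transform logic
)

# The pipeline makes the workflow repeatable: define it once, start on demand
pipeline = Pipeline(name="data-prep-pipeline", steps=[prep_step])
pipeline.upsert(role_arn=role)
pipeline.start()
```

The same pattern scales to multiple steps: each transformation or analysis step in a Data Wrangler flow can be exported as a step like the one above, so the entire data preparation sequence runs end to end without manual intervention.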