Understanding the basics of orchestration
Once you have written and tested your transformations, you need a way to define dependencies among the various steps of your data engineering pipelines, define a strategy to deal with failures, and so on. This is where orchestration comes in. It allows you to define how a data pipeline is executed: which conditions must be met before a job starts, which transformations run in parallel, and what happens when a job fails (should it be retried after a certain interval, ignored, or should the whole pipeline be aborted?). Getting these decisions right matters because they affect both the performance of the pipeline and what it costs to run.
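To make these ideas concrete, the following is a minimal sketch of an orchestrator in plain Python. It is not how any particular orchestration tool is implemented. The function `run_pipeline`, its parameters `max_retries` and `retry_delay`, and the task names are all hypothetical and chosen only for illustration. The sketch assumes Python 3.9 or later (for `graphlib`) and that every dependency named is also a key in `tasks`. It runs tasks one at a time in dependency order, so it does not show parallel execution.

```python
from graphlib import TopologicalSorter
import time


def run_pipeline(tasks, dependencies, max_retries=2, retry_delay=0.0):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of task names
        that must succeed before it runs.
    Returns a dict mapping each task name to "success", "failed",
    or "skipped".
    """
    status = {}
    graph = {name: dependencies.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # Condition to start: every upstream task must have succeeded.
        if any(status[dep] != "success" for dep in graph[name]):
            status[name] = "skipped"
            continue
        # One initial attempt plus up to max_retries further attempts.
        for attempt in range(1 + max_retries):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception:
                if attempt < max_retries:
                    time.sleep(retry_delay)  # wait before trying again
        else:
            # Every attempt raised, so the task is marked as failed.
            status[name] = "failed"
    return status


# Hypothetical tasks: "transform" fails once and then succeeds,
# while "audit" fails on every attempt.
attempts = {"transform": 0}


def extract():
    pass


def transform():
    attempts["transform"] += 1
    if attempts["transform"] < 2:
        raise RuntimeError("transient failure")


def load():
    pass


def audit():
    raise ValueError("permanent failure")


def publish():
    pass


result = run_pipeline(
    tasks={"extract": extract, "transform": transform, "load": load,
           "audit": audit, "publish": publish},
    dependencies={"transform": {"extract"}, "load": {"transform"},
                  "publish": {"audit"}},
)
```

In this run, `transform` succeeds on its second attempt, so `load` still runs. `audit` exhausts its retries and is marked as failed, which causes `publish` to be skipped rather than aborting the rest of the pipeline. Whether to skip downstream tasks or abort everything is exactly the kind of policy decision an orchestrator lets you express.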
In the following sections, we are going to look at some of the popular orchestration tools in the industry.