Running a data pipeline
Once development and deployment have succeeded, it is time to orchestrate the data pipeline. Pipeline runs are typically triggered in one of the following three ways:
- Manually—The simplest way to invoke a data pipeline is to trigger it by hand, using the control panel, command-line tools, or Representational State Transfer (REST) APIs. This method is suitable for development, testing, or one-off executions but is unsuitable for routine production runs. For example, a data engineer may run a pipeline manually while performing unit testing, or may need a one-off execution because the scheduled run failed.
- Scheduled—In this method, the data pipeline is invoked by a scheduler. The scheduler can be operating system-based, part of an orchestration tool, or built into the ETL tool itself. This is the most common method of invoking...