Chapter 10: Orchestrating the Data Pipeline
Throughout this book, we have discussed various services that data engineers can use to ingest and transform data, as well as to make it available to consumers. We looked at how to ingest data with Amazon Kinesis Data Firehose and AWS Database Migration Service, and how to run AWS Lambda functions and AWS Glue jobs to transform our data. We also discussed the importance of updating a data catalog as new datasets are added to a data lake, and how to load subsets of data into a data mart for specific use cases.
For the hands-on exercises, we made use of various services, but for the most part, we triggered them manually. In a real production environment, manually triggering these tasks would not be acceptable, so we need a way to automate our data engineering tasks. This is where data pipeline orchestration tools come in, as sketched in the example that follows.
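To make the idea concrete, here is a minimal sketch of what an orchestrated pipeline definition can look like in Apache Airflow, one widely used orchestration tool (assuming Airflow 2.x). The DAG, task names, and placeholder functions here are hypothetical illustrations, not code from this book's exercises; they simply stand in for the ingest, transform, and load steps discussed above.

```python
# A minimal, hypothetical Airflow 2.x DAG sketch: three placeholder
# tasks chained so each runs only after the previous one succeeds.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_data():
    # Placeholder: e.g., kick off ingestion of new raw files
    print("ingesting raw data")


def transform_data():
    # Placeholder: e.g., start a transformation job
    print("transforming data")


def load_data_mart():
    # Placeholder: e.g., load curated data into a data mart
    print("loading data mart")


with DAG(
    dag_id="daily_data_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # run automatically once per day
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_raw_data)
    transform = PythonOperator(task_id="transform", python_callable=transform_data)
    load = PythonOperator(task_id="load", python_callable=load_data_mart)

    # Define dependencies: ingest, then transform, then load
    ingest >> transform >> load
```

Once a pipeline is expressed this way, the orchestration engine handles scheduling, dependency ordering, and retries on failure, removing the need for anyone to trigger each task by hand.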
Modern-day ETL applications are designed with a modular...