Part 2 – pipeline workflow management platforms in Python
All the Python modules we’ve introduced up to this point in the chapter are valuable tools for improving the efficacy and speed of Python data pipelines, but they are not a one-size-fits-all solution. As your data requirements expand, you will inevitably face the challenge of scaling your pipelines to meet growing capacity demands.
Pipeline workflow management platforms streamline and automate data pipeline deployments. They are particularly useful when multiple tasks must be executed in a specific order or in parallel, and when data must be transformed and passed between asynchronous stages of a pipeline. A number of these platforms are available for Python. Here are some of the most popular ones:
- Apache Airflow: A platform to programmatically author, schedule, and monitor workflows
- Apache NiFi: An easy-to-use...