Introducing Apache Airflow
Apache Airflow is open source software for programmatically authoring, scheduling, executing, and monitoring workflows. A workflow is a sequence of tasks; examples include data pipelines, ML workflows, deployment pipelines, and even infrastructure tasks. Airflow was developed at Airbnb as a workflow management system. It was later open sourced and joined the Apache Software Foundation's incubation program.
While many workflow engines define workflows in XML, Airflow uses Python as its core language for defining workflows. The tasks within a workflow are also defined in Python.
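To make the idea of a workflow defined in code concrete, here is a minimal plain-Python sketch. It deliberately does not use Airflow's API: the function names, the workflow list, and the run_workflow helper are all hypothetical and exist only for illustration. It also passes each task's result directly to the next task, which is a simplification and not how Airflow moves data between tasks.

```python
# Plain-Python illustration only: none of these names belong to Airflow.

def extract_data():
    """Pretend to pull raw records from a source system."""
    return [3, 1, 2]


def transform_data(records):
    """Sort the records as a stand-in for a real transformation."""
    return sorted(records)


def load_data(records):
    """Report how many records would be written to the target."""
    return f"loaded {len(records)} records"


# The workflow itself is ordinary Python: an ordered list of task callables.
workflow = [extract_data, transform_data, load_data]


def run_workflow(tasks):
    """Run the tasks in order, passing each result to the next task."""
    result = tasks[0]()
    for task in tasks[1:]:
        result = task(result)
    return result


if __name__ == "__main__":
    print(run_workflow(workflow))
```

Because the workflow is just Python, it can be versioned, reviewed, and tested like any other code. Airflow builds on the same idea, adding scheduling, monitoring, and explicit dependencies between tasks.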
Airflow has many features, but this book covers only the fundamentals. This section is by no means a detailed guide to Airflow; our focus is to introduce you to the software components of the ML platform. Let's start with the DAG.
Understanding DAG
A workflow can be simply defined as a sequence of tasks. In Airflow, the sequence...