Demonstrating a machine learning pipeline using Airflow
In this section, we will create a dummy ML pipeline. The pipeline will have the following stages:
- One stage for initializing data and model directories if they are not present
- Two stages for data collection from two different sources
- A stage where we combine the data from the two data collection stages
- A training stage
In practice, you can add as many stages as the complexity of your end-to-end process requires. Let’s take a look:
- First, let’s create the DAG using the following code snippet:
from datetime import datetime
from airflow import DAG

with DAG(
    'dummy_ml_pipeline',
    description='A dummy machine learning pipeline',
    schedule_interval="0/5 * * * *",
    start_date=datetime(2023, 1, 1),  # required so the scheduler knows when runs should begin
    catchup=False,                    # skip backfilling runs for past intervals
    tags=['ml_pipeline'],
) as dag:
    # Initialize the directories first, collect data from both sources in
    # parallel, combine the collected data, and finally train the model
    init_data_directory >> [data_collection_source1, data_collection_source2] >> data_combiner >> model_training
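The dependency chain at the end of the snippet references task objects that are not defined in the snippet itself. The following is a minimal sketch of how they could be declared with PythonOperator; the placeholder callables, the /tmp/ml_pipeline paths, and the model_training task name are assumptions made for illustration rather than the pipeline's actual implementation. The operator instantiations belong inside the with DAG(...) block, before the dependency line:
import os
from airflow.operators.python import PythonOperator

def init_dirs():
    # Create the data and model directories if they are not present
    # (hypothetical locations used for illustration)
    os.makedirs('/tmp/ml_pipeline/data', exist_ok=True)
    os.makedirs('/tmp/ml_pipeline/models', exist_ok=True)

def collect(source):
    # Placeholder for pulling raw data from a single source
    print(f'collecting data from {source}')

def combine():
    # Placeholder for merging the two collected datasets
    print('combining data from both sources')

def train():
    # Placeholder for fitting a model on the combined data
    print('training the model')

init_data_directory = PythonOperator(
    task_id='init_data_directory', python_callable=init_dirs)
data_collection_source1 = PythonOperator(
    task_id='data_collection_source1',
    python_callable=collect, op_kwargs={'source': 'source1'})
data_collection_source2 = PythonOperator(
    task_id='data_collection_source2',
    python_callable=collect, op_kwargs={'source': 'source2'})
data_combiner = PythonOperator(
    task_id='data_combiner', python_callable=combine)
model_training = PythonOperator(
    task_id='model_training', python_callable=train)
With the tasks defined, you can exercise a single run of the pipeline without the scheduler, for example with airflow dags test dummy_ml_pipeline 2023-01-01 on the command line.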