Building a data pipeline
Let’s start developing a simple DAG. All your Python code should be inside the `dags` folder. For our first hands-on exercise, we will work with the Titanic dataset:
- Open a file in the `dags` folder and save it as `titanic_dag.py`. We will begin by importing the necessary libraries:

```python
from airflow.decorators import task, dag
from airflow.operators.dummy import DummyOperator
from airflow.operators.bash import BashOperator
from datetime import datetime
```
- Then, we will define some default arguments for our DAG – in this case, the owner (important for DAG filtering) and the start date:

```python
default_args = {
    'owner': 'Ney',
    'start_date': datetime(2022, 4, 2)
}
```
- Now, we will define a function for our DAG using the `@dag` decorator. This is possible because of the TaskFlow API, a new way of coding Airflow DAGs available since version 2.0, which makes DAG development easier and faster.
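The excerpt cuts off before the decorated function itself, so here is a minimal sketch of how the pieces connect, reusing the imports and `default_args` from the previous steps. The `titanic()` function name, the `@daily` schedule, and the placeholder tasks are illustrative assumptions, not the book's exact code:

```python
# A minimal sketch of a TaskFlow-style DAG; the task names, schedule,
# and bash command below are illustrative placeholders.
@dag(default_args=default_args, schedule_interval='@daily', catchup=False)
def titanic():
    # DummyOperator marks a no-op starting point for the pipeline
    start = DummyOperator(task_id='start')

    # BashOperator runs a shell command as a task
    say_hello = BashOperator(
        task_id='say_hello',
        bash_command='echo "Starting the Titanic pipeline"'
    )

    # @task turns a plain Python function into an Airflow task
    @task
    def print_run_date():
        print(f"Run executed at {datetime.now()}")

    # Define the execution order: start -> say_hello -> print_run_date
    start >> say_hello >> print_run_date()

# Calling the decorated function registers the DAG with Airflow
titanic_dag = titanic()
```

Once the scheduler parses the file, the DAG appears in the Airflow UI, where it can be filtered by the owner set in `default_args`.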