Scheduling daily ingestions
Data changes constantly in our dynamic world, with new information arriving every day and even every second. It is therefore crucial to update our data lake regularly so it reflects the latest information.
Manually triggering multiple projects or pipelines while integrating new data from various sources can be daunting. Schedulers alleviate this problem, and Airflow provides a straightforward solution for this purpose.
In this recipe, we will create a simple Directed Acyclic Graph (DAG) in Airflow and explore how to use its parameters to schedule a pipeline to run daily.
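Before diving into the recipe, here is a minimal sketch of what such a daily-scheduled DAG looks like. The task name and the `ingest_data` callable are illustrative assumptions, not the recipe's actual code; the key parameters are `schedule_interval="@daily"` and `start_date`, which together tell Airflow to trigger one run per day:

```python
# A minimal daily DAG sketch; task and callable names are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_data():
    # Placeholder for the actual ingestion logic
    print("Ingesting the latest data into the data lake")


default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_ingestion_dag",
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # run once every day at midnight
    catchup=False,               # do not backfill runs for past dates
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_data",
        python_callable=ingest_data,
    )
```

Note that this file is not meant to be run directly; Airflow's scheduler picks it up from the DAGs folder and triggers it according to the schedule.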
Getting ready
Please refer to the Technical requirements section, since this recipe uses the same technology described there.
In this exercise, we will create a simple DAG. The structure of your Airflow folder should look like the following:
Figure 11.3 – daily_ingestion_dag...