Scheduling historical data ingestion
Historical data is data accumulated over a period of time, and it is vital for data-driven decision-making, providing valuable insights that support business processes. For example, a sales company can use historical data from previous marketing campaigns to see how they influenced the sales of a specific product over the years.
This exercise will show how to create a scheduler in Airflow to ingest historical data, following best practices and addressing common concerns related to this process.
Getting ready
Please refer to the Technical requirements section for this recipe, since we will use the same technologies mentioned there.
In this exercise, we will create a simple DAG inside our DAGs folder. The structure of your Airflow folder should look like the following:
Figure 11.6 – historical_data_dag folder structure in your local Airflow directory
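Before writing the DAG itself, it helps to see how Airflow schedules historical runs: when a DAG has a start date in the past and catchup is enabled, Airflow creates one run per elapsed schedule interval. The following plain-Python sketch illustrates that interval generation; the `backfill_intervals` function and its dates are illustrative, not part of Airflow's API:

```python
from datetime import datetime, timedelta

def backfill_intervals(start, end, delta):
    """Yield (interval_start, interval_end) pairs, mimicking the runs
    Airflow would create with catchup=True and a fixed schedule interval."""
    current = start
    while current + delta <= end:
        yield (current, current + delta)
        current += delta

# Four daily intervals between Jan 1 and Jan 5, one per backfilled run
runs = list(backfill_intervals(datetime(2023, 1, 1),
                               datetime(2023, 1, 5),
                               timedelta(days=1)))
```

Each pair corresponds to the data window a single historical run should ingest, which is why idempotent tasks (safe to re-run for the same window) are a common concern when backfilling.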