Developing the data-centric workflow
In Chapter 8, Automating the Machine Learning Process Using Apache Airflow, we created the environment components that are required to execute the data-centric ML workflow. Now, we can start developing it. The following diagram shows what this workflow development process looks like:
As you can see, the data engineering team must develop two primary artifacts that make up the overall process:
- The unit-tested data ETL artifacts
- The unit-tested Airflow DAG
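To make the first artifact concrete, the sketch below shows what a small, unit-tested ETL merge step might look like. This is a hypothetical, pure-Python stand-in: the function name `merge_records` and the column names are illustrative, and real ETL artifacts would typically operate on pandas DataFrames or Spark tables rather than lists of dictionaries.

```python
# Hypothetical stand-in for an ETL artifact that merges two data sources on a
# shared key. All names here are illustrative, not from the actual codebase.

def merge_records(left, right, key):
    """Inner-join two lists of dicts on `key`, combining their columns."""
    index = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            combined = dict(row)   # copy so the input rows are not mutated
            combined.update(match)
            merged.append(combined)
    return merged

def test_merge_records():
    # A unit test like this is what makes the artifact "unit-tested":
    # it pins down the join semantics before the step enters the DAG.
    left = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
    right = [{"id": 1, "label": "churn"}]
    result = merge_records(left, right, key="id")
    assert result == [{"id": 1, "age": 34, "label": "churn"}]

test_merge_records()
```

Tests of this kind let the data engineering team validate each transformation in isolation, before the step is wired into the Airflow DAG.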
Once the data engineering team has created and tested the ETL artifacts responsible for merging and preparing the training data, they can combine them with the ML model artifacts to create the Airflow DAG, which represents the data-centric workflow. After unit testing this Airflow DAG to ensure that the data transformation code and the ML model code integrate successfully, the...
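To illustrate how the ETL steps and model steps combine into one ordered workflow, the following sketch models the task dependencies with plain Python callables instead of real Airflow operators, so it runs anywhere. In an actual Airflow DAG file, each function would be wrapped in a `PythonOperator` and the ordering declared with the `>>` operator; the task names here are hypothetical.

```python
# Hypothetical stand-in for the data-centric workflow's task ordering.
# In Airflow this would be: extract >> transform >> train >> evaluate,
# with each step wrapped in a PythonOperator inside a DAG definition.

run_log = []  # records execution order so the ordering can be unit tested

def extract_data():     run_log.append("extract")
def transform_data():   run_log.append("transform")
def train_model():      run_log.append("train")
def evaluate_model():   run_log.append("evaluate")

# Map each task to its upstream dependencies.
dag = {
    extract_data: [],
    transform_data: [extract_data],
    train_model: [transform_data],
    evaluate_model: [train_model],
}

def run(dag):
    """Execute tasks in dependency order (a simple topological walk)."""
    done = set()
    while len(done) < len(dag):
        for task, upstream in dag.items():
            if task not in done and all(u in done for u in upstream):
                task()
                done.add(task)

run(dag)
assert run_log == ["extract", "transform", "train", "evaluate"]
```

A unit test over the recorded order is the plain-Python analogue of unit testing the DAG itself: it verifies that the data transformation steps always run before the model steps, which is exactly the integration property the text describes.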