Executing the data-centric workflow
In the previous section, we generated new Abalone survey data. With this dataset now stored on S3, this section walks you through executing and releasing the data-centric workflow to create a production-grade ML model optimized on both the new and the original datasets.
As with the example in Chapter 7, Building the ML Workflow Using AWS Step Functions, we can treat this execution, and any subsequent scheduled execution of the workflow, as a release change. The following diagram shows an overview of the workflow execution that we defined within the Airflow DAG:
As you can see, once new data arrives and the schedule triggers, the Airflow DAG executes the CI phase: updating the Abalone dataset, training a new ML model, and evaluating the trained model's performance.
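To make this structure concrete, the following is a minimal sketch of how such a DAG might be wired up in Airflow. The DAG ID, schedule, and task callables here are illustrative placeholders, not the chapter's actual pipeline code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Hypothetical task callables; in practice these would wrap the
# chapter's actual dataset-update, training, and evaluation logic.
def update_dataset():
    """Merge the new Abalone survey data on S3 with the original dataset."""
    ...


def train_model():
    """Launch a training job on the combined dataset."""
    ...


def evaluate_model():
    """Assess the trained model's performance against the quality threshold."""
    ...


with DAG(
    dag_id="abalone_data_centric_workflow",  # assumed name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # each scheduled run is a release change
    catchup=False,
) as dag:
    update = PythonOperator(task_id="update_dataset", python_callable=update_dataset)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    # CI phase: update the data, then train, then evaluate
    update >> train >> evaluate
```

The `>>` operator chains the tasks so that each step of the CI phase runs only after its predecessor succeeds, mirroring the ordering shown in the diagram.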
Once the model has been automatically approved...