Summary
In this chapter, we expanded upon the data-centric approach that we introduced in the previous chapter to automate the ML workflow using Apache Airflow. To do this, we learned how to build the artifact that merges the existing dataset with new data to optimize the Age Calculator model. We also learned how to use the CTGAN data generator to synthesize this new survey data. Once the new survey data was uploaded to S3, we learned how to build and then execute the Airflow DAG that runs the data-centric workflow.
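As a reminder of the core idea behind the merge step, the following is a minimal sketch, not the artifact we built in this chapter. It assumes both datasets are CSV text with a single header row; the `merge_csv` function name and the column names in the usage note are hypothetical and chosen only for illustration.

```python
import csv
import io


def merge_csv(existing_text, new_text):
    """Append the data rows of new_text to existing_text, keeping one header.

    Hypothetical helper: assumes both inputs are CSV text whose first row
    is the header, and that the headers must match exactly.
    """
    existing = list(csv.reader(io.StringIO(existing_text)))
    new = list(csv.reader(io.StringIO(new_text)))

    # Nothing to merge into, so the new data becomes the dataset.
    if not existing:
        return new_text

    # Refuse to merge data whose columns differ from the existing dataset.
    if new and new[0] != existing[0]:
        raise ValueError("column headers do not match")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(existing)   # header plus the existing rows
    writer.writerows(new[1:])    # new rows only, header skipped
    return out.getvalue()
```

For example, calling `merge_csv("length,rings\n0.455,15\n", "length,rings\n0.350,7\n")` returns a single dataset with one header row followed by both data rows. Rejecting mismatched headers up front is a deliberate choice in this sketch: it stops malformed new data from silently degrading the training set.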
With this hands-on example, we learned how platform teams, data engineering teams, and ML practitioners can work together to create a data-centric approach to ML automation. We also saw how AWS makes it easier to deploy, manage, and maintain an Apache Airflow environment by implementing an Amazon MWAA environment, which we then used to create a production-grade Age Calculator model.
In the...