The end goal of applying CI/CD to data science projects is a continuous learning pipeline that creates new model versions automatically. With this level of automation, your team can examine new experiment results right after pushing changed code. If the automated tests pass and the model quality reports show good results, the model can be deployed to an online testing environment.
Let's describe the steps of continuous model learning:
- CI:
  - Perform static code analysis.
  - Launch automated tests.
- Continuous model learning:
  - Fetch new data.
  - Generate EDA reports.
  - Launch data quality tests.
  - Perform data processing and create a training dataset.
  - Train a new model.
  - Test the model's quality.
  - Record experiment results in an experiment log.
- CD:
  - Package the new model version.
  - Package the source code.
  - Publish...