Model training
Model training can become one of the most operationally expensive parts of the MLOps life cycle, so you want to use as many cost optimizations as your cloud infrastructure configuration allows. For example, run training on spot instances to manage cloud costs, and scale down immediately after training finishes. If you use a tool such as Databricks, pay particular attention to its scale-down settings, since third-party platforms keep billing for provisioned resources even when you aren’t using them, and an idle cluster incurs costs for doing nothing.
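As a rough illustration of these settings, the sketch below builds a partial cluster specification that prefers spot capacity, autoscales between a minimum and maximum worker count, and terminates after a period of inactivity. The field names (`autoscale`, `autotermination_minutes`, `aws_attributes`) are taken from the Databricks Clusters API on AWS as we understand it and should be checked against the current documentation. The function name, cluster name, and default values are hypothetical, and required fields such as the runtime version and node type are deliberately left out.

```python
import json


def build_training_cluster_spec(max_workers: int, idle_minutes: int = 15) -> dict:
    """Build a partial, illustrative spec for a short-lived training cluster.

    This is a sketch, not a complete cluster definition: runtime version,
    node type, and other required fields are omitted on purpose.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    if idle_minutes <= 0:
        # A non-positive value would leave the cluster without an idle timeout.
        raise ValueError("idle_minutes must be positive so idle clusters shut down")

    return {
        "cluster_name": "model-training",  # hypothetical name
        # Scale between 1 and max_workers depending on load.
        "autoscale": {"min_workers": 1, "max_workers": max_workers},
        # Terminate the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
        "aws_attributes": {
            # Prefer spot instances, fall back to on-demand if spot is unavailable.
            "availability": "SPOT_WITH_FALLBACK",
            # Keep the first node (the driver) on on-demand capacity.
            "first_on_demand": 1,
        },
    }


spec = build_training_cluster_spec(max_workers=8)
print(json.dumps(spec, indent=2))
```

Keeping the driver on on-demand capacity is a common trade-off: losing spot workers slows a training job down, while losing the driver typically ends it. Whether that trade-off is worth it depends on how well your training job checkpoints and resumes.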
We have learned many lessons over the years regarding classic ML model training. Some of these best practices are the following:
- Your ML models need to be clearly defined
- Avoid confusing ML learning and skill-building efforts with real model experimentation
- Understand your data and its metadata
- Build your data quality metrics
- Beware of SaaS-hosted ML solutions/tools
- Define your model...