Chapter 5: Effectively Managing Machine Learning Experiments
In the previous chapter, we worked through several recipes that focused on preparing and processing data before passing it as input to training jobs. In this chapter, we will focus on the solutions and capabilities that help us manage machine learning (ML) experiments in Amazon SageMaker.
After running a number of ML experiments, we will realize that not all of them succeed and that building high-quality ML models takes some trial and error. This is similar to software development, where bugs need to be detected as early as possible so that they are not accidentally deployed to a production environment. Debugging ML experiments is generally much harder than debugging software code, since we need a specialized tool that inspects and monitors changes in the values of parameters, metrics, and other variables in the experiment and performs...