What this book covers
Chapter 1, Machine Learning and Its Life Cycle in the Cloud, describes how cloud technology has democratized the field of ML and how ML is being deployed in the cloud. It introduces the fundamentals of the AWS services that are used in the book.
Chapter 2, Introducing Amazon SageMaker Studio, covers an overview of Amazon SageMaker Studio, including its features and functionalities and user interface components. You will set up a SageMaker Studio domain and get familiar with basic operations.
Chapter 3, Data Preparation with SageMaker Data Wrangler, looks at how, with SageMaker Data Wrangler, you can perform exploratory data analysis and data preprocessing for ML modeling with a point-and-click experience (that is, without any coding). You will be able to quickly iterate through data transformation and modeling to see whether your transform recipe helps increase model performance, learn whether there is implicit bias in the data against sensitive groups, and have a clear record of what transformation has been done for the processed data.
Chapter 4, Building a Feature Repository with SageMaker Feature Store, looks at SageMaker Feature Store, which allows storing features for ML training and inferencing. Feature Store serves as a central repository for teams collaborating on ML use cases to avoid duplicating and confusing efforts in creating features. SageMaker Feature Store makes storing and accessing training and inferencing data easier and faster.
Chapter 5, Building and Training ML Models with SageMaker Studio IDE, looks at how building and training an ML model can be made easy. No more frustration in provisioning and managing compute infrastructure. SageMaker Studio is an IDE designed for ML developers. In this chapter, you will learn how to use the SageMaker Studio IDE, notebooks, and SageMaker-managed training infrastructure.
Chapter 6, Detecting ML Bias and Explaining Models with SageMaker Clarify, covers the ability to detect and remediate bias in data and models during the ML life cycle, which is critical in creating an ML model with social fairness. You will learn how to apply SageMaker Clarify to detect bias in your data and how to read the metrics in SageMaker Clarify.
Chapter 7, Hosting ML Models in the Cloud: Best Practices, looks at how, after successfully training a model, if you want to make the model available for inference, SageMaker has several options depending on your use case. You will learn how to host models for batch inference, do online real-time inference, and use multimodel endpoints for cost savings, as well as a resource optimization strategy for your inference needs.
Chapter 8, Jumpstarting ML with SageMaker JumpStart and Autopilot, looks at SageMaker JumpStart, which offers complete solutions for select use cases as a starter kit to the world of ML with Amazon SageMaker without any code development. SageMaker JumpStart also catalogs popular pretrained computer vision (CV) and natural language processing (NLP) models for you to easily deploy or fine-tune to your dataset. SageMaker Autopilot is an AutoML solution that explores your data, engineers features on your behalf, and trains an optimal model from various algorithms and hyperparameters. You don't have to write any code as Autopilot does it for you and returns notebooks to show how it does it.
Chapter 9, Training ML Models at Scale in SageMaker Studio, discusses how a typical ML life cycle starts with prototyping and then transitions to production scale, where the data is going to be much larger, models are much more complicated, and the number of experiments grows exponentially. SageMaker Studio makes this transition easier than before. You will learn how to run distributed training, how to monitor the compute resources and modeling status of a training job, and how to manage training experiments with SageMaker Studio.
Chapter 10, Monitoring ML Models in Production with SageMaker Model Monitor, looks at how data scientists used to spend too much time and effort maintaining and manually managing ML pipelines, a process that starts with data processing, training, and evaluation and ends with model hosting with ongoing maintenance. SageMaker Studio provides features that aim to streamline this operation with continuous integration and continuous delivery (CI/CD) best practices. You will learn how to implement SageMaker Projects, Pipelines, and the model registry, which will help operationalize the ML life cycle with CI/CD.
Chapter 11, Operationalize ML Projects with SageMaker Projects, Pipelines, and Model Registry, discusses how having a model put into production for inferencing isn't the end of the life cycle. It is just the beginning of an important topic: how do we make sure the model is performing as it is designed and as expected in real life? Monitoring how the model performs in production, especially on data that the model has never seen before, is made easy with SageMaker Studio. You will learn how to set up model monitoring for models deployed in SageMaker, detect data drift and performance drift, and visualize feature importance and bias in the inferred data in real time.