Getting Started with Amazon SageMaker Studio: Learn to build end-to-end machine learning projects in the SageMaker machine learning IDE

Michael Hsieh
Paperback Mar 2022 326 pages 1st Edition

Chapter 1: Machine Learning and Its Life Cycle in the Cloud

Machine Learning (ML) is a technique that has been around for decades, yet it is hard to believe how ubiquitous it has become in our daily lives. The road to the mainstream was a rocky one for the field of ML until the recent major leap in computer technology. Today's computer hardware is faster, smaller, and smarter. Internet speeds are faster and more convenient. Storage is cheaper and smaller. It is now rather easy to collect, store, and process massive amounts of data. We are able to create sizeable datasets that we were not able to before, train ML models using compute resources that were not available before, and make use of ML models in every corner of our lives.

For example, media streaming companies can now build ML recommendation engines at a global scale using their title collections and customer activity data on their websites to provide the most relevant content in real time in order to optimize the customer experience. The size of the data for both the titles and customer preferences and activity is on a scale that wasn't possible 20 years ago, considering how many of us are currently using a streaming service.

Training an ML model at this scale, using ML algorithms that are becoming increasingly more complex, requires a robust and scalable solution. After a model is trained, companies are able to serve the model at a global scale where millions of users visit the application from web and mobile devices at the same time.

Companies are also creating more and more models for each segment of customers or even one model for one customer. There is another dimension to this – companies are rolling out new models at a pace that would not have been possible to manage without a pipeline that trains, evaluates, tests, and deploys a new model automatically. Cloud computing has provided a perfect foundation for the streaming service provider to perform these ML activities to increase customer satisfaction.

If ML is something that interests you, or if you are already working in the field of ML in any capacity, this book is the right place for you. You will be learning all things ML, and how to build, train, host, and manage ML models in the cloud with actual use cases and datasets along with me throughout the book. I assume you come to this book with a good understanding of ML and cloud computing. The purpose of this first chapter is to set the level of the concepts and terminology of the two technologies, to define the ML life cycle that is going to be the core of this book, and to provide a crash course on Amazon Web Services and its core services, which will be mentioned throughout the book.

In this chapter, we will cover the following:

  • Understanding ML and its life cycle
  • Building ML in the cloud
  • Exploring AWS essentials for ML
  • Setting up an AWS environment

Technical requirements

For this chapter, you will need a computer with an internet connection and a browser to perform the basic AWS account setup in order to run Amazon SageMaker setup and code samples in the following chapters.

Understanding ML and its life cycle

At its core, ML is a process that uses computer algorithms to automatically discover the underlying patterns and trends in a dataset (which is a collection of observations with features, also known as variables), make a prediction, obtain the error measure against a ground truth (if provided), and "learn" from the error with an optimization process in order to make a better prediction next time. At the end of the process, an ML model is fitted, or trained, so that it can apply the knowledge it learned to make a decision based on the features of a new observation. The first part, generating a model, is called training, while the second part is called prediction or inference.
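
To make the predict, measure, and learn loop concrete, here is a minimal sketch in plain Python. It assumes a one-feature linear model fitted by gradient descent on the mean squared error; the data, learning rate, and epoch count are illustrative choices, not values from this book.

```python
# Toy illustration of the predict -> measure error -> learn loop.
# The data, learning rate, and epoch count are illustrative assumptions.

def train_linear_model(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1. Predict with the current parameters.
        preds = [w * x + b for x in xs]
        # 2. Measure the error against the ground truth.
        errors = [p - y for p, y in zip(preds, ys)]
        # 3. "Learn": move the parameters against the gradient of the error.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Inference: apply the trained parameters to a new observation."""
    w, b = model
    return w * x + b


# Training: observations (xs) with a ground truth (ys) generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = train_linear_model(xs, ys)

# Inference on an observation the model has not seen.
print(round(predict(model, 5.0), 2))
```

The call to train_linear_model is the training part, and predict is the inference part applied to a new observation.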

There are three basic types of ML algorithms based on the way the training process takes place – supervised learning, unsupervised learning, and reinforcement learning. A supervised learning algorithm is given a set of observations with a ground truth from the past. A ground truth is a key ingredient to train a supervised learning algorithm, as it drives how the model learns and makes future predictions – hence the "supervised" in the name, as the learning is supervised by the ground truth. Unsupervised learning, on the other hand, does not require a ground truth for the observations in order to learn how to make predictions. It finds patterns and relationships solely based on the features of the observations. However, a ground truth, if it exists, would still help us validate and understand the accuracy of the model in the case of unsupervised learning. Reinforcement learning, often abbreviated as RL, has quite a different learning paradigm compared to the previous two. RL consists of an agent interacting with an environment with a set of actions, and corresponding rewards and states. The learning is not guided by a ground truth, but rather by optimizing cumulative rewards with actions. The trained model in the end would be able to perform actions autonomously in an environment that would achieve the best rewards.
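
As a small illustration of learning without a ground truth, the following sketch groups one-dimensional values with k-means, a common unsupervised algorithm. The values and the starting centroids are made up for the example, and the function name kmeans_1d is our own.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# No labels are provided; the grouping comes from the features alone.
values = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(values, centroids=[1, 12])
print(centroids, clusters)
```

No ground truth is passed in at any point; the two groups emerge purely from how close the values are to each other.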

An ML life cycle

Now we have a basic understanding of what ML is, we can go broader to see what a typical ML life cycle looks like, as illustrated in the following figure:

Figure 1.1 – The ML life cycle

Problem framing

The first step in a successful ML life cycle is framing the business problem as an ML problem. Business problems come in all shapes and forms. For example, "How do we increase sales of a newly released product?" and "How do we improve the Quality Assessment (QA) throughput on the assembly line?" Business problems such as these, usually qualitative, are not something ML can be directly applied to. But looking at the business problem statement, we should think about how it can be translated into an ML problem. We should ask questions like the following:

  • "What are the key factors to the success of product sales?"
  • "Who are the people that are most likely to purchase the product?"
  • "What is the bottleneck in throughput in the assembly line?"
  • "How do we know whether an item is defective? What differentiates a defective one from a normal one?"

By asking questions like these, we start to dig into the realm of pattern recognition, a process of recognizing patterns from the data at hand. Having the right questions that can be formulated into pattern recognition, we are a step closer to framing an ML problem. Then, we also need to understand what the key metric is to gauge the success of an approach, regardless of whether we use ML or other approaches. It is quite straightforward to measure, for example, daily product sales. We can also improve sales by targeting advertisements to the people that are most likely to convert. Then, we get questions like the following:

  • "How do we measure the conversion?"
  • "What are the common characteristics of the consumers who have bought this product?"

More importantly, we need to find out whether there is even a target metric for us to predict! If there are targets, we can frame the problem as an ML problem, such as predicting future sales (supervised learning and regression), predicting whether a customer is going to buy a certain product or not (supervised learning and classification), or identifying defective items (supervised learning and classification). Questions that do not have a clear target to predict would fall into an unsupervised learning task in order to apply the pattern discovered in the data to future data points. Use cases where the target is dynamic and of high uncertainty, such as autonomous driving, robotic control, and stock price prediction, are good candidates for RL.

Data exploration and engineering

Sourcing data is the first step of a successful ML modeling journey. Once we have clearly defined both our business problem and ML problem with a basic understanding of the scope of the problem – meaning, what are the metrics and what are the factors – we can start gathering the data needed for ML. Data scientists explore the data sources to find out relevant information that could support the modeling. Sometimes, the data being captured and collected within the organization is easily accessible. Sometimes, the data is available outside your organization and would require you to reach out and ask for data sharing permission.

Sometimes, datasets can be sourced from the public internet and institutions that focus on creating and sharing standardized datasets for ML purposes, which is especially true for computer vision and natural language understanding use cases. Furthermore, data can arrive through streaming from websites and applications. Connections to a database, data lake, data warehouse, and streaming source need to be set up. Data needs to be integrated into the ML platform for processing and engineering before an ML model can be trained.

Managing data irregularity and heterogeneity is the second step in the ML life cycle. Data needs to be processed to remove irregularities such as missing values, incorrect data entry, and outliers because many ML algorithms have statistical assumptions that these irregularities would violate and render the modeling ineffective (if not invalid). For example, the linear regression model assumes that the error, or residual, is normally distributed; therefore, it is important to check whether there are outliers that could contribute to such a violation. If so, we must perform the necessary preprocessing tasks to remedy it. Common preprocessing approaches include, but are not limited to, removal of invalid entries, removal of extreme data points (also known as outliers), and filling in missing values. Data also needs to be processed to remove heterogeneity across features and normalize them into the same scale, as some ML algorithms are sensitive to the scale of the features and would develop a bias towards features with a larger scale. Common approaches include min-max scaling and z-standardization (z-score).
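
The following sketch shows three of these preprocessing steps using only the Python standard library: filling in missing values, min-max scaling, and z-standardization. Filling with the median is one common choice among several, and the sample ages are invented for illustration.

```python
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the [0, 1] range (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on mean 0 and scale to unit population standard deviation.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical feature with one missing entry.
ages = [20, None, 40, 60, 30]
filled = fill_missing(ages)
scaled = min_max_scale(filled)
standardized = z_score(filled)
print(filled, scaled)
```

Min-max scaling bounds every feature to the same range, while z-standardization expresses each value as a number of standard deviations from the mean.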

Visualization and data analysis is the third step in the ML life cycle. Data visualization allows data scientists to easily understand visually how data is distributed and what the trends are in the data. Exploratory Data Analysis (EDA) allows data scientists to understand the statistical behavior of the data at hand, figure out the information that has predictive power to be included in the modeling process, and eliminate any redundancy in the data, such as duplicated entries, multicollinearity, and unimportant features.
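
Two of the redundancy checks mentioned above can be sketched in a few lines: dropping duplicated entries and flagging a pair of highly correlated features, a simple signal of multicollinearity. The rows below are invented; the second column is the first one converted from centimeters to inches, so the two carry the same information.

```python
import math


def drop_duplicates(rows):
    """Remove duplicated entries, keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows: (height_cm, height_in, weight_kg); one row is duplicated.
rows = [
    (160, 63.0, 55),
    (170, 66.9, 72),
    (170, 66.9, 72),
    (180, 70.9, 68),
    (190, 74.8, 90),
]
unique_rows = drop_duplicates(rows)
height_cm = [r[0] for r in unique_rows]
height_in = [r[1] for r in unique_rows]

# A correlation close to 1 suggests one of the two features is redundant.
redundancy = pearson(height_cm, height_in)
print(len(unique_rows), round(redundancy, 3))
```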

Feature engineering is the fourth step in the ML life cycle. Even with the various sources from which we are collecting data, ML models oftentimes benefit from engineered features that are calculated from existing features. For example, Body Mass Index (BMI) is a well-known engineered feature, calculated using the height and weight of a person, and is also an established feature (or risk factor, in clinical terms) that predicts certain diseases better than height or weight alone. Feature engineering often requires extensive experience in the domain and experimentation to find out what recipes are adding predictive power to the modeling.
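
The BMI example translates directly into code. BMI is weight in kilograms divided by the square of height in meters; the record layout and field names below are assumptions made for this sketch.

```python
def add_bmi(record):
    """Return a copy of the record with an engineered "bmi" feature.

    BMI = weight in kilograms / (height in meters) squared.
    """
    height_m = record["height_cm"] / 100.0
    bmi = record["weight_kg"] / height_m ** 2
    return {**record, "bmi": round(bmi, 1)}


# Hypothetical patient record with two raw features.
patient = {"height_cm": 180, "weight_kg": 81}
enriched = add_bmi(patient)
print(enriched)
```

The raw features are kept alongside the engineered one, so a model can still use height and weight individually if they turn out to carry signal.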

Modeling and evaluation

For a data scientist, ML modeling is the most exciting part of the life cycle (I think so; I hope you agree with me). You've formulated the problem in the language of ML. You've collected and processed the data, and looked at the underlying trends that give you enough hints to build an ML model. Now, it's time to build your first model for the dataset, but wait – what model, what algorithm, and what metric do we use to evaluate the performance? Well, that's the core of modeling and evaluation.

The goal is to explore and find a satisfactory ML model, with an objective metric, from all possible algorithms, feature sets, and hyperparameters. This is definitely not an easy task and requires extensive experience. Depending on the problem type (whether it's classification, regression, or reinforcement learning), data type (as in whether it's tabular, text, or image data), data distribution (is there a class imbalance or outliers?), and domain (medical, financial, or industrial), you can narrow down the choice of algorithms to a handful. With each of these algorithms, there are hyperparameters that control the behavior and performance of the algorithm on the provided data. What is also needed is a definition of an objective metric and a threshold that meets the business requirement, using the metric to guide you toward the best model. You may blindly choose one or two algorithm-hyperparameter combinations for your project, but you may not reach the optimal solution in just one or two trials. It is rather typical for a data scientist to try out hundreds if not thousands of combinations. How is that possible?

This is why establishing a streamlined model training and evaluation process is such a critical step in the process. Once model training and evaluation are automated, you can simply launch the process that helps you automatically iterate through the experiments among algorithms and hyperparameters, and compare the metric performance to find out the optimal solution. This process is called hyperparameter tuning or hyperparameter optimization. If multiple algorithms are the subject of tuning, it can also be called multi-algorithm hyperparameter tuning.
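
A minimal sketch of such an automated loop is an exhaustive grid search: try every hyperparameter combination, score each one on validation data with an objective metric, and keep the best. The toy k-nearest-neighbors regressor, the data, and the grid below are assumptions for illustration; managed tuning services typically use smarter search strategies than a full grid.

```python
import itertools


def knn_predict(train, x, k, weighting):
    """Predict y for x from the k nearest training points (1-D feature)."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    if weighting == "uniform":
        return sum(y for _, y in neighbors) / len(neighbors)
    # Distance weighting: closer neighbors count more (epsilon avoids 1/0).
    weights = [1.0 / (abs(nx - x) + 1e-9) for nx, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


def mean_squared_error(train, validation, k, weighting):
    """Objective metric: average squared error on the validation set."""
    errors = [(knn_predict(train, x, k, weighting) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)


def grid_search(train, validation, grid):
    """Evaluate every hyperparameter combination and return the best one."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean_squared_error(train, validation, **params)
        results.append((score, params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score, results


# Hypothetical data generated from y = 2x.
train = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12)]
validation = [(1.5, 3.0), (3.5, 7.0), (4.5, 9.0)]
grid = {"k": [1, 2, 3], "weighting": ["uniform", "distance"]}

best_params, best_score, results = grid_search(train, validation, grid)
print(best_params, best_score)
```

Adding a second algorithm to the loop, each with its own grid, would turn this into the multi-algorithm variant.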

Production – predicting, monitoring, and retraining

An ML model needs to be put in use in order to have an impact on the business. However, the production process is different from that of a typical software application. Unlike other software applications where business logic can be pre-written and tested exhaustively with edge cases before production, there is no guarantee that once the model is trained and evaluated, it will be performing at the same level in production as in the testing environment. This is because ML models use probabilistic, statistical, and fuzzy logic to infer an outcome for each incoming data point, and the testing, that is, the model evaluation, is typically done without true prior knowledge of production data. The best a data scientist can do prior to production is to create training data from a sample that closely represents real-world data, and evaluate the model with an out-of-sample strategy in order to get an unbiased idea of how the model would perform on unseen data. While in production, the incoming data is completely unseen by the model; how to evaluate live model performance, and how to take actions on that evaluation, are critical topics for productionizing ML models.
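
One simple out-of-sample strategy is a holdout split: shuffle the data once and set a fraction aside that the model never sees during training. The sketch below uses only the standard library; the function name holdout_split, the 20% test fraction, and the seed are illustrative assumptions.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # The first n_test rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

Because the held-out rows play no part in training, the metric computed on them approximates how the model behaves on unseen data, although it still cannot anticipate production data that differs from the sample.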

Model performance can be monitored with two approaches. The more straightforward one is to capture the ground truth for the unseen data and compare the prediction against the ground truth. The second approach is to use the drift in data as a proxy to determine whether the model is going to behave in an expected way. In some use cases, the first approach is not feasible, as the true outcome (the ground truth) may lag behind the event for a long time. For example, in a disease prediction use case, where the purpose of ML modeling is to help a healthcare provider find a likely outcome in the future, say in three months, from current health metrics, it is not possible to gather the ground truth in less than three months, and it may take even longer, depending on the onset of the disease. It is, therefore, impractical to wait for the ground truth before fixing a model that may prove ineffective.

The second approach rests on the premise that an ML model learns statistically and probabilistically from the training data and would behave differently when a new dataset with different statistical characteristics is provided. A model may return unreliable predictions when data does not come from the same statistical distribution. Therefore, detecting the drift in data gives a more real-time estimate of how the model is going to perform. Take the disease prediction use case once again as an example: when data about a group of patients in their 30s is sent to an ML model that is trained on data with an average age of 65 for prediction, it is likely that the model is going to be clueless about these new patients. So we need to take action.
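
A very simple drift check for the age example compares the mean of the incoming data with the training mean, measured in training standard deviations. This is only a sketch of the idea: the ages are invented, the threshold of 2.0 is an arbitrary assumption, and production monitoring usually compares whole distributions across many features.

```python
import statistics


def mean_shift_score(training_values, incoming_values):
    """How many training standard deviations the incoming mean has moved."""
    train_mean = statistics.fmean(training_values)
    train_std = statistics.pstdev(training_values)
    incoming_mean = statistics.fmean(incoming_values)
    return abs(incoming_mean - train_mean) / train_std


def has_drifted(training_values, incoming_values, threshold=2.0):
    """Flag drift when the shift exceeds the (assumed) threshold."""
    return mean_shift_score(training_values, incoming_values) > threshold


# Hypothetical ages: the model was trained on patients averaging 65 years.
training_ages = [58, 61, 63, 65, 65, 67, 69, 72]
patients_in_30s = [31, 34, 36, 38]
similar_patients = [62, 64, 66, 68]

print(has_drifted(training_ages, patients_in_30s))
print(has_drifted(training_ages, similar_patients))
```

A check like this needs no ground truth, which is why it can run as soon as new data arrives.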

Retraining and updating the model makes sure that it stays performant for future data. Being able to capture the ground truth and detect the data drift helps create a retraining strategy at the right time. The data that has drifted and the ground truth are great inputs into the retraining process, as they will help the model to cover a wider statistical distribution.

Now that we have a clear idea of the basics of the uses and life cycle of ML development, let's take the next step and investigate how it can work with the cloud.

Building ML in the cloud

Cloud computing is a technology that delivers on-demand IT resources that can grow and shrink at any time, depending on the need. There is no more buying and maintaining computer servers or data centers. It is much like utilities in your home, such as water, which is there when you turn on the faucet. If you turn it all the way, you get a high-pressure water stream. If you turn it down, you conserve water. If you don't need it anymore, you turn it off completely. With this model, developers and teams get the following benefits from on-demand cloud computing:

  • Agility: Quickly spin up resources as you need them. Develop and roll out new apps, experiment with new ideas, and fail quickly without risks.
  • Elasticity: Scale your resources as you need them. Cloud computing takes away "undifferentiated heavy lifting" – racking up additional servers and planning capacity for the future. These are things that don't help address your core business problems.
  • Global availability: With a click of a button, you can spin up resources that are closest to your customers/users without relocating your physical compute resources.

How does this impact the field of ML? As compute resources become easier to acquire, information exchange becomes much more frequent. As that happens, more data is generated and stored. And more data means more opportunities to train more accurate ML models. The agility, elasticity, and scale that cloud computing provides accelerate the development and application of ML models from weeks or months down to a much shorter cycle so that developers can now generate and improve ML models faster than ever. Developers are no longer constrained by physical compute resources available to them. With better ML models, businesses can make better decisions and provide better product experiences to customers.

For cloud computing, we will be using Amazon Web Services, which is the provider of Amazon SageMaker Studio, throughout the book.

Exploring AWS essentials for ML

Amazon Web Services (AWS) offers cloud computing resources to developers of all kinds to create applications and solutions for their businesses. AWS manages the technology and infrastructure in a secure environment and a scalable fashion, taking away the undifferentiated heavy lifting of infrastructure management from developers. AWS provides a broad range of services, including ML, artificial intelligence, the internet of things, analytics, and application development tools. These are built on top of the following key areas – compute, storage, databases, and security. Before we start our journey with Amazon SageMaker Studio, which is one of the ML offerings from AWS, it is important to know the core services that are commonly used while developing your ML projects on Amazon SageMaker Studio.

Compute

For ML in the cloud, developers need computational resources in all aspects of the life cycle. Amazon Elastic Compute Cloud (Amazon EC2) is the most fundamental cloud computing environment for developers to process, train, and host ML models. Amazon EC2 provides a wide range of compute instance types for many purposes, such as compute-optimized instances for compute-intensive work, memory-optimized instances for applications that have a large memory footprint, and Graphics Processing Unit (GPU)-accelerated instances for deep learning training.

Amazon SageMaker also offers on-demand compute resources for ML developers to run processing, training, and model hosting. Amazon SageMaker's ML instances build on top of Amazon EC2 instances and equip the instances with fully managed, optimized versions of popular ML frameworks such as TensorFlow, PyTorch, MXNet, and scikit-learn, which are optimized for Amazon EC2 compute instances. Developers do not need to manage the provisioning and patching of the ML instances, so they can focus on the ML life cycle.

Storage

While conducting an ML project, developers need to be able to access files, store code, and store artifacts. Reliable storage is crucial to an ML project. AWS provides several types of storage options for ML development. Amazon Simple Storage Service (Amazon S3) and Amazon Elastic File System (Amazon EFS) are the two that are most relevant to the development of ML projects in Amazon SageMaker Studio.

Amazon S3 is an object storage service that allows developers to store any amount of data with high security, availability, and scalability. ML developers can store structured and unstructured data, and ML models with versioning on Amazon S3. Amazon S3 can also be used to build a data lake for analytics and to store backups and archives.

Amazon EFS provides a fully managed, serverless filesystem that allows developers to store and share files across users on the filesystem without any storage provisioning, as the filesystem increases and decreases its capacity automatically when you add or delete files. It is often used in High-Performance Computing (HPC) settings and in applications where parallel or simultaneous data access across threads, processing tasks, compute instances, and users with high throughput is required. As Amazon SageMaker Studio embeds an Amazon EFS filesystem, each user on Amazon SageMaker Studio gets a home directory for storing and accessing data, code, and notebooks.

Database and analytics

Besides storage options, where data is saved as a file or an object, AWS users can store and access data at a data point level using database services such as Amazon Relational Database Service (Amazon RDS) and Amazon DynamoDB. AWS Analytics services such as AWS Glue and Amazon Athena provide capabilities in storing, querying, and data processing that are critical in the early phase of the ML life cycle.

For an ML project, relational databases are a common source of data for modeling. Amazon RDS is a cost-efficient and scalable relational database service in the cloud. It offers six database engines, including the open source PostgreSQL, MySQL, and MariaDB, and the Oracle and SQL Server commercial databases. Infrastructure provisioning and management are made easy with Amazon RDS.

Another popular type of database is NoSQL. Key-value databases, a common kind of NoSQL database, use key-value pairs as the data structure. Unlike relational databases, NoSQL databases do not impose stringent schema requirements on tables. Users can input data with a flexible schema for each row without needing to change the schema. Amazon DynamoDB is a key-value and document database that is fully managed, serverless, and highly scalable.
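
The flexible-schema idea can be pictured with a plain Python dictionary standing in for a key-value table. This is an in-memory illustration only, not the DynamoDB API; the helper names and the items are hypothetical.

```python
# In-memory stand-in for a key-value table; this is not the DynamoDB API.

def put_item(table, key, **attributes):
    """Store an item under its key; each item may carry different attributes."""
    table[key] = dict(attributes)


def get_item(table, key):
    """Look up an item by key; returns None if the key is absent."""
    return table.get(key)


table = {}
# Two items in the same table with different sets of attributes.
put_item(table, "user#1", name="Ana", plan="premium")
put_item(table, "user#2", name="Ben", favorite_genres=["drama", "sci-fi"], age=34)

print(get_item(table, "user#1"))
print(get_item(table, "user#2"))
```

Neither item had to match a predefined set of columns, which is the property the paragraph above describes.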

AWS Glue is a data integration service that has several features to help developers discover and transform data from sources for analytics and ML. The AWS Glue Data Catalog offers a persistent metadata store as a central repository for all your data sources, such as tables in Amazon S3, Amazon RDS, and Amazon DynamoDB. Developers can view all their tables and metadata such as the schema and time of update in one place – AWS Glue Data Catalog. AWS Glue's ETL service helps streamline the extract, transform, and load steps right after data is discovered and cataloged in the AWS Glue Data Catalog.

Amazon Athena is an analytics service that gives developers an interactive and serverless query experience. As a serverless service, developers do not need to think about the infrastructure underneath but instead focus on their data queries. You can easily point Amazon Athena to your data in Amazon S3 with a schema definition to start querying. Amazon Athena integrates natively with the AWS Glue Data Catalog to allow you to quickly and easily query against your data from all sources and services. Amazon Athena is also heavily integrated into several aspects of Amazon SageMaker Studio, which we will talk about in more detail throughout this book.

Security

Security is job zero when you develop your applications, access data, and train ML models on AWS. The access and identity control aspect of the security is governed by the AWS Identity and Access Management (IAM) service. Any control over services, cloud resources, authentication, and authorization can be granularly managed by AWS IAM.

Key concepts in IAM are the IAM user, group, role, and policy. Each person who logs onto AWS would sign in as an IAM user. Each IAM user has a list of IAM policies attached that governs the resources and actions in AWS that this IAM user can command and access. An IAM user can also inherit IAM policies from an IAM group, a collection of users who have similar responsibilities. An IAM role is similar to an IAM user in that it has a set of permissions to access resources and to perform actions. An IAM role differs from an IAM user in that a role can be assumed by users, applications, or services. For example, you can create and assign an AWS service role to an application in the cloud to control which services and resources this application can access. An IAM user who has permission to an application can securely execute the application without worrying that the application would reach out to unauthorized resources. More information can be found here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html.
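
IAM policies are JSON documents. The sketch below builds two of them as Python dictionaries: a permissions policy that allows reading and writing objects in one S3 bucket, and a trust policy that lets the SageMaker service assume a role. The bucket name is hypothetical, and the chosen actions are only an example of what a policy might grant.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-ml-bucket"

# Permissions policy: which actions are allowed on which resources.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

# Trust policy: who may assume the role (here, the SageMaker service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(permissions_policy, indent=2))
print(json.dumps(trust_policy, indent=2))
```

A role pairs a trust policy (who can assume it) with one or more permissions policies (what it can do once assumed), which is what distinguishes it from a policy attached directly to a user.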

Setting up an AWS environment

Let's set up an AWS account to start our cloud computing journey. If you already have an AWS account, you can skip this section and move on to the next chapter.

Please go to https://portal.aws.amazon.com/billing/signup and follow the instructions to sign up for an account. You will receive a phone call and will need to enter a verification code on the phone keypad as part of the process.

When you first create a new AWS account and log in with your email and password, you will be logged in as an account root user. However, it is best practice to create a new IAM user for yourself with the AdministratorAccess policy while logged in as the root user, and then swiftly log out and log in again as the IAM user that you just created. The root user credential shall only be used to perform limited account and service management tasks and shall not be used to develop your cloud applications. You should securely store the root user credential and lock it away from any other people accessing it.

Here are the steps to create an IAM user:

  1. Go to the IAM console, select Users on the left panel, and then click on the Add user button:
Figure 1.2 – Adding an IAM user in the IAM console

  2. Next, enter a name in User name and check the boxes for Programmatic access and AWS Management Console access. For the password fields, you can leave the default options. Hit the Next: Permissions button to proceed:
Figure 1.3 – Creating a user name and password for an IAM user

  3. On the next page, choose Add user to group under Set permissions. In a new account, you do not have any groups. You should click on Create group.
  4. In the pop-up dialog, enter Administrator in Group name, select AdministratorAccess in the policy list, and hit the Create group button:
Figure 1.4 – Creating an IAM group with AdministratorAccess

  5. The dialog will close. Make sure the new administrator is selected and hit Next: Tags. You can optionally add key-value pair tags to the IAM user. Hit Next: Review to review the configuration. Hit Create user when everything is correct.

You will see the following information. Please note down the sign-in URL for easy console access, Access key ID and Secret access key for programmatic access, and the one-time password. You can also download the credential as a CSV file by clicking the Download .csv button:

Figure 1.5 – A new IAM user is created

  6. After the IAM user creation, you can sign in to your AWS account with the sign-in URL and your IAM user. When you first sign in, you will need to provide the automatically generated password and then set up a new one. Now, you should note in the top-right corner that you are logged in as your newly created IAM user instead of the root user:

Figure 1.6 – Confirm your newly created credentials
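
For readers who prefer scripting, the console steps above could be approximated with the AWS SDK for Python (boto3). The sketch below is an assumption-laden outline rather than the procedure used in this book: the function name create_admin_user is our own, the iam argument is expected to behave like boto3.client("iam"), and the caller must already hold credentials that are allowed to manage IAM.

```python
# AWS managed policy that grants full administrative access.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam, user_name, temporary_password, group_name="Administrator"):
    """Create an admin group and a user in it, mirroring the console steps.

    `iam` is expected to behave like boto3.client("iam").
    """
    # Create the group and attach the AdministratorAccess policy to it.
    iam.create_group(GroupName=group_name)
    iam.attach_group_policy(GroupName=group_name, PolicyArn=ADMIN_POLICY_ARN)
    # Create the user and add it to the group.
    iam.create_user(UserName=user_name)
    iam.add_user_to_group(GroupName=group_name, UserName=user_name)
    # Console access, with a forced password change at first sign-in.
    iam.create_login_profile(
        UserName=user_name,
        Password=temporary_password,
        PasswordResetRequired=True,
    )
    # Programmatic access: the secret key is returned only once, so store it safely.
    response = iam.create_access_key(UserName=user_name)
    return response["AccessKey"]
```

With boto3 installed and credentials configured, the function would be called with boto3.client("iam") as its first argument, followed by a user name and a temporary password of your choosing.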

If you are new to AWS, don't worry about the cost of trying out AWS. AWS offers a free tier for more than 100 services based on the consumption of the service and/or within a 12-month period. The services we are going to use throughout the book, such as an S3 bucket and Amazon SageMaker, have a free tier for you to learn the skills without breaking the bank. The following table is a summary of the free tier for the services that are going to be covered in this book:

Figure 1.7 – Notable free trial offers from AWS

Let's finish off the chapter with a recap of what we've covered.

Summary

In this chapter, we've described the concept of ML, the steps in an ML life cycle, and how to approach a business problem with an ML mindset. We also talked about the basics of cloud computing, the role it plays in ML development, and the core services on Amazon Web Services. Lastly, we created an AWS account and set up a user for us to use throughout the fun ride in this book.

In the next chapter, we will learn about Amazon SageMaker Studio and its components from a high-level point of view. We will see how each component maps to the ML life cycle that we learned about in this chapter, and we will set up our Amazon SageMaker Studio environment together.


Key benefits

  • Understand the ML lifecycle in the cloud and its development on Amazon SageMaker Studio
  • Learn to apply SageMaker features in SageMaker Studio for ML use cases
  • Scale and operationalize the ML lifecycle effectively using SageMaker Studio

Description

Amazon SageMaker Studio is the first integrated development environment (IDE) for machine learning (ML) and is designed to integrate ML workflows: data preparation, feature engineering, statistical bias detection, automated machine learning (AutoML), training, hosting, ML explainability, monitoring, and MLOps in one environment. In this book, you'll start by exploring the features available in Amazon SageMaker Studio to analyze data, develop ML models, and productionize models to meet your goals. As you progress, you will learn how these features work together to address common challenges when building ML models in production. After that, you'll understand how to effectively scale and operationalize the ML life cycle using SageMaker Studio. By the end of this book, you'll have learned ML best practices regarding Amazon SageMaker Studio, as well as being able to improve productivity in the ML development life cycle and build and deploy models easily for your ML use cases.

Who is this book for?

This book is for data scientists and machine learning engineers who want to become well-versed in Amazon SageMaker Studio and gain hands-on machine learning experience with every step of the ML lifecycle, including building datasets as well as training and hosting models. Although basic knowledge of machine learning and data science is necessary, no previous knowledge of SageMaker Studio or cloud experience is required.

What you will learn

  • Explore the ML development life cycle in the cloud
  • Understand SageMaker Studio features and the user interface
  • Build a dataset with clicks and host a feature store for ML
  • Train ML models with ease and scale
  • Create ML models and solutions with little code
  • Host ML models in the cloud with optimal cloud resources
  • Ensure optimal model performance with model monitoring
  • Apply governance and operational excellence to ML projects

Product Details

Publication date: Mar 31, 2022
Length: 326 pages
Edition: 1st
Language: English
ISBN-13: 9781801070157


Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together

  • Getting Started with Amazon SageMaker Studio: $43.99
  • Machine Learning with Amazon SageMaker Cookbook: $54.99
  • Learn Amazon SageMaker: $48.99

Total: $147.97

Table of Contents

15 Chapters
Part 1 – Introduction to Machine Learning on Amazon SageMaker Studio
Chapter 1: Machine Learning and Its Life Cycle in the Cloud
Chapter 2: Introducing Amazon SageMaker Studio
Part 2 – End-to-End Machine Learning Life Cycle with SageMaker Studio
Chapter 3: Data Preparation with SageMaker Data Wrangler
Chapter 4: Building a Feature Repository with SageMaker Feature Store
Chapter 5: Building and Training ML Models with SageMaker Studio IDE
Chapter 6: Detecting ML Bias and Explaining Models with SageMaker Clarify
Chapter 7: Hosting ML Models in the Cloud: Best Practices
Chapter 8: Jumpstarting ML with SageMaker JumpStart and Autopilot
Part 3 – The Production and Operation of Machine Learning with SageMaker Studio
Chapter 9: Training ML Models at Scale in SageMaker Studio
Chapter 10: Monitoring ML Models in Production with SageMaker Model Monitor
Chapter 11: Operationalize ML Projects with SageMaker Projects, Pipelines, and Model Registry
Other Books You May Enjoy

Customer reviews

Rating distribution
Average rating: 4.8 out of 5 (12 Ratings)
5 star 83.3%
4 star 16.7%
3 star 0%
2 star 0%
1 star 0%

Top Reviews

C. C Chin Jul 07, 2024
Rating: 5 out of 5
Is it possible AWS Sagemaker studio to follow Chris and Prof Chip books, grad class book for AWS mls-c01 exam Kill two birds, also Dr Song book too Got book start reading!! Need for job interview and newbie AWS mls-c01 machine learning specalty exam and sagemaker Studio!! AWS MLS; guess need the best.. One reviewer OM S says book good for AWS MLS-C01 machine learning specalty exam!! We shall see. Also MLS sybex book too!! Neal Davis MLS exams!! 120 question, course and 6 projects udemy. OmS says for beginner like me open free acct and have $budget $25. Then do beginner chapter 1-11 !! Pass AWS mls-c01!! Passed DBS barely!! Perfect!! 👍 Using udemy course sagemaker and MLS-c01 Learn ML jargon, other book too!! Dr Logan Song ch 5,6 Ai chapters hands on lab with introduction material on Das, and MLS basics!! Fed government job interview!! Also need to pass MLS-c01 or c02 2026?? And DEA-c01 is out too!! Sybex book n practice exams mls-c01 Also Julian Simon Learn Amazon Sagemaker 2nd edition 2021 We shall see!!
Amazon Verified review
Roger May 10, 2022
Rating: 5 out of 5
A very comprehensive book, ranging from the basics like the ML life cycle to end-to-end workflow using the SageMaker Studio. For me, the chapter covering ML bias and model explainability with SageMaker Clarify was really helpful to detect data drift in my projects. Can be a great resource to both novices and experienced ML practitioners wishing to get started in the AWS ML services.
Amazon Verified review
Yiqiao Yin Jul 06, 2022
Rating: 5 out of 5
The first interesting point is the usage of AWS Sagemaker. It is one of the most popular AWS services and it’s been spreading like crazy after its inauguration. The past five years AWS has really had a lot of features inside Sagemaker when they introduced the studio capability all in one place. This is really the platform they provide you not just the robust service but also the most complete features amongst all cloud platforms. The author really walks through the content from the needs for the beginners and the first time readers what is book is about an author is able to really unpack the complicated documents in easy steps for us. I will recommend for the new data scientist who have basic understanding of cloud service to use this book to learn more in-depth knowledge about machine learning operations and cloud platforms. For advanced scientists this book is also recommended as a popular guidance for some of the most advanced features such as data preparation, feature engineer, automated machine learning, and MLOps. I would highly recommend this book to other people!
Amazon Verified review
Nick Minaie Apr 25, 2022
Rating: 5 out of 5
"5-star" is what comes to mind when reading this book! I truly enjoyed reading this book and learned a lot about developing end-2-end ML solutions with Amazon SageMaker. This is a fantastic and comprehensive reference book for every ML practitioner and Data Scientist for everything SageMaker, from A to Z, end-to-end. Even though I had experience with SageMaker, after reviewing the book I realized how much I didn't know about its features and capabilities. More specifically, I learned a lot about Experiments and also Pipelines that are critical for modern production level ML solutions. Feature Store is a new addition to SageMaker and the book does a great job in walking the reader through different type of FS and how to leverage them in ML solutions. Detecting bias and avoiding bias in ML solutions are highly important in the industry, and "Detecting ML Bias and Explaining Models with SageMaker Clarify" chapter provides a detailed overview of Clarify feature in Amazon SageMaker Studio, which I am bookmarking for future reference. JumpStart is great collection of samples and notebooks that are readily available and can be used by developers to jump start their ML projects. I enjoyed reading this section and learning how to leverage this for my projects. I highly recommend this book to all developers who want to learn about Amazon SageMaker ML platform, or to have a comprehensive reference guide for the platform that will be handy in every step of the project.
Amazon Verified review
Om S Apr 11, 2022
Rating: 5 out of 5
Sagemaker has been one of the AWS services which spread like wildfire since its inauguration. Over 5 years AWS has added many features in SageMaker since they introduce studio capability in one place which makes this service one of the best and outstanding among other cloud platforms. POC of a model to deployment in a product has never been easy. In between, there are so many challenges. This book brings all those ML connecting bits and pieces and allows a novice data scientist who has a basic understanding of AWS core services to deploy solutions in production without much difficulty. This book covers all accept of workflow from data preparation, Feature engineering AutoML, model training, monitoring, and MLOps all in one place at last but not the least making easy to productization of models for any complex business needs which makes this book attractive. All the best practices are given in consequent chapters. This is a very practical book in order to take maximum advantage open a free AWS account and start exploring Sagemaker Studio with the help of this book chapter after chapter at the end you will be happy. Not only that you will be very confident to pass the AWS ML specialty certification exam.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription gives you full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. There you will see the ‘cancel subscription’ button in the grey box that contains your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, which is a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage, subscription.packtpub.com, by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; there are always new versions, new frameworks, and new techniques. This feature gives you a head start on our content as it's being created. With Early Access, you'll receive each chapter as it's written and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.