Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Learn Amazon SageMaker
Learn Amazon SageMaker

Learn Amazon SageMaker: A guide to building, training, and deploying machine learning models for developers and data scientists

eBook
€8.99 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Learn Amazon SageMaker

Chapter 1: Introduction to Amazon SageMaker

Machine learning (ML) practitioners use a large collection of tools in the course of their projects: open source libraries, deep learning frameworks, and more. In addition, they often have to write their own tools for automation and orchestration. Managing these tools and their underlying infrastructure is time-consuming and error-prone.

This is the very problem that Amazon SageMaker was designed to address (https://aws.amazon.com/sagemaker/). Amazon SageMaker is a fully managed service that helps you quickly build and deploy ML models. Whether you're just beginning with ML or you're an experienced practitioner, you'll find SageMaker features to improve the agility of your workflows, as well as the performance of your models. You'll be able to focus 100% on the ML problem at hand, without spending any time installing, managing, and scaling ML tools and infrastructure.

In this first chapter, we're going to learn what the main capabilities of SageMaker are, how they help solve pain points faced by ML practitioners, and how to set up SageMaker:

  • Exploring the capabilities of Amazon SageMaker
  • Demonstrating the strengths of Amazon SageMaker
  • Setting up Amazon SageMaker on your local machine
  • Setting up an Amazon SageMaker notebook instance
  • Setting up Amazon SageMaker Studio

Technical requirements

You will need an AWS account to run the examples included in this chapter. If you haven't got one already, please point your browser to https://aws.amazon.com/getting-started/ to learn about AWS and its core concepts, and to create an AWS account. You should also familiarize yourself with the AWS Free Tier (https://aws.amazon.com/free/), which lets you use many AWS services for free within certain usage limits.

You will need to install and configure the AWS Command-line Interface (CLI) for your account (https://aws.amazon.com/cli/).

You will need a working Python 3.x environment. Be careful not to use Python 2.7, as it is no longer maintained. Installing the Anaconda distribution (https://www.anaconda.com/) is not mandatory but is strongly encouraged as it includes many projects that we will need (Jupyter, pandas, numpy, and more).

Code examples included in the book are available on GitHub at https://github.com/PacktPublishing/Learn-Amazon-SageMaker. You will need to install a Git client to access them (https://git-scm.com/).

Exploring the capabilities of Amazon SageMaker

Amazon SageMaker was launched at AWS re:Invent 2017. Since then, a lot of new features have been added: you can see the full (and ever-growing) list at https://aws.amazon.com/about-aws/whats-new/machine-learning.

In this section, you'll learn about the main capabilities of Amazon SageMaker and their purpose. Don't worry, we'll dive deep on each of them in later chapters. We will also talk about the SageMaker Application Programming Interfaces (APIs), and the Software Development Kits (SDKs) that implement them.

The main capabilities of Amazon SageMaker

At the core of Amazon SageMaker is the ability to build, train, optimize, and deploy models on fully managed infrastructure, and at any scale. This lets you focus on studying and solving the ML problem at hand, instead of spending time and resources on building and managing infrastructure. Simply put, you can go from building to training to deploying more quickly. Let's zoom in on each step and highlight relevant SageMaker capabilities.

Building

Amazon SageMaker provides you with two development environments:

  • Notebook instances: Fully managed Amazon EC2 instances that come preinstalled with the most popular tools and libraries: Jupyter, Anaconda, and so on.
  • Amazon SageMaker Studio: A full-fledged integrated development environment for ML projects.

When it comes to experimenting with algorithms, you can choose from the following:

  • A collection of 17 built-in algorithms for ML and deep learning, already implemented and optimized to run efficiently on AWS. No ML code to write!
  • A collection of built-in open source frameworks (TensorFlow, PyTorch, Apache MXNet, scikit-learn, and more), where you simply bring your own code.
  • Your own code running in your own container: custom Python, R, C++, Java, and so on.
  • Algorithms and pretrained models from AWS Marketplace for ML (https://aws.amazon.com/marketplace/solutions/machine-learning).

In addition, Amazon SageMaker Autopilot uses AutoML to automatically build, train, and optimize models without the need to write a single line of ML code.

Amazon SageMaker also includes two major capabilities that help with building and preparing datasets:

  • Amazon SageMaker Ground Truth: Annotate datasets at any scale. Workflows for popular use cases are built in (image detection, entity extraction, and more), and you can implement your own. Annotation jobs can be distributed to workers that belong to private, third-party, or public workforces.
  • Amazon SageMaker Processing: Run data processing and model evaluation batch jobs, using either scikit-learn or Spark.

Training

As mentioned earlier, Amazon SageMaker takes care of provisioning and managing your training infrastructure. You'll never spend any time managing servers, and you'll be able to focus on ML. On top of this, SageMaker brings advanced capabilities such as the following:

  • Managed storage using either Amazon S3, Amazon EFS, or Amazon FSx for Lustre depending on your performance requirements.
  • Managed spot training, using Amazon EC2 Spot instances for training in order to reduce costs by up to 80%.
  • Distributed training automatically distributes large-scale training jobs on a cluster of managed instances
  • Pipe mode streams infinitely large datasets from Amazon S3 to the training instances, saving the need to copy data around.
  • Automatic model tuning runs hyperparameter optimization in order to deliver high-accuracy models more quickly.
  • Amazon SageMaker Experiments easily tracks, organizes, and compares all your SageMaker jobs.
  • Amazon SageMaker Debugger captures the internal model state during training, inspects it to observe how the model learns, and detects unwanted conditions that hurt accuracy.

Deploying

Just as with training, Amazon SageMaker takes care of all your deployment infrastructure, and brings a slew of additional features:

  • Real-time endpoints: This creates an HTTPS API that serves predictions from your model. As you would expect, autoscaling is available.
  • Batch transform: This uses a model to predict data in batch mode.
  • Infrastructure monitoring with Amazon CloudWatch: This helps you to view real-time metrics and keep track of infrastructure performance.
  • Amazon SageMaker Model Monitor: This captures data sent to an endpoint, and compares it with a baseline to identify and alert on data quality issues (missing features, data drift, and more).
  • Amazon SageMaker Neo: This compiles models for a specific hardware architecture, including embedded platforms, and deploys an optimized version using a lightweight runtime.
  • Amazon Elastic Inference: This adds fractional GPU acceleration to CPU-based instances in order to find the best cost/performance ratio for your prediction infrastructure.

The Amazon SageMaker API

Just like all other AWS services, Amazon SageMaker is driven by APIs that are implemented in the language SDKs supported by AWS (https://aws.amazon.com/tools/). In addition, a dedicated Python SDK, aka the 'SageMaker SDK,' is also available. Let's look at both, and discuss their respective benefits.

The AWS language SDKs

Language SDKs implement service-specific APIs for all AWS services: S3, EC2, and so on. Of course, they also include SageMaker APIs, which are documented at https://docs.aws.amazon.com/sagemaker/latest/dg/api-and-sdk-reference.html.

When it comes to data science and ML, Python is the most popular language, so let's take a look at the SageMaker APIs available in boto3, the AWS SDK for the Python language (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html). These APIs are quite low level and verbose: for example, create_training_job() has a lot of JSON parameters that don't look very obvious. You can see some of them in the next screenshot. You may think that this doesn't look very appealing for everyday ML experimentation… and I would totally agree!

Figure 1.1 A partial view of the create_training_job() API in boto3

Figure 1.1 A partial view of the create_training_job() API in boto3

Indeed, these service-level APIs are not meant to be used for experimentation in notebooks. Their purpose is automation, through either bespoke scripts or Infrastructure-as-Code tools such as AWS CloudFormation (https://aws.amazon.com/cloudformation) and Terraform (https://terraform.io). Your DevOps team will use them to manage production, where they do need full control over each possible parameter.

So, what should you use for experimentation? You should use the Amazon SageMaker SDK.

The Amazon SageMaker SDK

The Amazon SageMaker SDK (https://github.com/aws/sagemaker-python-sdk) is a Python SDK specific to Amazon SageMaker. You can find its documentation at https://sagemaker.readthedocs.io/en/stable/.

Note:

The code examples in this book are based on the first release of the SageMaker SDK v2, released in August 2020. For the sake of completeness, and to help you migrate your own notebooks, the companion GitHub repository includes examples for SDK v1 and v2.

Here, the abstraction level is much higher: the SDK contains objects for models, estimators, models, predictors, and so on. We're definitely back into ML territory.

For instance, this SDK makes it extremely easy and comfortable to fire up a training job (one line of code) and to deploy a model (one line of code). Infrastructure concerns are abstracted away, and we can focus on ML instead. Here's an example. Don't worry about the details for now:

# Configure the training job my_estimator = TensorFlow(    'my_script.py',    role=my_sageMaker_role,    instance_type='ml.p3.2xlarge',    instance_count=1,    framework_version='2.1.0')
# Train the model my_estimator.fit('s3://my_bucket/my_training_data/')
# Deploy the model to an HTTPS endpoint my_predictor = my_estimator.deploy(    initial_instance_count=1,     instance_type='ml.c5.2xlarge')

Now that we know a little more about Amazon SageMaker, let's see how it helps typical customers make their ML workflows more agile and more efficient.

Demonstrating the strengths of Amazon SageMaker

Alice and Bob are both passionate, hardworking people who try their best to build great ML solutions. Unfortunately, a lot of things stand in their way and slow them down.

In this section, let's look at the challenges that they face in their daily projects, and how Amazon SageMaker could help them be more productive.

Solving Alice's problems

Alice has a PhD and works in a large public research lab. She's a trained data scientist, with a strong background in math and statistics. She spends her time on large scientific projects involving bulky datasets. Alice generally doesn't know much about IT and infrastructure, and she honestly doesn't care at all for these topics. Her focus is on advancing her research, and publishing papers.

For her daily work, she can rely on her own powerful (but expensive) desktop workstation. She enjoys the fact that she can work on her own, but she can only experiment with a fraction of her dataset if she wants to keep training times reasonable.

She tries to maintain the software configuration of her machine herself, as IT doesn't know much about the esoteric tools she uses. When something goes wrong, she wastes precious hours fixing it, and that's frustrating.

When Alice wants to run large experiments, she has to use remote servers hosted in the computing centre: a farm of very powerful multi-GPU servers, connected to a petabyte of network-attached storage. Of course, she has to share these servers with other researchers. Every week, the team leads meet and try to prioritize projects and workloads: this is never easy, and decisions often need to be escalated to the lab director.

Let's see how SageMaker and cloud computing can help Alice.

Launching an inexpensive SageMaker notebook instance in minutes, Alice could start running some sample notebooks, and she would quickly become familiar with the service, as it's based on the same tools she already uses. Scaling up, she then could train her own model on a cluster of powerful GPU instances, created on demand with just a couple of lines of code. That's more computing power than she would have ever managed using in the computing centre, and she wouldn't have to set up anything!

Thanks to the automatic model tuning feature in SageMaker, Alice would also be able to significantly improve the accuracy of her models in just a few hours of parallel optimization. Again, doing this with her previous setup would have been impossible due to the lack of computing resources. Deploying models would be equally straightforward: adapting a couple of lines of code found in a sample notebook, Alice would use the batch transform feature to predict her test dataset, again spending no time at all worrying about tools or infrastructure.

Last but not least, keeping track of her expenses would be easy: the AWS console would tell her how much she's spent, which would be less than expected thanks to the on-demand nature of SageMaker infrastructure!

Solving Bob's problems

Bob is a DevOps engineer, and he's in charge of a large training cluster shared by a team of data scientists. They can start their distributed jobs in seconds, and it's just simpler for Bob to manage a single cluster. Auto Scaling is set up, but capacity planning is still needed to find the right amount of EC2 instances and to optimize the cost using the right mix of Reserved, Spot, and On-Demand instances. Bob has a weekly meeting with the team to make sure they'll have enough instances… and they also ping him on Slack when they need extra capacity on the fly. Bob tries to automatically reduce capacity at night and on weekends when the cluster is less busy, but he's quite sure they're spending too much anyway. Oh, well.

Once models have been trained and validated, Bob uses Continuous Integration and Continuous Deployment (CI/CD) to deploy them automatically to the production Docker cluster. Bob maintains bespoke containers for training and prediction: libraries, dependencies, and in-house tools. That takes a bit of time, but he enjoys doing it. He just hopes that no one will ask him to do PyTorch and Apache MXNet too.

Let's see how Bob could use SageMaker to improve his ML workflows.

As SageMaker is based on Docker containers, Bob could get rid of his bespoke containers and use their built-in counterparts. Migrating the training workloads to SageMaker would be pretty easy. This would help Bob get rid of his training cluster, and let every data scientist train completely on demand instead. With Managed Spot Training, Bob could certainly optimize training costs even more.

The data science team would quickly adopt advanced features like distributed training, Pipe mode, and automatic model tuning. This would save them a lot of time, and the team would no longer have to maintain the kludgy code they have written to implement similar features.

Of course, Alice and Bob are fictional characters. Yet, I keep meeting many customers who share some (and sometimes all) of their pain points. That may be your case too, so please let me get you started with Amazon SageMaker.

Setting up Amazon SageMaker on your local machine

A common misconception is that you can't use SageMaker outside of the AWS cloud. Obviously, it is a cloud-based service, and its most appealing capabilities require cloud infrastructure to run. However, many developers like to set up their development environment their own way, and SageMaker lets them do that: in this section, you will learn how to install the SageMaker SDK on your local machine or on a local server. In later chapters, you'll learn how to train and deploy models locally.

It's good practice to isolate Python environments in order to avoid dependency hell. Let's see how we can achieve this using two popular projects: virtualenv (https://virtualenv.pypa.io) and Anaconda (https://www.anaconda.com/).

Installing the SageMaker SDK with virtualenv

If you've never worked with virtualenv before, please read this tutorial before proceeding – https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/:

  1. First, let's create a new environment named sagemaker, and activate it:
    $ mkdir workdir $ cd workdir $ python3 -m venv sagemaker $ source sagemaker/bin/activate
  2. Now, let's install boto3, the SageMaker SDK, and the pandas library (https://pandas.pydata.org/), which is also required:
    $ pip install boto3 sagemaker pandas
  3. Now, let's quickly check that we can import these SDKs in Python:
    $ python3
    Python 3.7.4 (default, Aug 13 2019, 15:17:50)>>> import boto3 >>> import sagemaker >>> print(boto3.__version__)1.12.39 >>> print(sagemaker.__version__)1.55.3 >>> exit

The installation looks fine. Your own versions will certainly be newer and that's fine. Now, let's run a quick test with a local Jupyter server (https://jupyter.org/). If Jupyter isn't installed on your machine, you can find instructions at https://jupyter.org/install:

  1. First, let's create a Jupyter kernel based on our virtual environment:
    $ pip install ipykernel $ python3 -m ipykernel install --user --name=sagemaker
  2. Then, we can launch Jupyter:
    $ jupyter notebook
  3. Creating a new notebook, we can see that the sagemaker kernel is available, so let's select it in the New menu, as seen in the following screenshot:
    Figure 1.2 Creating a new notebook

    Figure 1.2 Creating a new notebook

  4. Finally, we can check that the SDKs are available, by importing them and printing their version, as shown in the following screenshot:
Figure 1.3 Checking the SDK version

Figure 1.3 Checking the SDK version

This completes the installation with virtualenv. Don't forget to terminate Jupyter, and to deactivate your virtualenv:

$ deactivate

Installing the SageMaker SDK with Anaconda

Anaconda includes a package manager named conda that lets you create and manage isolated environments. If you've never worked with conda before, you should do the following first:

  • Install Anaconda: https://docs.anaconda.com/anaconda/install/
  • Read this tutorial: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

Perform the following steps to install the Sagemaker SDK with conda:

  1. Let's create and activate a new conda environment named conda-sagemaker:
    $ conda create -y -n conda-sagemaker $ conda activate conda-sagemaker
  2. Then, we install pandas, boto3, and the SageMaker SDK. The latter has to be installed with pip, as it's not available as a conda package:
    $ conda install -y boto3 pandas $ pip install sagemaker
  3. Now, let's add Jupyter and its dependencies to the environment, and create a new kernel:
    $ conda install -y jupyter ipykernel $ python3 -m ipykernel install --user --name conda-sagemaker
  4. Then, we can launch Jupyter, and check that the conda-sagemaker kernel is present in the New menu, as is visible in the next screenshot:
    $ jupyter notebook
    Figure 1.4 Creating a new conda environment

    Figure 1.4 Creating a new conda environment

  5. As shown in the following screenshot, we can create a notebook using this kernel, and check that the SDKs are imported correctly:
Figure 1.5 Checking the SDK version

Figure 1.5 Checking the SDK version

This completes the installation with conda. Whether you'd rather use it over virtualenv is largely a matter of personal preference. You can definitely run all notebooks in this book and build your own projects with one or the other.

A word about AWS permissions

Amazon Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely (https://aws.amazon.com/iam). Of course, this applies to Amazon SageMaker as well, and you need to make sure that your AWS user has sufficient permissions to invoke the SageMaker API.

Note:

If you're not familiar with IAM at all, please read the following documentation: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

You can run a quick test by using the AWS CLI on one of the SageMaker APIs, for example, list-endpoints. I'm using the eu-west-1 region here, but feel free to use the region that is nearest to you:

$ aws sagemaker list-endpoints --region eu-west-1 {    "Endpoints": []}

If you get an error message complaining about insufficient permissions, you need to update the IAM role attached to your AWS user.

If you own the AWS account in question, you can easily do this yourself in the IAM console, by adding the AmazonSageMakerFullAccess managed policy to your role. Note that this policy is extremely permissive: this is fine for a development account, but certainly not for a production account.

If you work with an account where you don't have administrative rights (such as a company-provided account), please contact your IT administrator to add SageMaker permissions to your AWS account.

For more information on SageMaker permissions, please refer to the documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html.

Setting up an Amazon SageMaker notebook instance

Experimentation is a key part of the ML process. Developers and data scientists use a collection of open source tools and libraries for data exploration and processing, and of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the ML problem itself!

In order to solve this problem, Amazon SageMaker makes it easy to fire up a notebook instance in minutes. A notebook instance is a fully managed Amazon EC2 instance that comes preinstalled with the most popular tools and libraries: Jupyter, Anaconda (and its conda package manager), numpy, pandas, deep learning frameworks, and even NVIDIA GPU drivers.

Note:

If you're not familiar with S3 at all, please read the following documentation:https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

Let's create one such instance using the AWS Console (https://console.aws.amazon.com/sagemaker/):

  1. In the Notebook section of the left-hand vertical menu, click on Notebook instances, as shown in the next screenshot:
    Figure 1.6 Creating a notebook instance

    Figure 1.6 Creating a notebook instance

    Note:

    The AWS console is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available there.

  2. Then, click on Create notebook instance. In the Notebook instance settings box, we need to enter a name, and select an instance type: as you can see in the drop-down list, SageMaker lets us pick from a very wide range of instance types. As you would expect, pricing varies according to the instance size, so please make sure you familiarize yourself with instance features and costs (https://aws.amazon.com/sagemaker/pricing/).
  3. We'll stick to ml.t2.medium for now. As a matter of fact, it's an excellent default choice if your notebooks only invoke SageMaker APIs that create fully managed infrastructure for training and deployment – no need for anything larger. If your workflow requires local data processing and model training, then feel free to scale up as needed.

    We can ignore Elastic Inference for now, it will be covered in Chapter 13, Optimizing Prediction Cost and Performance. Thus, your setup screen should look like the following screenshot:

    Figure 1.7 Creating a notebook instance

    Figure 1.7 Creating a notebook instance

  4. As you can see in the following screenshot, we could optionally apply a lifecycle configuration, a script that runs either when a notebook instance is created or restarted, in order to install additional libraries, clone repositories, and so on. We could also add additional storage (the default is set to 5 GB):
    Figure 1.8 Creating a notebook instance

    Figure 1.8 Creating a notebook instance

  5. In the Permissions and encryption section, we need to create an Amazon IAM role for the notebook instance: it will allow it to access storage in Amazon S3, to create Amazon SageMaker infrastructure, and so on.

    Select Create a new role, which opens the following screen:

    Figure 1.9 Creating an IAM role

    Figure 1.9 Creating an IAM role

    The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.

    Optionally, we can disable root access to the notebook instance, which helps lock down its configuration. We can also enable storage encryption using Amazon Key Management Service (https://aws.amazon.com/kms). Both features are extremely important in high-security environments, but we won't enable them here.

    Once you've completed this step, your screen should look like this, although the name of the role will be different:

    Figure 1.10 Creating an IAM role

    Figure 1.10 Creating an IAM role

  6. As shown in the following screenshot, the optional Network section lets you pick the Amazon Virtual Private Cloud (VPC) where the instance will be launched. This is useful when you need tight control over network flows from and to the instance, for example, to deny it access to the internet. Let's not use this feature here:
    Figure 1.11 Setting the VPC

    Figure 1.11 Setting the VPC

  7. The optional Git repositories section lets you add one or more Git repositories that will be automatically cloned on the notebook instance when it's first created. You can select any public Git repository, or select one from a list of repositories that you previously defined in Amazon SageMaker: the latter can be done under Git repositories in the Notebook section of the left-hand vertical menu.

    Let's clone one of my repositories to illustrate, and enter its name as seen in the following screenshot. Feel free to use your own!

    Figure 1.12 Setting Git repositories

    Figure 1.12 Setting Git repositories

  8. Last but not least, the optional Tags section lets us tag notebook instances. It's always good practice to tag AWS resources, as this makes it much easier to manage them later on. Let's add a couple of tags.
  9. As shown in the following screenshot, let's click on Create notebook instance:
    Figure 1.13 Setting tags

    Figure 1.13 Setting tags

    Under the hood, SageMaker fires up a fully managed Amazon EC2 instance, using an Amazon Machine Image (AMI) preinstalled with Jupyter, Anaconda, deep learning libraries, and so on. Don't look for it in the EC2 console, you won't see it.

  10. Five to ten minutes later, the instance is in service, as shown in the following screenshot. Let's click on Open JupyterLab:
Figure 1.14 Opening a notebook instance

Figure 1.14 Opening a notebook instance

We'll jump straight into Jupyter Lab. As shown in the following screenshot, we see in the left-hand panel that the repository has been cloned. In the Launcher panel, we see the many conda environments that are readily available for TensorFlow, PyTorch, Apache MXNet, and more:

Figure 1.15 Notebook instance welcome screen

Figure 1.15 Notebook instance welcome screen

The rest is vanilla Jupyter, and you can get to work right away!

Coming back to the AWS console, we see that we can stop, start, and delete a notebook instance, as shown in the next screenshot:

Figure 1.16 Stopping a notebook instance

Figure 1.16 Stopping a notebook instance

Stopping a notebook instance is identical to stopping an Amazon EC2 instance: storage is persisted until the instance is started again.

When a notebook instance is stopped, you can then delete it: the storage will be destroyed, and you won't be charged for anything any longer.

If you're going to use this instance to run the examples in this book, I'd recommend stopping it and restarting it. This will save you the trouble of recreating it again and again, your work will be preserved, and the costs will really be minimal.

Setting up Amazon SageMaker Studio

Amazon SageMaker Studio goes one step further in integrating the ML tools you need from experimentation to production. At its core is an integrated development environment based on Jupyter that makes it instantly familiar.

In addition, SageMaker Studio is integrated with other SageMaker capabilities, such as SageMaker Experiments to track and compare all jobs, SageMaker Autopilot to automatically create ML models, and more. A lot of operations can be achieved in just a few clicks, without having to write any code.

SageMaker Studio also further simplifies infrastructure management. You won't have to create notebook instances: SageMaker Studio provides you with compute environments that are readily available to run your notebooks.

Note:

This section requires basic knowledge of Amazon VPC and Amazon IAM. If you're not familiar with them at all, please read the following documentation:a) https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html b) https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

Onboarding to Amazon SageMaker Studio

You can access SageMaker Studio using any of these three options:

  1. Use the quick start procedure: This is the easiest option for individual accounts, and we'll walk through it in the following paragraphs.
  2. Use AWS Single Sign-On (SSO): If your company has an SSO application set up, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-sso-users.html. Please contact your IT administrator for details.
  3. Use Amazon IAM: If your company doesn't use SSO, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-iam.html. Again, please contact your IT administrator for details.

Onboarding with the quick start procedure

Perform the following steps to access the SageMaker Studio with the quick start procedure:

  1. First, open the AWS Console in one of the regions where Amazon SageMaker Studio is available, for example, https://us-east-2.console.aws.amazon.com/sagemaker/.
  2. As shown in the following screenshot, the left-hand vertical panel has a link to SageMaker Studio:
    Figure 1.17 Opening SageMaker Studio

    Figure 1.17 Opening SageMaker Studio

  3. Clicking on this link opens the onboarding screen, and you can see its first section in the next screenshot:
    Figure 1.18 Running Quick start

    Figure 1.18 Running Quick start

  4. Let's select Quick start. Then, we enter the username we'd like to use to log into SageMaker Studio, and we create a new IAM role as shown in the preceding screenshot. This opens the following screen:
    Figure 1.19 Creating an IAM role

    Figure 1.19 Creating an IAM role

    The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.

  5. Once we've clicked on Create role, we're back to the previous screen, where we just have to click on Submit to launch the onboarding procedure. Depending on your account setup, you may get an extra screen asking you to select a VPC and a subnet. I'd recommend selecting any subnet in your default VPC.
  6. A few minutes later, SageMaker Studio is in service, as shown in the following screenshot. We could add extra users if we needed to, but for now, let's just click on Open Studio:
    Figure 1.20 Launching SageMaker Studio

    Figure 1.20 Launching SageMaker Studio

    Don't worry if this takes a few more minutes, as SageMaker Studio needs to complete the first-run setup of your environment. As shown in the following screenshot, SageMaker Studio opens, and we see the familiar JupyterLab layout:

    Note:

    SageMaker Studio is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available everywhere.

    Figure 1.21 SageMaker Studio welcome screen

    Figure 1.21 SageMaker Studio welcome screen

  7. We can immediately create our first notebook. In the Launcher tab, let's select Data Science, and click on NotebookPython 3.
  8. This opens a notebook, as is visible in the following screenshot. We first check that SDKs are readily available. If this is the first time you've launched the Data Science image, please wait for a couple of minutes for the environment to start:
    Figure 1.22 Checking the SDK version

    Figure 1.22 Checking the SDK version

  9. When we're done working with SageMaker Studio, all we have to do is close the browser tab. If we want to resume working, we just have to go back to the SageMaker console, and click on Open Studio.

Now that we've completed this exercise, let's review what we learned in this chapter.

Summary

In this chapter, you discovered the main capabilities of Amazon SageMaker, and how they help solve your ML pain points. By providing you with managed infrastructure and pre-installed tools, SageMaker lets you focus on the ML problem itself. Thus, you can go more quickly from experimenting with models to deploying them in production.

Then, you learned how to set up Amazon SageMaker on your local machine, on a notebook instance, and on Amazon SageMaker Studio. The latter is a managed ML IDE where many other SageMaker capabilities are just a few clicks away.

In the next chapter, we'll see how you can use Amazon SageMaker and other AWS services to prepare your datasets for training.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Build, train, and deploy machine learning models quickly using Amazon SageMaker
  • Analyze, detect, and receive alerts relating to various business problems using machine learning algorithms and techniques
  • Improve productivity by training and fine-tuning machine learning models in production

Description

Amazon SageMaker enables you to quickly build, train, and deploy machine learning (ML) models at scale, without managing any infrastructure. It helps you focus on the ML problem at hand and deploy high-quality models by removing the heavy lifting typically involved in each step of the ML process. This book is a comprehensive guide for data scientists and ML developers who want to learn the ins and outs of Amazon SageMaker. You’ll understand how to use various modules of SageMaker as a single toolset to solve the challenges faced in ML. As you progress, you’ll cover features such as AutoML, built-in algorithms and frameworks, and the option for writing your own code and algorithms to build ML models. Later, the book will show you how to integrate Amazon SageMaker with popular deep learning libraries such as TensorFlow and PyTorch to increase the capabilities of existing models. You’ll also learn to get the models to production faster with minimum effort and at a lower cost. Finally, you’ll explore how to use Amazon SageMaker Debugger to analyze, detect, and highlight problems to understand the current model state and improve model accuracy. By the end of this Amazon book, you’ll be able to use Amazon SageMaker on the full spectrum of ML workflows, from experimentation, training, and monitoring to scaling, deployment, and automation.

Who is this book for?

This book is for software engineers, machine learning developers, data scientists, and AWS users who are new to using Amazon SageMaker and want to build high-quality machine learning models without worrying about infrastructure. Knowledge of AWS basics is required to grasp the concepts covered in this book more effectively. Some understanding of machine learning concepts and the Python programming language will also be beneficial.

What you will learn

  • Create and automate end-to-end machine learning workflows on Amazon Web Services (AWS)
  • Become well-versed with data annotation and preparation techniques
  • Use AutoML features to build and train machine learning models with AutoPilot
  • Create models using built-in algorithms and frameworks and your own code
  • Train computer vision and NLP models using real-world examples
  • Cover training techniques for scaling, model optimization, model debugging, and cost optimization
  • Automate deployment tasks in a variety of configurations using SDK and several automation tools

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Aug 27, 2020
Length: 490 pages
Edition : 1st
Language : English
ISBN-13 : 9781800208919
Category :
Languages :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Aug 27, 2020
Length: 490 pages
Edition : 1st
Language : English
ISBN-13 : 9781800208919
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 107.97
Learn Amazon SageMaker
€36.99
Hands-On Mathematics for Deep Learning
€32.99
40 Algorithms Every Programmer Should Know
€37.99
Total 107.97 Stars icon
Banner background image

Table of Contents

18 Chapters
Section 1: Introduction to Amazon SageMaker Chevron down icon Chevron up icon
Chapter 1: Introduction to Amazon SageMaker Chevron down icon Chevron up icon
Chapter 2: Handling Data Preparation Techniques Chevron down icon Chevron up icon
Section 2: Building and Training Models Chevron down icon Chevron up icon
Chapter 3: AutoML with Amazon SageMaker Autopilot Chevron down icon Chevron up icon
Chapter 4: Training Machine Learning Models Chevron down icon Chevron up icon
Chapter 5: Training Computer Vision Models Chevron down icon Chevron up icon
Chapter 6: Training Natural Language Processing Models Chevron down icon Chevron up icon
Chapter 7: Extending Machine Learning Services Using Built-In Frameworks Chevron down icon Chevron up icon
Chapter 8: Using Your Algorithms and Code Chevron down icon Chevron up icon
Section 3: Diving Deeper on Training Chevron down icon Chevron up icon
Chapter 9: Scaling Your Training Jobs Chevron down icon Chevron up icon
Chapter 10: Advanced Training Techniques Chevron down icon Chevron up icon
Section 4: Managing Models in Production Chevron down icon Chevron up icon
Chapter 11: Deploying Machine Learning Models Chevron down icon Chevron up icon
Chapter 12: Automating Machine Learning Workflows Chevron down icon Chevron up icon
Chapter 13: Optimizing Prediction Cost and Performance Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(10 Ratings)
5 star 80%
4 star 0%
3 star 0%
2 star 10%
1 star 10%
Filter icon Filter
Top Reviews

Filter reviews by




Julio M. Faerman Dec 07, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book helped me a lot in understanding how Machine Learning works at AWS and passing the certification exam. It is written in a way that any technical professional can understand and build, with clear examples and language. Also check the author's blog and resources, great content.
Amazon Verified review Amazon
Polgár Oct 15, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Si usas Amazon SageMaker, debes saber quién es Julien SImon.Por lo tanto..... Debes de comprar el libro, tú sabes a lo que me refiero.Si no sabes quién es es el Autor y quieres usar SageMaker.¡Con más razón debes de comprar el libro!
Amazon Verified review Amazon
Gokulaa Dec 30, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Great book to start your ML journey on AWS
Amazon Verified review Amazon
Ricardo S. Nov 23, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I got this book a few months ago and have been working my way through the chapters. The steady progression of each chapter has really helped me understand what SageMaker is and also other aspects of machine learning too. The different options with regards to selection of algorithms or the processes need to prep/clean data as well as the focus on making sure you think about the cost of different approaches means that I feel much more confident about the capabilities of SageMaker and how to get the most of it. Thank you Julien!
Amazon Verified review Amazon
Danilo Poccia Sep 25, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
An broad, deep overview of the many ways Amazon SageMaker can you with your machine learning experiments and applications. Covering all the stages, from labeling and preprocessing your datasets, to training at scale, and deploying models in production. Rich of experiments in areas such as computer vision and NLP. A definite guide.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.