Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learn Amazon SageMaker

You're reading from   Learn Amazon SageMaker A guide to building, training, and deploying machine learning models for developers and data scientists

Arrow left icon
Product type Paperback
Published in Aug 2020
Publisher Packt
ISBN-13 9781800208919
Length 490 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Julien Simon Julien Simon
Author Profile Icon Julien Simon
Julien Simon
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Section 1: Introduction to Amazon SageMaker
2. Chapter 1: Introduction to Amazon SageMaker FREE CHAPTER 3. Chapter 2: Handling Data Preparation Techniques 4. Section 2: Building and Training Models
5. Chapter 3: AutoML with Amazon SageMaker Autopilot 6. Chapter 4: Training Machine Learning Models 7. Chapter 5: Training Computer Vision Models 8. Chapter 6: Training Natural Language Processing Models 9. Chapter 7: Extending Machine Learning Services Using Built-In Frameworks 10. Chapter 8: Using Your Algorithms and Code 11. Section 3: Diving Deeper on Training
12. Chapter 9: Scaling Your Training Jobs 13. Chapter 10: Advanced Training Techniques 14. Section 4: Managing Models in Production
15. Chapter 11: Deploying Machine Learning Models 16. Chapter 12: Automating Machine Learning Workflows 17. Chapter 13: Optimizing Prediction Cost and Performance 18. Other Books You May Enjoy

Setting up an Amazon SageMaker notebook instance

Experimentation is a key part of the ML process. Developers and data scientists use a collection of open source tools and libraries for data exploration and processing, and of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the ML problem itself!

In order to solve this problem, Amazon SageMaker makes it easy to fire up a notebook instance in minutes. A notebook instance is a fully managed Amazon EC2 instance that comes preinstalled with the most popular tools and libraries: Jupyter, Anaconda (and its conda package manager), numpy, pandas, deep learning frameworks, and even NVIDIA GPU drivers.

Note:

If you're not familiar with S3 at all, please read the following documentation:https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

Let's create one such instance using the AWS Console (https://console.aws.amazon.com/sagemaker/):

  1. In the Notebook section of the left-hand vertical menu, click on Notebook instances, as shown in the next screenshot:
    Figure 1.6 Creating a notebook instance

    Figure 1.6 Creating a notebook instance

    Note:

    The AWS console is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available there.

  2. Then, click on Create notebook instance. In the Notebook instance settings box, we need to enter a name, and select an instance type: as you can see in the drop-down list, SageMaker lets us pick from a very wide range of instance types. As you would expect, pricing varies according to the instance size, so please make sure you familiarize yourself with instance features and costs (https://aws.amazon.com/sagemaker/pricing/).
  3. We'll stick to ml.t2.medium for now. As a matter of fact, it's an excellent default choice if your notebooks only invoke SageMaker APIs that create fully managed infrastructure for training and deployment – no need for anything larger. If your workflow requires local data processing and model training, then feel free to scale up as needed.

    We can ignore Elastic Inference for now, it will be covered in Chapter 13, Optimizing Prediction Cost and Performance. Thus, your setup screen should look like the following screenshot:

    Figure 1.7 Creating a notebook instance

    Figure 1.7 Creating a notebook instance

  4. As you can see in the following screenshot, we could optionally apply a lifecycle configuration, a script that runs either when a notebook instance is created or restarted, in order to install additional libraries, clone repositories, and so on. We could also add additional storage (the default is set to 5 GB):
    Figure 1.8 Creating a notebook instance

    Figure 1.8 Creating a notebook instance

  5. In the Permissions and encryption section, we need to create an Amazon IAM role for the notebook instance: it will allow it to access storage in Amazon S3, to create Amazon SageMaker infrastructure, and so on.

    Select Create a new role, which opens the following screen:

    Figure 1.9 Creating an IAM role

    Figure 1.9 Creating an IAM role

    The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.

    Optionally, we can disable root access to the notebook instance, which helps lock down its configuration. We can also enable storage encryption using Amazon Key Management Service (https://aws.amazon.com/kms). Both features are extremely important in high-security environments, but we won't enable them here.

    Once you've completed this step, your screen should look like this, although the name of the role will be different:

    Figure 1.10 Creating an IAM role

    Figure 1.10 Creating an IAM role

  6. As shown in the following screenshot, the optional Network section lets you pick the Amazon Virtual Private Cloud (VPC) where the instance will be launched. This is useful when you need tight control over network flows from and to the instance, for example, to deny it access to the internet. Let's not use this feature here:
    Figure 1.11 Setting the VPC

    Figure 1.11 Setting the VPC

  7. The optional Git repositories section lets you add one or more Git repositories that will be automatically cloned on the notebook instance when it's first created. You can select any public Git repository, or select one from a list of repositories that you previously defined in Amazon SageMaker: the latter can be done under Git repositories in the Notebook section of the left-hand vertical menu.

    Let's clone one of my repositories to illustrate, and enter its name as seen in the following screenshot. Feel free to use your own!

    Figure 1.12 Setting Git repositories

    Figure 1.12 Setting Git repositories

  8. Last but not least, the optional Tags section lets us tag notebook instances. It's always good practice to tag AWS resources, as this makes it much easier to manage them later on. Let's add a couple of tags.
  9. As shown in the following screenshot, let's click on Create notebook instance:
    Figure 1.13 Setting tags

    Figure 1.13 Setting tags

    Under the hood, SageMaker fires up a fully managed Amazon EC2 instance, using an Amazon Machine Image (AMI) preinstalled with Jupyter, Anaconda, deep learning libraries, and so on. Don't look for it in the EC2 console, you won't see it.

  10. Five to ten minutes later, the instance is in service, as shown in the following screenshot. Let's click on Open JupyterLab:
Figure 1.14 Opening a notebook instance

Figure 1.14 Opening a notebook instance

We'll jump straight into Jupyter Lab. As shown in the following screenshot, we see in the left-hand panel that the repository has been cloned. In the Launcher panel, we see the many conda environments that are readily available for TensorFlow, PyTorch, Apache MXNet, and more:

Figure 1.15 Notebook instance welcome screen

Figure 1.15 Notebook instance welcome screen

The rest is vanilla Jupyter, and you can get to work right away!

Coming back to the AWS console, we see that we can stop, start, and delete a notebook instance, as shown in the next screenshot:

Figure 1.16 Stopping a notebook instance

Figure 1.16 Stopping a notebook instance

Stopping a notebook instance is identical to stopping an Amazon EC2 instance: storage is persisted until the instance is started again.

When a notebook instance is stopped, you can then delete it: the storage will be destroyed, and you won't be charged for anything any longer.

If you're going to use this instance to run the examples in this book, I'd recommend stopping it and restarting it. This will save you the trouble of recreating it again and again, your work will be preserved, and the costs will really be minimal.

You have been reading a chapter from
Learn Amazon SageMaker
Published in: Aug 2020
Publisher: Packt
ISBN-13: 9781800208919
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image