You're reading from Machine Learning with Amazon SageMaker Cookbook 80 proven recipes for data scientists and developers to perform machine learning experiments and deployments

Product type Paperback

Published in Oct 2021

Publisher Packt

ISBN-13 9781800567030

Length 762 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Machine Learning

Author (1):

Joshua Arvin Lat

Table of Contents (11) Chapters

Preface

1. Chapter 1: Getting Started with Machine Learning Using Amazon SageMaker

2. Chapter 2: Building and Using Your Own Algorithm Container Image

3. Chapter 3: Using Machine Learning and Deep Learning Frameworks with Amazon SageMaker

4. Chapter 4: Preparing, Processing, and Analyzing the Data

5. Chapter 5: Effectively Managing Machine Learning Experiments

6. Chapter 6: Automated Machine Learning in Amazon SageMaker

7. Chapter 7: Working with SageMaker Feature Store, SageMaker Clarify, and SageMaker Model Monitor

8. Chapter 8: Solving NLP, Image Classification, and Time-Series Forecasting Problems with Built-in Algorithms

9. Chapter 9: Managing Machine Learning Workflows and Deployments

10. Other Books You May Enjoy

Launching and preparing the Cloud9 environment

In this recipe, we will launch and configure an AWS Cloud9 environment backed by an EC2 instance running Ubuntu Server. This will serve as the experimentation and simulation environment for the other recipes in this chapter. After that, we will resize the volume attached to the instance so that we do not run out of disk space while working with Docker containers and building container images later. In the succeeding recipes, we will prepare the file and directory structure that our train and serve scripts expect to find when they run inside the custom container.

Important note

Why go through all this effort of preparing an experimentation environment? Once we have finished preparing the experimentation environment, we will be able to prepare, test, and update the custom scripts quickly, without having to use the fit() and deploy() functions from the SageMaker Python SDK during the initial stages of writing the script. With this approach, the feedback loop is much faster, and we will detect the issues in our script and container image before we even attempt using these with the SageMaker Python SDK during training and deployment.

Getting ready

If you're signed in as an AWS IAM user (through your account's custom sign-in URL), make sure that the user has permission to manage AWS Cloud9 and EC2 resources. In most cases, it is recommended to sign in as an IAM user instead of using the root account.

How to do it…

The steps in this recipe can be divided into three parts:

Launching a Cloud9 environment
Increasing the disk space of the environment
Making sure that the volume configuration changes get reflected by rebooting the instance associated with the Cloud9 environment

We'll begin by launching the Cloud9 environment with the help of the following steps:

Click Services on the navigation bar. A list of services will be shown in the menu. Under Developer Tools, look for Cloud9 and then click the link to navigate to the Cloud9 console:
Figure 2.2 – Looking for the AWS Cloud9 service under Developer Tools
In the preceding screenshot, we can see the services after clicking the Services link on the navigation bar.
In the Cloud9 console, navigate to Your environments using the sidebar and click Create environment:
Figure 2.3 – Create environment button
Here, we can see that the Create environment button is located near the top-right corner of the page.
Specify the environment's name (for example, Cookbook Experimentation Environment) and, optionally, a description for your environment. Click Next step afterward:
Figure 2.4 – Name environment form
Here, we have the Name environment form, where we can specify the name and description of our Cloud9 environment.
Select the Create a new EC2 instance for environment (direct access) option under Environment type, t3.small under Instance type, and Ubuntu Server 18.04 LTS under Platform:
Figure 2.5 – Environment settings
We can see the different configuration settings here. Feel free to choose a different instance type as needed.
Under Cost-saving setting, select After one hour. Leave the other settings as-is and click Next step:
Figure 2.6 – Other configuration settings
Here, we can see that we have selected a Cost-saving setting of After one hour. This means that after an hour of inactivity, the EC2 instance linked to the Cloud9 environment will be automatically turned off to save costs.
Review the configuration you selected in the previous steps and then click Create environment:
Figure 2.7 – Create environment button
After clicking the Create environment button, it may take a minute or so for the environment to be ready. Once the environment is ready, check the different sections of the IDE:
Figure 2.8 – AWS Cloud9 development environment
As you can see, we have the file tree on the left-hand side. At the bottom part of the screen, we have the Terminal, where we can run our Bash commands. The largest portion, at the center of the screen, is the Editor, where we can edit the files.
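The console choices made in the preceding steps can also be written down as a request for the Cloud9 API. The following is a minimal sketch, not the method used in this recipe: `build_environment_request` is a hypothetical helper, and the `imageId` alias for Ubuntu Server 18.04 LTS is an assumption that may no longer be accepted by the service. The commented lines assume that `boto3` is installed and that credentials are configured.

```python
def build_environment_request(name, description=""):
    """Return keyword arguments that mirror the console choices in this recipe."""
    request = {
        "name": name,
        "instanceType": "t3.small",        # Instance type selected in the form
        "imageId": "ubuntu-18.04-x86_64",  # Assumed alias for Ubuntu Server 18.04 LTS
        "automaticStopTimeMinutes": 60,    # Cost-saving setting: After one hour
    }
    if description:
        request["description"] = description
    return request


request = build_environment_request("Cookbook Experimentation Environment")
print(request)

# Sending the request (assumes boto3 is installed and credentials are configured):
# import boto3
# response = boto3.client("cloud9").create_environment_ec2(**request)
# print(response["environmentId"])
```

Keeping the request in a plain dictionary makes it easy to review the settings before any resources are created.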
Now, we need to increase the disk space.
Using the Terminal at the bottom section of the IDE, run the following command:
```
lsblk
```
With the lsblk command, we will get information about the available block devices, as shown in the following screenshot:
Figure 2.9 – Result of the lsblk command
Here, we can see the results of the lsblk command. At this point, the root volume only has 10G of disk space (minus what is already in the volume).
At the top left section of the screen, click AWS Cloud9. From the dropdown list, click Go To Your Dashboard:
Figure 2.10 – How to go back to the AWS Cloud9 dashboard
This will open a new tab showing the Cloud9 dashboard.
Navigate to the EC2 console using the search bar. Type ec2 in the search bar and click the EC2 service from the list of results:
Figure 2.11 – Using the search bar to navigate to the EC2 console
Here, we can see that the search bar quickly gives us a list of search results after we have typed in ec2.
In the EC2 console, click Instances (running) under Resources:
Figure 2.12 – Instances (running) link under Resources
We should see the link we need to click under the Resources pane, as shown in the preceding screenshot.
Select the EC2 instance corresponding to the Cloud9 environment we launched in the previous set of steps. It should contain aws-cloud9 and the name we specified while creating the environment. In the bottom pane showing the details, click the Storage tab to show Root device details and Block devices.
Inside the Storage tab, scroll down to the bottom of the page to locate the volumes under Block devices:
Figure 2.13 – Storage tab
Here, we can see the Storage tab showing Root device details and Block devices.
You should see an attached volume with 10 GiB for the volume size. Click the link under Volume ID (for example, vol-0130f00a6cf349ab37). Take note that this Volume ID will be different for your volume:
Figure 2.14 – Looking for the volume attached to the EC2 instance
You will be redirected to the Elastic Block Store Volumes page, which shows the details of the volume attached to your instance:
Figure 2.15 – Elastic Block Store Volumes page
Here, we can see that the size of the volume is currently set to 10 GiB.
Click Actions and then Modify Volume:
Figure 2.16 – Modify Volume
This is where we can find the Modify Volume option.
Set Size to 100 and click Modify:
Figure 2.17 – Modifying the volume
As you can see, we specified a new volume size of 100 GiB. This should be more than enough to help us get through this chapter and build our custom algorithm container image.
Click Yes to confirm the volume modification action:
Figure 2.18 – Modify Volume confirmation dialog
We should see a confirmation screen here after clicking Modify in the previous step.
Click Close upon seeing the confirmation dialog:
Figure 2.19 – Modify Volume Request Succeeded message
Here, we can see a message stating Modify Volume Request Succeeded. At this point, the volume modification is still pending and we need to wait about 10-15 minutes for this to complete. Feel free to check out the How it works… section for this recipe while waiting.
Click the refresh button (the two rotating arrows) so that the volume state will change to the correct state accordingly:
Figure 2.20 – Refresh button
Clicking the refresh button will update State from in-use (green) to in-use – optimizing (yellow):
Figure 2.21 – In-use state – optimizing (yellow)
Here, we can see that the volume modification step has not been completed yet.
After a few minutes, State of the volume will go back to in-use (green):
Figure 2.22 – In-use state (green)
When we see what is shown in the preceding screenshot, we should celebrate as this means that the volume modification step has been completed!
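The volume modification steps above can also be scripted. The following is a rough sketch under stated assumptions: `resize_volume` is a hypothetical helper, `ec2` is assumed to be a boto3 EC2 client (for example, `boto3.client("ec2")`), and the polling interval and limit are arbitrary values chosen to cover the waiting time mentioned earlier.

```python
import time


def resize_volume(ec2, volume_id, new_size_gib, poll_seconds=15, max_polls=80):
    """Request a larger EBS volume, then poll until the modification completes.

    `ec2` is assumed to be a boto3 EC2 client.
    """
    ec2.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    for _ in range(max_polls):
        response = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
        state = response["VolumesModifications"][0]["ModificationState"]
        if state == "failed":
            raise RuntimeError(f"Modification of {volume_id} failed")
        if state == "completed":
            # Corresponds to the volume going back to in-use (green) in the console
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Modification of {volume_id} is still in progress")


# Example call (replace the placeholder with your own Volume ID):
# import boto3
# resize_volume(boto3.client("ec2"), "<your-volume-id>", 100)
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real account.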
Now that the volume modification step has been completed, our next goal is to make sure that this change is reflected in our environment.
Navigate back to the browser tab of the AWS Cloud9 IDE. In the Terminal, run lsblk:
```
lsblk
```
Running lsblk should yield the following output:
Figure 2.23 – Partition not yet reflecting the size of the root volume
As you can see, while the size of the root volume, /dev/nvme0n1, reflects the new size, 100G, the size of the /dev/nvme0n1p1 partition reflects the original size, 10G.
There are multiple ways to grow the partition, but we will proceed by simply rebooting the EC2 instance so that the size of the /dev/nvme0n1p1 partition will reflect the size of the root volume, which is 100G.
Navigate back to the EC2 Volumes page and select the EC2 volume attached to the Cloud9 instance. At the bottom portion of the screen showing the volume's details, locate the Attachment information value under the Description tab. Click the Attachment information link:
Figure 2.24 – Attachment information
Clicking this link will redirect us to the EC2 Instances page. It will automatically select the EC2 instance of our Cloud9 environment:
Figure 2.25 – EC2 instance of the Cloud9 environment
The preceding screenshot shows the EC2 instance linked to our Cloud9 environment.
Click Instance state at the top right of the screen and click Reboot instance:
Figure 2.26 – Reboot instance
This is where we can find the Reboot instance option.
Navigate back to the browser tab showing the AWS Cloud9 environment IDE. It should take a minute or two to complete the reboot step:
Figure 2.27 – Instance is still rebooting
We should see a screen similar to the preceding one.
Once connected, run lsblk in the Terminal:
```
lsblk
```
We should get a set of results similar to what is shown in the following screenshot:

Figure 2.28 – Partition now reflecting the size of the root volume

As we can see, the /dev/nvme0n1p1 partition now reflects the size of the root volume, which is 100G.
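Instead of reading the lsblk output by eye, the comparison between the disk and the root partition can be done in a few lines of Python. This is a minimal sketch: `partition_matches_disk` and `read_lsblk` are hypothetical helpers, the 1 GiB tolerance is an arbitrary assumption to allow for partition table overhead, and the sample dictionaries are made-up data shaped like the JSON that `lsblk --json --bytes` prints.

```python
import json
import subprocess

GIB = 1024 ** 3


def partition_matches_disk(lsblk_report, tolerance_bytes=GIB):
    """Return True when the partition mounted at / nearly fills its parent disk."""
    for disk in lsblk_report["blockdevices"]:
        for part in disk.get("children", []):
            if part.get("mountpoint") == "/":
                gap = int(disk["size"]) - int(part["size"])
                return gap <= tolerance_bytes
    raise LookupError("No partition mounted at / was found")


def read_lsblk():
    """Run lsblk with JSON output and sizes reported in bytes."""
    result = subprocess.run(
        ["lsblk", "--json", "--bytes", "--output", "NAME,SIZE,TYPE,MOUNTPOINT"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    )
    return json.loads(result.stdout)


def sample_report(partition_gib):
    """Made-up report: a 100 GiB disk with a single root partition."""
    return {"blockdevices": [{
        "name": "nvme0n1", "size": 100 * GIB, "type": "disk", "mountpoint": None,
        "children": [{"name": "nvme0n1p1", "size": partition_gib * GIB,
                      "type": "part", "mountpoint": "/"}],
    }]}


print(partition_matches_disk(sample_report(10)))   # before the reboot: False
print(partition_matches_disk(sample_report(100)))  # after the reboot: True
```

Inside the Cloud9 Terminal, `partition_matches_disk(read_lsblk())` would run the same check against the actual instance.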

That was a lot of setup work, but it will definitely be worth it, as you will see in the next few recipes in this chapter. Now, let's see how this works!

How it works…

In this recipe, we launched a Cloud9 environment where we will prepare the custom container image. When building Docker container images, it is important to note that each image and its layers consume disk space, and this adds up quickly. This is why we went through a few extra steps to increase the size of the volume attached to the EC2 instance of our Cloud9 environment. This recipe was composed of three parts: launching a new Cloud9 environment, modifying the attached volume, and rebooting the instance.

Launching a new Cloud9 environment involves using a CloudFormation template behind the scenes. This CloudFormation template is used as the blueprint when creating the EC2 instance:

Figure 2.29 – CloudFormation stack

Here, we have a CloudFormation stack that was successfully created. What's CloudFormation? AWS CloudFormation is a service that helps developers and DevOps professionals manage resources using templates written in JSON or YAML. These templates get converted into AWS resources using the CloudFormation service.
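To make the idea of a template more concrete, here is a small illustrative example built as a Python dictionary and serialized to JSON. It is not the template that Cloud9 uses behind the scenes; it is a hypothetical sketch that declares a similar environment with the `AWS::Cloud9::EnvironmentEC2` resource type. The logical ID `ExperimentationEnvironment` is made up, and the `ImageId` alias is an assumption that may no longer be accepted.

```python
import json

# Hypothetical template: one resource, declared rather than clicked through
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative sketch of a Cloud9 environment as a template",
    "Resources": {
        "ExperimentationEnvironment": {  # made-up logical ID
            "Type": "AWS::Cloud9::EnvironmentEC2",
            "Properties": {
                "Name": "Cookbook Experimentation Environment",
                "InstanceType": "t3.small",
                "ImageId": "ubuntu-18.04-x86_64",  # assumed alias
                "AutomaticStopTimeMinutes": 60,
            },
        }
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

A JSON string like `template_body` is what would be handed to CloudFormation, which then creates, updates, or deletes the declared resources together as a stack.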

At this point, the EC2 instance should be running already and we can use the Cloud9 environment as well:

Figure 2.30 – AWS Cloud9 environment

We should be able to see the preceding output once the Cloud9 environment is ready. If we were to use the environment right away, we would run into disk space issues as we will be working with Docker images, which take up a bit of space. To prevent these issues from happening later on, we modified the volume in this recipe and restarted the EC2 instance so that this volume modification gets reflected right away.
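A quick way to see how much room is left before building images is to ask the operating system directly. The following sketch uses only the Python standard library; `free_space_gib` and `has_room_for_images` are hypothetical helper names, and the 20 GiB threshold in the example is an arbitrary assumption rather than a figure from this recipe.

```python
import shutil

GIB = 1024 ** 3


def free_space_gib(path="/"):
    """Return the free space, in GiB, of the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_images(required_gib, path="/"):
    """Return True when at least `required_gib` GiB is free at `path`."""
    return free_space_gib(path) >= required_gib


print(f"Free space on /: {free_space_gib():.1f} GiB")
print("Enough room for images:", has_room_for_images(20))  # 20 GiB is an assumed threshold
```

Running this before and after the volume modification gives a simple confirmation that the extra space is actually available to the filesystem.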

Important note

In this recipe, we took a shortcut and simply restarted the EC2 instance. If we were running a production environment, we should avoid having to reboot and follow this guide instead: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html.
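For reference, the approach in that guide comes down to growing the partition with `growpart` and then growing the filesystem. The sketch below only assembles the commands so that they can be reviewed before running them; `grow_commands` is a hypothetical helper, and it assumes an ext4 root filesystem (other filesystems need a different second command).

```python
import re


def grow_commands(partition_device):
    """Build the commands that grow a partition and its ext4 filesystem in place."""
    # NVMe partitions look like /dev/nvme0n1p1, other devices look like /dev/xvda1
    match = re.fullmatch(
        r"(/dev/nvme\d+n\d+)p(\d+)|(/dev/[a-z]+)(\d+)", partition_device
    )
    if match is None:
        raise ValueError(f"Unrecognized partition device: {partition_device}")
    disk = match.group(1) or match.group(3)
    number = match.group(2) or match.group(4)
    return [
        ["sudo", "growpart", disk, number],        # extend the partition
        ["sudo", "resize2fs", partition_device],   # extend the ext4 filesystem
    ]


for command in grow_commands("/dev/nvme0n1p1"):
    print(" ".join(command))
# sudo growpart /dev/nvme0n1 1
# sudo resize2fs /dev/nvme0n1p1
```

Each inner list can be passed to `subprocess.run(command, check=True)` once the device names have been confirmed with lsblk.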

Note that we can also use a SageMaker Notebook instance that's been configured with root access enabled as a potential experimentation environment for our custom scripts and container images, before using them in SageMaker. The issue here is that a SageMaker Notebook instance reverts to how it was originally configured every time we stop and start it. This makes us lose certain directories and installed packages, which is not ideal.
