Setting up Amazon SageMaker Studio
Experimentation is a key part of the Machine learning process. Developers and data scientists use a collection of open source tools and libraries for data exploration, data processing, and, of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the Machine learning problem itself!
Amazon SageMaker Studio brings you the machine learning tools you need from experimentation to production. At its core is an integrated development environment based on Jupyter that makes it instantly familiar.
In addition, SageMaker Studio is integrated with other SageMaker capabilities, such as SageMaker Experiments to track and compare all jobs, SageMaker Autopilot to automatically create machine learning models, and more. A lot of operations can be achieved in just a few clicks, without having to write any code.
SageMaker Studio also further simplifies infrastructure management. You won't have to create notebook instances: SageMaker Studio provides you with compute environments that are readily available to run your notebooks.
Note
This section requires basic knowledge of Amazon S3, Amazon VPC, and Amazon IAM. If you're not familiar with them at all, please read the following documentation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.htmachine learning
https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.htmachine learning
https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.htmachine learning
Now would also probably be a good time to take a look at (and bookmark) the SageMaker pricing page: https://aws.amazon.com/sagemaker/pricing/.
Onboarding to Amazon SageMaker Studio
You can access SageMaker Studio using any of these three options:
- Use the quick start procedure: This is the easiest option for individual accounts, and we'll walk through it in the following paragraphs.
- Use AWS Single Sign-On (SSO): If your company has an SSO application set up, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-sso-users.htmachine learning. Please contact your IT administrator for details.
- Use Amazon IAM: If your company doesn't use SSO, this is probably the best option. You can learn more about SSO onboarding at https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-iam.htmachine learning. Again, please contact your IT administrator for details.
Onboarding with the quick start procedure
There are several steps to the quick start procedure:
- First, open the AWS Console in one of the regions where Amazon SageMaker Studio is available, for example, https://us-east-2.console.aws.amazon.com/sagemaker/.
- As shown in the following screenshot, the left-hand vertical panel has a link to SageMaker Studio:
- Clicking on this link opens the onboarding screen, and you can see its first section in the next screenshot:
- Let's select Quick start. Then, we enter the username we'd like to use to log in to SageMaker Studio, and we create a new IAM role as shown in the preceding screenshot. This opens the following screen:
The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.
- Once we've clicked on Create role, we're back to the previous screen. Please make sure that project templates and JumpStart are enabled for this account. (this should be the default setting).
- We just have to click on Submit to launch the onboarding procedure. Depending on your account setup, you may get an extra screen asking you to select a VPC and a subnet. I'd recommend selecting any subnet in your default VPC.
- A few minutes later, SageMaker Studio is in service, as shown in the following screenshot. We could add extra users if we needed to, but for now, let's just click on Open Studio:
Don't worry if this takes a few more minutes, as SageMaker Studio needs to complete the first-run setup of your environment. As shown in the following screenshot, once we open SageMaker Studio, we see the familiar JupyterLab layout:
Note
SageMaker Studio is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available there.
- We can immediately create our first notebook. In the Launcher tab, in the Notebooks and compute resources section, let's select Data Science, and click on Notebook – Python 3.
- This opens a notebook, as is visible in the following screenshot. We first check that SDKs are readily available. As this is the first time we are launching the Data Science kernel, we need to wait for a couple of minutes.
- As is visible in the following screenshot, we can easily list resources that are currently running in our Studio instance: an machine learning.t3.medium instance, the data science image supporting the kernel used in our notebook, and the notebook itself:
- To avoid unnecessary costs, we should shut these resources down when we're done working with them. For example, we can shut down the instance and all resources running on it, as you can see in the following screenshot. Don't do it now, we'll need the instance to run the next examples!
- Machine learning.t3.medium is the default instance size that Studio uses. You can switch to other instance types by clicking on 2 vCPU + 4 GiB at the top of your notebook. This lets you select a new instance size and launch it in Studio. After a few minutes, the instance is up and your notebook code has been migrated automatically. Don't forget to shut down the previous instance, as explained earlier.
- When we're done working with SageMaker Studio, all we have to do is close the browser tab. If we want to resume working, we just have to go back to the SageMaker console and click on Open Studio.
- If we wanted to shut down the Studio instance itself, we'd simply select Shut Down in the File menu. All files would still be preserved until we deleted Studio completely in the SageMaker console.
Now that we've completed the setup, I'm sure you're impatient to get started with machine learning. Let's start deploying some models!