You're reading from Learn Amazon SageMaker A guide to building, training, and deploying machine learning models for developers and data scientists

Product type Paperback

Published in Aug 2020

Publisher Packt

ISBN-13 9781800208919

Length 490 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Machine Learning

Author (1):

Julien Simon

View More author details

Table of Contents (19) Chapters

Preface

1. Section 1: Introduction to Amazon SageMaker

2. Chapter 1: Introduction to Amazon SageMaker FREE CHAPTER

3. Chapter 2: Handling Data Preparation Techniques

4. Section 2: Building and Training Models

5. Chapter 3: AutoML with Amazon SageMaker Autopilot

6. Chapter 4: Training Machine Learning Models

7. Chapter 5: Training Computer Vision Models

8. Chapter 6: Training Natural Language Processing Models

9. Chapter 7: Extending Machine Learning Services Using Built-In Frameworks

10. Chapter 8: Using Your Algorithms and Code

11. Section 3: Diving Deeper on Training

12. Chapter 9: Scaling Your Training Jobs

13. Chapter 10: Advanced Training Techniques

14. Section 4: Managing Models in Production

15. Chapter 11: Deploying Machine Learning Models

16. Chapter 12: Automating Machine Learning Workflows

17. Chapter 13: Optimizing Prediction Cost and Performance

18. Other Books You May Enjoy

Leave a review - let other readers know what you think

Diving deep on SageMaker Autopilot

In this section, we're going to learn in detail how SageMaker Autopilot processes data and trains models. If this feels too advanced for now, you're welcome to skip this material. You can always revisit it later once you've gained more experience with the service.

First, let's look at the artifacts that SageMaker Autopilot produces.

The job artifacts

Listing our S3 bucket confirms the existence of many different artifacts:

$ aws s3 ls s3://sagemaker-us-east-2-123456789012/sagemaker/DEMO-autopilot/output/my-first-autopilot-job/

We can see many new prefixes. Let's figure out what's what:

PRE data-processor-models/PRE preprocessed-data/PRE sagemaker-automl-candidates/PRE transformed-data/PRE tuning/

The preprocessed-data/tuning_data prefix contains the training and validation splits generated from the input dataset. Each split is further broken into small CSV chunks: