Distributing training jobs
Distributed training lets you scale training jobs by running them on a cluster of CPU or GPU instances. Each instance trains either on the full dataset or on a fraction of it, depending on the data distribution policy that we configure: FullyReplicated copies the full dataset to every instance, while ShardedByS3Key distributes an equal share of the input files to each instance, which is where splitting your dataset into many files comes in handy.
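For instance, here's a minimal sketch with the SageMaker Python SDK (the bucket and prefix are placeholders) showing how each policy is selected through the distribution parameter of a TrainingInput object:

from sagemaker.inputs import TrainingInput

# Default policy: every training instance receives a full copy of the dataset.
train_full = TrainingInput(
    's3://my-bucket/training/',          # placeholder S3 prefix
    distribution='FullyReplicated',
    content_type='text/csv')

# Sharded policy: each instance receives an equal share of the input files,
# so the dataset should be split into at least as many files as instances.
train_sharded = TrainingInput(
    's3://my-bucket/training/',
    distribution='ShardedByS3Key',
    content_type='text/csv')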
Distributing training for built-in algorithms
Distributed training is available for almost all built-in algorithms. Semantic Segmentation and LDA are notable exceptions.
As built-in algorithms are implemented with Apache MXNet, training instances use its Key-Value Store to exchange results. It's set up automatically by SageMaker on one of the training instances. Curious minds can learn more at https://mxnet.apache.org/api/faq/distributed_training.
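From our perspective, all it takes is asking for more than one instance. Here's a minimal sketch (the bucket, role, and hyperparameters are placeholders) that trains the Linear Learner built-in algorithm on two instances; SageMaker takes care of setting up the key-value store behind the scenes:

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Container image for the Linear Learner built-in algorithm
image_uri = sagemaker.image_uris.retrieve('linear-learner', session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),   # assumes a SageMaker notebook environment
    instance_count=2,                      # more than one instance: distributed training
    instance_type='ml.m5.xlarge',
    output_path='s3://my-bucket/output',   # placeholder bucket
    sagemaker_session=session)

estimator.set_hyperparameters(predictor_type='regressor', mini_batch_size=100)

# The training channel uses FullyReplicated by default, so both instances
# see the full dataset and exchange their results through the key-value store.
estimator.fit({'train': 's3://my-bucket/training/'})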
Distributing training for built-in frameworks
You can use distributed...