Executing a distributed training workload on AWS
Now that we’ve covered some of the fundamentals of distributed training and what happens behind the scenes when we leverage SageMaker to launch a distributed training job, let’s explore how to execute such a workload on AWS. Since we’ve reviewed two placement techniques, namely data parallel and model parallel, we will start with distributed data parallel training. We will then review distributed model parallel training, extending that example with the hybrid methodology by layering an independent data parallel placement strategy alongside the model parallel one, as shown in the sketch below.
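To make this concrete, here is a minimal sketch of what launching these jobs can look like with the SageMaker Python SDK. The entry point script (train.py), IAM role ARN, S3 input path, and instance settings are placeholders, and the specific SMP parameter values are illustrative assumptions; the `distribution` dictionaries are the SDK's switches for the SageMaker data parallel (SMDDP) and model parallel (SMP) libraries.

```python
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder IAM role

# Distributed data parallel: enable the SageMaker data parallel library
# (SMDDP) through the estimator's `distribution` argument. SMDDP shards
# the global batch across every GPU in the cluster.
data_parallel_estimator = PyTorch(
    entry_point="train.py",           # hypothetical training script
    role=role,
    framework_version="1.13.1",
    py_version="py39",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",  # SMDDP requires multi-GPU p3/p4-class instances
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
data_parallel_estimator.fit("s3://my-bucket/training-data/")  # placeholder S3 input

# Model parallel with the hybrid methodology: enable the SageMaker model
# parallel library (SMP) plus MPI, partition the model across GPUs, and
# set `ddp` so each partition group is also replicated data-parallel.
model_parallel_estimator = PyTorch(
    entry_point="train.py",
    role=role,
    framework_version="1.13.1",
    py_version="py39",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    distribution={
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "partitions": 2,    # split the model across 2 GPUs
                    "microbatches": 4,  # pipeline microbatching across partitions
                    "ddp": True,        # hybrid: add data parallel replication
                },
            }
        },
        "mpi": {"enabled": True, "processes_per_host": 8},  # 8 GPUs per p4d.24xlarge
    },
)
# Launched the same way with model_parallel_estimator.fit(...).
```

The supported `framework_version` values, instance types, and SMP parameters vary by SDK release, so treat the values above as a starting point to verify against the SageMaker documentation rather than a fixed recipe.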
Note
In this example, we leverage a Vision Transformer (ViT) model to address an image classification use case. Since the objective of this section is to showcase how to practically implement both the data parallel and model parallel placement strategies, we will not be diving into the particulars...