Training large-scale models with distributed training
As ML algorithms grow more complex and the data available for ML continues to grow, model training can become a major bottleneck in the ML life cycle. Training on a large dataset with a single machine or device can be too slow, and it becomes impossible when the model no longer fits into the memory of a single device. The following diagram shows how quickly language models have evolved in recent years and how rapidly their sizes have grown:
To address the challenge of training large models on large datasets, we can turn to distributed training. Distributed training lets you train a model across multiple devices on a single node, or across multiple nodes, by splitting the data or the model itself across those devices and nodes. There are two main types of distributed training: data parallelism and model parallelism.
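As a concrete illustration of the data-parallel approach, the following is a minimal sketch using PyTorch's DistributedDataParallel. The linear model, synthetic dataset, and `torchrun` launch assumption are placeholders for illustration only, not part of a specific training setup described here.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
# Assumes it is launched with `torchrun --nproc_per_node=<num_gpus> train_ddp.py`,
# which sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # One process per GPU; NCCL is the usual backend for GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and synthetic dataset for illustration only.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    # DistributedSampler gives each process a different shard of the data.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, labels in loader:
            features = features.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()  # DDP averages gradients across processes here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Each process holds a full copy of the model and trains on its own shard of the data; gradients are averaged across processes during the backward pass so that all replicas stay in sync. Model parallelism, by contrast, splits the model itself across devices when it is too large to fit on one.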