Distribution Fundamentals
In this chapter, you’ll learn the conceptual fundamentals behind the distribution techniques you need for large-scale pretraining and fine-tuning. First, you’ll master the key distribution concepts for machine learning (ML), notably model and data parallelism. Then, you’ll learn how Amazon SageMaker integrates with distribution software to run your job on as many GPUs as you need. You’ll learn how to optimize model and data parallelism for large-scale training, especially with techniques such as sharded data parallelism. Then, you’ll learn how to reduce memory consumption with advanced techniques such as optimizer state sharding, activation checkpointing, compilation, and more. Lastly, we’ll look at a few examples across language, vision, and other domains to bring all of these concepts together.
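As a quick preview of the first of these concepts, data parallelism, the sketch below shows the basic pattern: every GPU holds a full replica of the model, each processes a different slice of the batch, and gradients are averaged across GPUs after every backward pass. This is a minimal illustration using PyTorch’s `DistributedDataParallel`, assuming a `torchrun` launch with one process per GPU; the tiny linear model and random tensors are placeholders for illustration, not code from a real training job.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
# Assumes a launch such as: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
# (torchrun sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group(backend="nccl")  # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a full replica of the model (data parallelism);
    # DDP all-reduces gradients across ranks during the backward pass.
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    for step in range(10):
        # In a real job, a DistributedSampler would give each rank a distinct
        # shard of the dataset; random tensors stand in for a batch here.
        x = torch.randn(32, 512, device=local_rank)
        y = torch.randn(32, 512, device=local_rank)
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()  # gradients are averaged across all GPUs here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Keep this pattern in mind as you read: model parallelism and sharded data parallelism, covered later in the chapter, change *what* each GPU holds, but the overall launch-and-synchronize structure stays similar.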
In this chapter, we’re going to cover the following main topics:
- Understanding key concepts—data and model...