Summary
In this chapter, you learned the basics of distributed training. You learned about data parallelism and model parallelism, the key concepts that enable you to scale your training up to sizes that approach the state of the art. You learned how to combine them, and how managed orchestration platforms such as Amazon SageMaker, with their optimized distributed training libraries, help you seamlessly work with hundreds to thousands of GPUs. You then learned about advanced GPU memory reduction techniques and saw them brought to life with real-world examples such as Stable Diffusion and GPT-3.
In the next chapter, we’ll dive into the engineering fundamentals and concepts you need to build your own data loader!