Summary
In this chapter, we discussed advanced techniques that combine data parallelism and model parallelism. We then presented a case study of Megatron-LM and its implementation, followed by a case study of Mesh-TensorFlow and its implementation. We ended the chapter by examining the pros and cons of the two systems.
After reading this chapter, you should understand how Megatron-LM achieves model parallelism and data parallelism simultaneously, and you should be able to use Megatron-LM to launch your own DNN model training jobs. In addition, you should be familiar with the high-level ideas behind Mesh-TensorFlow and how to use it for model training.
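The core idea behind Megatron-LM's hybrid scheme can be sketched in a few lines of NumPy. This is an illustration only, not Megatron-LM's actual code: it mimics the column-parallel linear layer, where the weight matrix is split column-wise across devices (tensor model parallelism) while each such device group processes a different slice of the batch (data parallelism). All names below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))      # a mini-batch of activations
W = rng.standard_normal((16, 32))     # full weight matrix of a linear layer

# Tensor model parallelism: each simulated "device" holds half of W's columns
W_dev0, W_dev1 = np.split(W, 2, axis=1)
y_dev0 = x @ W_dev0                   # partial output computed on device 0
y_dev1 = x @ W_dev1                   # partial output computed on device 1

# An all-gather along the column dimension reconstructs the full output
y_parallel = np.concatenate([y_dev0, y_dev1], axis=1)

# The split computation matches the single-device result
assert np.allclose(y_parallel, x @ W)

# Data parallelism then replicates this whole group: another group of
# devices would run the same code on a different slice of the batch.
```

In real Megatron-LM, the partial matmuls run on separate GPUs and the concatenation becomes a collective communication call, but the arithmetic is the same.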
In the next chapter, we will discuss federated learning.