Summary
In this chapter, we discussed how to implement a model parallel training and serving pipeline.
After reading this chapter, you should be able to split a DNN model across multiple GPUs and conduct model parallel training and serving. You should also know how to perform hyperparameter tuning for model parallel training jobs. Finally, you should be able to test your NLP model by running model serving tasks.
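The model split recapped above can be sketched in PyTorch. The two-stage MLP below is hypothetical (layer sizes and device placement are illustrative, not from this chapter), and it falls back to CPU when two GPUs are not available:

```python
import torch
import torch.nn as nn

# Pick two devices for the two model stages; fall back to CPU if fewer
# than two GPUs are present (illustrative placement, not a fixed recipe).
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class TwoStageMLP(nn.Module):
    """Hypothetical model split into two stages on separate devices."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).to(dev0)
        self.stage2 = nn.Linear(32, 4).to(dev1)

    def forward(self, x):
        h = self.stage1(x.to(dev0))
        # Move intermediate activations to the second stage's device.
        return self.stage2(h.to(dev1))

model = TwoStageMLP()
out = model(torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 4])
```

Each stage holds only its own parameters, so a model too large for one GPU's memory can still be trained and served once its layers are partitioned this way.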
In the next chapter, we will discuss some advanced techniques to further boost the performance of model parallel training and serving.