Summary
In this chapter, we discussed how to improve system efficiency in model-parallel training and serving.
After reading this chapter, you should be able to freeze specific layers during model-parallel training and use CPU memory or disk as external storage to extend the GPU's memory. You should also have mastered techniques such as model decomposition, model distillation, and reduced-bit representation.
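As a quick refresher on layer freezing, the following is a minimal sketch assuming a PyTorch model; the model structure and hyperparameters here are hypothetical, not taken from the chapter's examples:

```python
import torch
import torch.nn as nn

# A hypothetical two-stage model; in model-parallel training each stage
# could live on a different GPU.
model = nn.Sequential(
    nn.Linear(1024, 1024),  # stage 0: will be frozen
    nn.ReLU(),
    nn.Linear(1024, 10),    # stage 1: stays trainable
)

# Freeze the first linear layer by turning off gradient tracking
# for its parameters, so no gradients are computed or stored for them.
for param in model[0].parameters():
    param.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```

Freezing layers this way saves both the gradient computation and the optimizer state for those parameters, which is one source of the memory and compute savings discussed in this chapter.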
In the next chapter, we will discuss advanced techniques such as combining data parallelism and model parallelism.