Chapter 12: Advanced Techniques for Further Speed-Ups
So far, we have discussed all the mainstream distributed Deep Neural Network (DNN) model training and inference methodologies. Here, we illustrate some advanced techniques that can be used together with all the techniques introduced previously.
In this chapter, we will mainly cover advanced techniques that apply to DNN training and serving in general. More specifically, we will discuss general performance debugging approaches, such as kernel event monitoring, as well as job multiplexing and heterogeneous model training.
Before we go further, we list the assumptions we make for this chapter, as follows:
- By default, we will use homogeneous GPUs or other accelerators for model training and serving.
- For heterogeneous model training and inference, we will use different types of hardware accelerators within the same training or serving job.
- We have Windows Server so that we can directly use NVIDIA performance...