Advanced ML Engineering
Congratulations on making it this far! By now, you should have a good understanding of the fundamental skills an ML solutions architect needs to operate effectively across the ML lifecycle. In this chapter, we will move on to advanced ML concepts. Our main focus will be the range of options for distributed training when models and datasets are large. Understanding the concepts and techniques behind distributed training is increasingly important, because models at the scale of GPT are too large, and their training datasets too big, to train on a single device in a reasonable amount of time; training them requires a distributed training architecture. We will then examine several technical approaches for reducing model inference latency. As models grow larger, knowing how to optimize them for low-latency inference is becoming an essential ML engineering skill. Finally, we will close the chapter with a hands-on lab on distributed model training.
Specifically, we will cover the following...