Summary
In this chapter, we delved into advanced topics in ML engineering. We covered distributed training for handling large datasets and large-scale models, along with strategies for achieving low-latency inference. Hopefully, you now have a solid understanding of data parallelism and model parallelism, as well as the technology choices available for implementing them, such as the PyTorch distributed library and the SageMaker distributed training library. You should also be well equipped to discuss techniques for reducing inference latency, including the use of model compilers for automated model optimization.
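As a quick refresher on the data parallelism approach, the following is a minimal sketch using PyTorch's DistributedDataParallel. The toy model, dataset, and hyperparameters are placeholders for illustration, and the script assumes it is launched with torchrun (for example, torchrun --nproc_per_node=2 ddp_sketch.py), which sets the rank and world-size environment variables for each worker.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
# The model and dataset are toy stand-ins; launch with torchrun so that each worker
# process receives its RANK, LOCAL_RANK, and WORLD_SIZE environment variables.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def main():
    # Initialize the process group; use the "nccl" backend when training on GPUs
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    # Toy dataset; DistributedSampler shards it across worker processes
    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    # Each worker holds a full model replica; DDP all-reduces gradients across workers
    model = DDP(nn.Linear(16, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), targets)
            loss.backward()  # gradient synchronization happens here
            optimizer.step()
        if rank == 0:
            print(f"epoch {epoch} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In this pattern, each worker processes a different shard of the data while holding a full copy of the model, and gradients are averaged across workers during the backward pass, which is what allows training throughput to scale with the number of devices.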
So far, we have focused on training ML models from scratch and on designing ML platforms for training and deploying those models to support the development of intelligent applications. However, we don’t always need to build models from scratch. In the...