Summary
In this chapter, we discussed several advanced ML engineering topics, including distributed training for large-scale datasets and large models, as well as techniques and options for achieving low-latency inference. You should now be able to explain how data parallelism and model parallelism work, along with the technology options for running data-parallel and model-parallel distributed training, such as the PyTorch distributed library and the SageMaker Distributed Training library. You should also be able to describe the different techniques for optimizing a model to reduce inference latency, as well as the model compiler tools that automate model optimization.
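As a quick recap of the data parallelism discussion, the following is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel. It assumes the script is launched with torchrun (which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables); the toy model, dataset, and hyperparameters are illustrative placeholders rather than code from this chapter.

# Minimal data-parallel training sketch using PyTorch DistributedDataParallel (DDP).
# Assumed launch command: torchrun --nproc_per_node=<N> train_ddp.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # Initialize the process group; each process typically drives one GPU.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # Toy dataset and model, purely for illustration.
    features = torch.randn(1024, 20)
    targets = torch.randn(1024, 1)
    dataset = TensorDataset(features, targets)
    # DistributedSampler shards the dataset so each process trains on a different subset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = nn.Linear(20, 1).to(device)
    # DDP keeps a model replica per process and averages gradients with all-reduce.
    ddp_model = DDP(model, device_ids=[local_rank] if torch.cuda.is_available() else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for batch_x, batch_y in loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(batch_x), batch_y)
            loss.backward()  # gradients are synchronized across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()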
In the next chapter, we will talk about security and governance in ML.