Model Training Optimizations
Before serving pre-trained machine learning models, which we will discuss extensively in Chapter 13, Operationalizing PyTorch Models into Production, we need to train them. In Chapters 2 to 6, we saw the vast expanse of increasingly complex deep learning model architectures. Such gigantic models often have millions and even billions of parameters. The recent (at the time of writing) Pathways Language Model (PaLM), for example, can have up to 540 billion parameters. Using backpropagation to tune this many parameters requires enormous amounts of memory and compute power. And even then, model training can take days to finish.
In this chapter, we will explore ways of speeding up the model training process by distributing the training task across machines and processes within machines. We will learn about the distributed training APIs offered by PyTorch – torch.distributed, torch.multiprocessing, and torch.utils.data.distributed.DistributedSampler...
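To give a concrete feel for how these three APIs fit together, here is a minimal sketch of single-node, multi-process data-parallel training. It assumes a toy dataset and model, a local master address and port, the gloo backend, and two worker processes – all placeholders chosen for illustration rather than anything prescribed by the text.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(rank, world_size):
    # Each spawned process joins the same process group (assumed local setup).
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Toy dataset and model, purely for illustration.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = DDP(nn.Linear(10, 1))  # gradients are synchronized across processes
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # ensures a different shuffle each epoch
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()  # gradient all-reduce happens during backward
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(train, args=(world_size,), nprocs=world_size)

The key idea is that torch.multiprocessing spawns one worker per process, torch.distributed synchronizes their gradients, and DistributedSampler gives each worker a disjoint shard of the data; the rest of this chapter unpacks each of these pieces in detail.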