Index
As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.
A
all-reduce synchronization 133
Amazon Web Services (AWS) 162
AMP activation, on GPU 114
    backend flags, enabling 115
    gradient scaler 116, 117
    training loop, wrapping with torch.autocast 115, 116
application layer, software stack 25
    batch size, modifying 28-30
    modifying 25, 26
    practical example 26-28
application programming interface (API) 184
Apptainer 22
artificial intelligence (AI) 162
automatic mixed precision (AMP) 112
    activating, on GPU 114
    benefits 117
    enabling 114
B
backend compiler 49
basic workflow, distributed training on PyTorch 135
    checkpoint 138
    communication group, destroying 137
    communication group initialization 136
    distributed data loader, instantiating 137
    distributed model, instantiating 138
    distributed...