Implementing adaptive model training in the cloud
Here, we discuss how to implement adaptive model training using PyTorch on AWS.
First, we need to install the adaptdl Python package:
# Installation
python3 -m pip install adaptdl
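Before moving on, it can help to confirm that the package is actually visible to the interpreter we plan to train with. The following is a minimal sketch that uses only the Python standard library; the helper name is_installed is our own and is not part of adaptdl.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    # find_spec returns None when no top-level package with this name exists
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Hypothetical sanity check; prints False if the install step did not succeed
    print("adaptdl installed:", is_installed("adaptdl"))
```

If this reports False, the usual cause is that pip installed the package into a different Python environment than the one running the script.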
Once the package is successfully installed, we can use it for adaptive and distributed DNN training, as follows:
# Import the package
import adaptdl
import adaptdl.torch

# Initialize the process group
adaptdl.torch.init_process_group("MPI")

# Wrap the model in its adaptdl version
model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)

# Wrap the data loader in its adaptdl version
dataloader = adaptdl.torch.AdaptiveDataLoader(dataset, batch_size=128)

# Start adaptive DNN training
remaining_epochs = 200
for epoch in adaptdl.torch.remaining_epochs_until(remaining_epochs):
    ...
    train(model)
    ...
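The loop above iterates with remaining_epochs_until rather than a plain range. As we understand the adaptdl documentation, the point is that a job restarted during adaptive training continues from the epoch where it left off instead of starting again from epoch 0. The following is a conceptual sketch of that idea in plain Python. It is not adaptdl's implementation, and the names resumable_epochs and state_path are hypothetical.

```python
import json
import os
import tempfile


def resumable_epochs(total_epochs, state_path):
    """Yield the epochs that have not finished yet, saving progress after each one."""
    start = 0
    if os.path.exists(state_path):
        # A previous run saved its progress; continue from there
        with open(state_path) as f:
            start = json.load(f)["next_epoch"]
    for epoch in range(start, total_epochs):
        yield epoch
        # Reached only once the caller's loop body has finished this epoch
        with open(state_path, "w") as f:
            json.dump({"next_epoch": epoch + 1}, f)


# Simulate a job that is interrupted during epoch 2 and then restarted
state_path = os.path.join(tempfile.mkdtemp(), "epoch_state.json")

first_run = []
for epoch in resumable_epochs(5, state_path):
    if epoch == 2:
        break  # the job stops before epoch 2 completes
    first_run.append(epoch)

# The restarted job repeats the unfinished epoch and then carries on
second_run = list(resumable_epochs(5, state_path))
print(first_run, second_run)
```

In this sketch only the epoch counter is saved. A real adaptive training job also has to restore the model and optimizer state, which is the part a library such as adaptdl is meant to take care of.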
In short, we need to wrap both the model and the data loader in their adaptdl
versions. We can then run the usual DNN training loop, and adaptdl
will handle how to conduct the...