Distributed training code
In this section, we will learn how to write distributed training code with the PyTorch framework for vision-based deep learning algorithms. We will define the model in Python and then train it on a compute cluster. All the code is available in this book’s GitHub repository, so you can study it and run it yourself.
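Before we turn to the setup steps, it helps to see the core idea behind data-parallel distributed training: each process learns its own rank and the total number of processes, then trains on its own slice of the dataset. The following is a minimal sketch in plain Python, not the code from this book's repository. It assumes the job is started by a launcher such as torchrun, which exports the RANK, WORLD_SIZE, and LOCAL_RANK environment variables. The helper names read_dist_env and shard_indices are hypothetical and used only for illustration.

```python
import os


def read_dist_env(env=None):
    """Read the process rank, world size, and local rank.

    Assumption: the launcher (for example, torchrun) exports RANK,
    WORLD_SIZE, and LOCAL_RANK. The defaults describe a single-process
    run, so the same script also works on a development machine.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return rank, world_size, local_rank


def shard_indices(num_samples, rank, world_size):
    """Return the sample indices that one process trains on.

    This is a strided split: process r takes samples r, r + world_size,
    r + 2 * world_size, and so on. Shuffling and padding to equal shard
    sizes are deliberately left out of this sketch.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in the range [0, world_size)")
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    rank, world_size, local_rank = read_dist_env()
    my_indices = shard_indices(10, rank, world_size)
    print(f"rank {rank} of {world_size} trains on samples {my_indices}")
```

In a real PyTorch job, a sampler such as torch.utils.data.distributed.DistributedSampler normally handles this split, and it also shuffles the data and pads the shards so that every process sees the same number of samples.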
Creating a Python file for the training job
Follow these steps to create the training job file using the user interface:
- Go to https://ml.azure.com and select your workspace.
- Go to Compute and click Start to start the compute instance.
- Wait for the compute instance to start; then, click Jupyter to start coding.
- If you don’t have a compute cluster, please follow the instructions in the previous chapters to create a new one. A CPU-based compute instance is sufficient for development; we will use GPU-based compute for model training.
- If you don’t have enough quotas for your GPU,...