Top usability features for SageMaker training
Now that you have some sense of how to integrate your scripts with SageMaker training, let’s learn about a few key aspects of SageMaker that make it especially easy and fun to work with.
Warm pools for rapid experimentation
Once your SageMaker job is online, it moves through the following phases:
- Initializing resources
- Downloading your data
- Downloading your training image
- Invoking your main script
- Uploading the model artifact to S3 on completion
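If you want to see how long each of these phases takes, one option is to read the SecondaryStatusTransitions field that the DescribeTrainingJob API returns. Here is a minimal sketch of that idea. The helper name phase_durations and the sample response are hypothetical, and the timestamps are made up for illustration. Note that the status names SageMaker reports (such as Starting, Downloading, Training, and Uploading) do not map one-to-one onto the list above.

```python
from datetime import datetime, timedelta


def phase_durations(description):
    """Return (status, seconds) pairs from a DescribeTrainingJob-style response.

    `seconds` is None for a phase that has not ended yet.
    """
    durations = []
    for transition in description.get("SecondaryStatusTransitions", []):
        start = transition["StartTime"]
        end = transition.get("EndTime")  # missing while the phase is still running
        seconds = (end - start).total_seconds() if end else None
        durations.append((transition["Status"], seconds))
    return durations


# Hypothetical sample shaped like the API response; the values are invented.
# With credentials configured, a real response would come from:
#   boto3.client("sagemaker").describe_training_job(TrainingJobName="<your-job-name>")
t0 = datetime(2024, 1, 1, 12, 0, 0)
sample_description = {
    "SecondaryStatusTransitions": [
        {"Status": "Starting", "StartTime": t0,
         "EndTime": t0 + timedelta(seconds=120)},
        {"Status": "Downloading", "StartTime": t0 + timedelta(seconds=120),
         "EndTime": t0 + timedelta(seconds=180)},
        {"Status": "Training", "StartTime": t0 + timedelta(seconds=180)},
    ]
}

for status, seconds in phase_durations(sample_description):
    print(status, "still running" if seconds is None else f"{seconds:.0f}s")
```

Keeping the parsing in a plain function that takes a dictionary means you can try it on a saved response without touching AWS at all.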
You might be wondering, what happens if my job breaks and I need to update a few lines of code? Do I need to completely restart the entire cluster from scratch?
Fortunately for you, the answer is no! Definitely not. You can use managed warm pools. Just add one extra parameter to your estimator, keep_alive_period_in_seconds, and SageMaker will keep your cluster online for that many seconds even after your script either fails or finishes completely. This is useful because, in many cases, that...
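To make this concrete, here is a minimal sketch of passing keep_alive_period_in_seconds to the generic Estimator class in the SageMaker Python SDK. The helper names build_estimator_kwargs and launch are hypothetical, and the image URI, role, instance type, and the 1800-second value are placeholders you would replace with your own. The launch function is defined but not called here, since it needs the sagemaker package and AWS credentials.

```python
def build_estimator_kwargs(keep_alive_seconds):
    """Collect Estimator arguments, including the warm pool setting."""
    return {
        "image_uri": "<your-training-image-uri>",    # placeholder
        "role": "<your-execution-role-arn>",         # placeholder
        "instance_count": 1,
        "instance_type": "ml.g5.xlarge",             # placeholder choice
        # Keep the provisioned instances around after the job ends.
        "keep_alive_period_in_seconds": keep_alive_seconds,
    }


def launch(estimator_kwargs, inputs=None):
    """Start a training job. Requires the sagemaker SDK and AWS credentials."""
    from sagemaker.estimator import Estimator  # imported lazily on purpose

    estimator = Estimator(**estimator_kwargs)
    estimator.fit(inputs)
    return estimator


# Assumed value for illustration: keep the cluster warm for 30 minutes.
estimator_kwargs = build_estimator_kwargs(keep_alive_seconds=1800)
print(estimator_kwargs["keep_alive_period_in_seconds"])
```

After fixing your script, you would call launch again with the same arguments. Whether the new job actually reuses the warm cluster depends on it matching the retained resources, so treat this as a starting point rather than a guarantee.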