SageMaker Training and Debugging Solutions
In Chapter 2, Deep Learning AMIs, and Chapter 3, Deep Learning Containers, we performed our initial ML training experiments inside EC2 instances. We took note of the cost per hour of running these EC2 instances, since some training workloads require the more expensive instance types (such as the p2.8xlarge
instance at approximately $7.20 per hour). To manage and reduce the overall cost of running ML workloads on these EC2 instances, we discussed a few cost optimization strategies, including manually turning off these instances once the training job has finished.
At this point, you might be wondering if it is possible to automate the following processes:
- Launching the EC2 instances that will run the ML training jobs
- Uploading the model artifacts to a storage location (such as an S3 bucket) after model training
- Deleting the EC2 instances once...
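These are exactly the steps that a SageMaker training job automates for us. As a rough preview, the following minimal sketch uses the SageMaker Python SDK's Estimator to launch a managed training instance, upload the resulting model artifacts to an S3 output path, and terminate the instance once the job completes. The container image URI, S3 paths, and IAM role ARN shown here are illustrative placeholders, not values from this chapter:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Placeholder IAM role that SageMaker assumes to access S3 and run the job
role = "arn:aws:iam::<ACCOUNT_ID>:role/<SAGEMAKER_EXECUTION_ROLE>"

estimator = Estimator(
    image_uri="<TRAINING_IMAGE_URI>",      # placeholder training container image
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",          # billed only while the training job runs
    output_path="s3://<BUCKET>/output/",   # model artifacts are uploaded here after training
    sagemaker_session=session,
)

# fit() provisions the instance, runs the training job, copies the model
# artifacts to the S3 output path, and terminates the instance afterward.
estimator.fit({"train": "s3://<BUCKET>/input/training/"})
```

With this approach, SageMaker handles the three items in the preceding list behind the scenes, so we no longer need to start, monitor, and clean up EC2 instances ourselves.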