Exploring models with SageMaker Debugger
SageMaker Debugger lets you configure debugging rules for your training job. These rules inspect its internal state and check for specific unwanted conditions that could be developing during training. SageMaker Debugger includes a long list of built-in rules (loss not decreasing, vanishing gradients, and so on), and you can write your own.
Here are some of the built-in rules: all_zero, class_imbalance, confusion, loss_not_decreasing, overfit, overtraining, similar_across_runs, stalled_training_rule, tensor_variance, and unchanged_tensor. You can see the full list at https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html.
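To make the idea of a rule more concrete, here is a minimal sketch in plain Python of the kind of check a rule such as loss_not_decreasing performs. It is not SageMaker Debugger's implementation and does not use its API: the function name, the num_steps and min_decrease_pct parameters, and their default values are assumptions chosen for illustration only.

```python
from typing import Sequence


def loss_not_decreasing(
    losses: Sequence[float],
    num_steps: int = 3,             # hypothetical: how many steps back to compare against
    min_decrease_pct: float = 1.0,  # hypothetical: required decrease, in percent
) -> bool:
    """Return True when the unwanted condition is detected, that is, when the
    latest loss has not dropped by at least min_decrease_pct percent compared
    with the loss recorded num_steps steps earlier."""
    if len(losses) <= num_steps:
        # Not enough history yet to compare two points num_steps apart.
        return False

    previous = losses[-(num_steps + 1)]
    current = losses[-1]

    if previous == 0:
        # Avoid dividing by zero: flag the run unless the loss went below zero.
        return current >= previous

    decrease_pct = (previous - current) / abs(previous) * 100.0
    return decrease_pct < min_decrease_pct


# Illustrative loss histories (made-up values, not real training output).
healthy_run = [1.0, 0.8, 0.6, 0.4, 0.3]
stalled_run = [1.0, 0.5, 0.5, 0.5, 0.5]

print(loss_not_decreasing(healthy_run))
print(loss_not_decreasing(stalled_run))
```

A real rule works on tensors saved during training rather than on an in-memory list, but the principle is the same: read a slice of the saved state, compare it against a threshold, and report whether the condition has been triggered.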
In addition, you can save and inspect the model state (gradients, weights, and so on) as well as the training state (metrics, optimizer parameters, and so on). At each training step, the tensors storing these values may be saved in near real time in an S3 bucket, making it possible to visualize...