Analyzing the results
As mentioned previously, when creating a customization job, we provide an output S3 path where the training job stores its metrics and logs. Inside that path, you will see the step_wise_training_metrics.csv and validation_metrics.csv files. These files contain information such as the step number, epoch number, loss, and perplexity, reported for the training set and the validation set, respectively. Although providing a validation set is optional, doing so allows you to evaluate the performance of the custom model that’s been created.
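To make the metrics files easier to inspect, the following minimal sketch parses the contents of one of them and finds the step with the lowest loss. It assumes the CSV has a header row and that the column names are step_number and validation_loss; these names, along with the load_metrics and best_step helpers, are illustrative assumptions, so check them against the header of your own file. Downloading the file from S3 is left out.

```python
import csv
import io


def load_metrics(csv_text, loss_column):
    """Parse metrics CSV text into a list of dicts, converting the loss to float.

    loss_column is an assumed column name; verify it against your file's header.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[loss_column] = float(row[loss_column])
        rows.append(row)
    return rows


def best_step(rows, loss_column, step_column="step_number"):
    """Return (step, loss) for the row with the lowest loss."""
    best = min(rows, key=lambda r: r[loss_column])
    return int(best[step_column]), best[loss_column]


# Example usage with a locally downloaded copy of the file:
# with open("validation_metrics.csv", newline="") as f:
#     rows = load_metrics(f.read(), "validation_loss")
# print(best_step(rows, "validation_loss"))
```

Comparing the lowest-loss step in the validation file with the training loss at the same step is one simple way to spot overfitting: training loss that keeps falling while validation loss rises suggests the model is memorizing the training data.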
Depending on the size of your dataset, you can decide how much of it to hold out for validation. If your dataset is small (for example, it contains hundreds or thousands of records), you can use 90% as the training set and 10% as the validation set. If your dataset is large (for example, it contains hundreds of thousands of records), you can reduce the validation...
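The 90/10 split described above can be sketched as follows. This is an illustrative helper, not part of any service API: the split_records name, the validation_fraction parameter, and the fixed seed are all assumptions made for the example. The records are shuffled first so that the validation set is not biased by the original ordering.

```python
import random


def split_records(records, validation_fraction=0.1, seed=42):
    """Shuffle records and split them into (training, validation) lists.

    validation_fraction=0.1 gives the 90/10 split; pass a smaller value
    for very large datasets.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Example usage: 1,000 placeholder records split 90/10
train, validation = split_records(list(range(1000)))
print(len(train), len(validation))
```

Each list can then be written to its own file and uploaded as the training and validation datasets for the customization job.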