References
For more information on the topics covered in this chapter, see the following resources:
- aws-samples/sagemaker-ssh-helper: https://github.com/aws-samples/sagemaker-ssh-helper
- Use TensorBoard in Amazon SageMaker Studio: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tensorboard.html
- aws/amazon-sagemaker-examples: https://github.com/aws/amazon-sagemaker-examples/blob/main/training/distributed_training/pytorch/model_parallel/gpt2/train_gpt_simple.py
- Training large language models on Amazon SageMaker: Best practices: https://aws.amazon.com/blogs/machine-learning/training-large-language-models-on-amazon-sagemaker-best-practices/
- Introduction to SageMaker’s Distributed Data Parallel Library: https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-intro.html