Performing distributed training in SageMaker Studio
As the field of deep learning advances, ML models and training datasets have grown to the point where a single device is no longer sufficient for effective model training. Neural networks are getting deeper and gaining more and more trainable parameters:
- LeNet-5, one of the earliest Convolutional Neural Network (CNN) architectures, proposed in 1998 with 2 convolutional layers and 3 dense layers, has around 60,000 trainable parameters.
- AlexNet, a deeper CNN architecture with 5 convolutional layers and 3 dense layers, proposed in 2012, has around 62 million trainable parameters.
- BERT (Bidirectional Encoder Representations from Transformers), a transformer-based language representation model proposed in 2018, has 110 million and 340 million trainable parameters in its base and large versions, respectively.
- Generative Pre-trained Transformer 2 (GPT-2), a large transformer-based generative language model proposed in 2019, has 1.5 billion trainable parameters in its largest version.
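
To put these parameter counts in perspective, you can inspect the number of trainable parameters of a model programmatically. The following is a minimal sketch, assuming PyTorch and the torchvision implementation of AlexNet (neither is prescribed by this section); the same pattern applies to any PyTorch module.

```python
from torchvision import models

# Instantiate AlexNet; pretrained weights are not needed just to count parameters
model = models.alexnet()

# Sum the element count of every parameter tensor that is updated during training
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"AlexNet trainable parameters: {trainable_params:,}")  # roughly 61 million
```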