Google Colab Pro with a GPU
The VM activated with Google Colab Pro provided an NVIDIA P100 GPU, as shown in Figure II.7. That is interesting because the original Transformer was trained with 8 NVIDIA P100s, as stated in Vaswani et al. (2017), Attention Is All You Need. Training the base models, which have 65 × 10⁶ parameters, took 12 hours on 8 GPUs:
![Table Description automatically generated with medium confidence](https://static.packt-cdn.com/products/9781803247335/graphics/Images/B17948_Appendix_II_07.png)
Figure II.7: The Google Colab Pro VM was provided with a P100 GPU
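Before running a notebook, it can be useful to confirm which GPU the runtime actually provides. The following is a minimal sketch, assuming PyTorch may or may not be installed in the runtime; the function name `describe_gpu` is illustrative, not part of any library:

```python
def describe_gpu() -> str:
    """Return the name of the CUDA device the runtime provides, if any."""
    try:
        import torch  # optional dependency; absent on CPU-only setups
        if torch.cuda.is_available():
            # On a Colab Pro P100 runtime this would report e.g. a Tesla P100
            return torch.cuda.get_device_name(0)
        return "no CUDA device available"
    except ImportError:
        return "torch not installed"

print(describe_gpu())
```

Alternatively, running `!nvidia-smi` in a Colab cell prints the same device information along with memory and utilization figures.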
With the P100, the training loop time was considerably reduced, completing in less than 10 minutes, as shown in Figure II.8:
![Text Description automatically generated](https://static.packt-cdn.com/products/9781803247335/graphics/Images/B17948_Appendix_II_08.png)
Figure II.8: Training loop with a P100 GPU
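Timing a training loop like the one in Figure II.8 can be done with a simple wall-clock wrapper. This is a generic sketch using only the standard library; `timed_training_loop` and the toy `train_step` callable are illustrative stand-ins, not the book's actual training code:

```python
import time

def timed_training_loop(train_step, n_steps: int) -> float:
    """Run n_steps of a training callable and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for step in range(n_steps):
        train_step(step)
    return time.perf_counter() - start

# Toy stand-in for a real train step; a Transformer loop would run
# forward/backward passes here instead.
elapsed = timed_training_loop(lambda step: sum(i * i for i in range(1000)), 100)
print(f"Training loop took {elapsed:.4f} s")
```

Comparing the elapsed time on a CPU runtime versus a GPU runtime makes the speedup described above concrete.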
Join our book’s Discord space
Join the book’s Discord space for a monthly Ask Me Anything session with the authors:
https://www.packt.link/Transformers