Google Colab Pro with a GPU
The VM activated with Google Colab Pro provided an NVIDIA P100 GPU, as shown in Figure II.7. This is notable because the original Transformer was trained on 8 NVIDIA P100 GPUs, as stated in Vaswani et al. (2017), Attention Is All You Need. With 8 GPUs, it took 12 hours to train the base models, which have 65×10⁶ parameters:

Figure II.7: The Google Colab Pro VM was provided with a P100 GPU
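You can confirm which GPU the VM allocated yourself. In a notebook cell you would typically run `!nvidia-smi`; the sketch below wraps the same call in plain Python so it also works in a script. The `gpu_name` helper is an illustration written for this example, not a Colab API, and it simply returns `None` on a machine without an NVIDIA driver.

```python
import shutil
import subprocess

def gpu_name():
    """Return the name of the first NVIDIA GPU, or None if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver/tooling on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None

print(gpu_name())  # e.g. "Tesla P100-PCIE-16GB" on a Colab Pro VM
```

On a Colab Pro VM with the P100 allocated, this prints the GPU model; on a CPU-only runtime it prints `None`, which is a quick way to verify that the GPU runtime is actually active before starting a long training run.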
With the P100, the training loop time was considerably reduced: it completed in less than 10 minutes, as shown in Figure II.8:

Figure II.8: Training loop with a P100 GPU
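To compare runtimes like this yourself, it is enough to wrap the training loop in a simple timer. The sketch below is a minimal, hypothetical example: `train_one_epoch` stands in for whatever loop your notebook runs and is not a function from the book's code.

```python
import time

def train_one_epoch():
    # Placeholder for the real training loop; here it just simulates work.
    total = sum(i * i for i in range(100_000))
    return total

# Time the loop with a monotonic high-resolution clock.
start = time.perf_counter()
train_one_epoch()
elapsed = time.perf_counter() - start

print(f"Training loop took {elapsed:.2f} s ({elapsed / 60:.2f} min)")
```

Running the same timed loop on a CPU-only runtime and on a GPU runtime gives a direct, like-for-like measurement of the speedup the allocated GPU provides.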
Join our book’s Discord space
Join the book’s Discord workspace for a monthly Ask me Anything session with the authors:
https://www.packt.link/Transformers