GPUs are also designed for matrix multiplication
NVIDIA GPUs, for example, contain tensor cores that accelerate matrix operations. A significant proportion of artificial intelligence algorithms, including transformer models, rely on matrix operations, and NVIDIA GPUs provide extensive hardware optimization for them. The following links provide more information:
- https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
- https://www.nvidia.com/en-us/data-center/tesla-p100/
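The workload that tensor cores accelerate can be sketched in a few lines. The example below is illustrative only, not NVIDIA-specific code: it shows the kind of dense matrix multiplication at the heart of transformer attention (the query-key product). The sizes (`seq_len=4`, `d_model=8`) are arbitrary choices for the sketch, and NumPy stands in for GPU-accelerated libraries.

```python
import numpy as np

# Arbitrary illustrative sizes; real transformer models use much larger ones.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((seq_len, d_model))  # query matrix
K = rng.standard_normal((seq_len, d_model))  # key matrix

# Attention scores: a single dense matrix multiplication, scaled by
# sqrt(d_model). Operations of exactly this shape dominate transformer
# workloads, which is why tensor cores target them.
scores = Q @ K.T / np.sqrt(d_model)
print(scores.shape)  # one score per query-key pair
```

On a GPU, a framework such as PyTorch or TensorFlow would dispatch this same multiplication to tensor-core kernels; the mathematics is unchanged.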
Google’s Tensor Processing Unit (TPU) is Google’s counterpart to NVIDIA’s GPUs. TensorFlow optimizes the use of tensors when running on TPUs.
- For more on TPUs, see https://cloud.google.com/tpu/docs/tpus.
- For more on tensors in TensorFlow, see https://www.tensorflow.org/guide/tensor.
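A minimal sketch of why tensors matter here: frameworks such as TensorFlow pack many matrix multiplications into a single higher-rank tensor operation, which TPUs are built to execute efficiently. NumPy is used below as a stand-in so the example is self-contained; `tf.matmul` broadcasts over batch dimensions in the same way. The sizes are arbitrary illustrative choices.

```python
import numpy as np

# Arbitrary illustrative sizes: a batch of 2 sequences, each a 4x8 matrix.
batch, seq_len, d_model = 2, 4, 8
rng = np.random.default_rng(1)
x = rng.standard_normal((batch, seq_len, d_model))  # rank-3 input tensor
w = rng.standard_normal((d_model, d_model))         # shared weight matrix

# One call multiplies every matrix in the batch: (2, 4, 8) @ (8, 8).
# Hardware like TPUs executes such batched products as a single operation.
y = np.matmul(x, w)
print(y.shape)  # same batch and sequence dimensions, transformed features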
BERT-Base (110M parameters) was initially trained with 16 TPU chips, and BERT-Large (340M parameters) with 64 TPU chips. For more on training...