TPU performance
Discussing performance is always tricky: before quoting any numbers, we must first define the metrics we are going to measure and the set of workloads we will use as benchmarks. For instance, Google reported impressive linear scaling for TPU v2 on ResNet-50 [4] (see Figure 15.9 and Figure 15.10):
Figure 15.9: Linear scalability in the number of TPUs v2 when increasing the number of images
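To make the notion of "linear scalability" concrete, the following sketch computes the ideal throughput you would expect when the number of devices grows, and the scaling efficiency of a measured run against that ideal. The baseline throughput figure is a hypothetical placeholder, not a value measured on a real TPU:

```python
# Hypothetical single-device throughput in images/sec (placeholder,
# not a measured TPU number).
BASELINE_IMAGES_PER_SEC = 2_000

def ideal_throughput(num_devices: int) -> int:
    """Perfectly linear scaling: N devices give N x the baseline throughput."""
    return BASELINE_IMAGES_PER_SEC * num_devices

def scaling_efficiency(measured_images_per_sec: float, num_devices: int) -> float:
    """Fraction of the ideal linear speedup actually achieved (1.0 = perfect)."""
    return measured_images_per_sec / ideal_throughput(num_devices)

# Linear scaling means doubling the devices should double the throughput.
for n in (1, 8, 64, 256):
    print(n, "devices ->", ideal_throughput(n), "images/sec (ideal)")
```

In practice, real runs fall somewhat short of the ideal line because of communication and input-pipeline overheads, which is why a near-linear curve like the one Google reported is noteworthy.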
In addition, a comparison published online shows that a full Cloud TPU v2 Pod is >200x faster than a V100 NVIDIA Tesla GPU for ResNet-50 [4] training:
Figure 15.10: A full Cloud TPU v2 Pod is >200x faster than a V100 NVIDIA Tesla GPU for training a ResNet-50 model
According to Google, TPU v4 gives top-line results for MLPerf 1.0 [5] when compared with NVIDIA A100 GPUs (see Figure 15.11). Indeed, these accelerators are designed by keeping in mind the latest large models encompassing billions and sometimes trillions of...