Summary
TPUs are application-specific integrated circuits (ASICs) developed at Google to execute neural network math operations very quickly. The core of the computation is a systolic multiplier that computes multiple dot products (row * column) in parallel, which accelerates the basic operations of deep learning. Think of a TPU as a special-purpose coprocessor for deep learning that focuses on matrix or tensor operations. Google has announced three generations of TPUs so far, plus an Edge TPU for IoT. TPU v1 is a PCI-based specialized coprocessor that delivers 92 TeraOPS (8-bit integer operations rather than floating point) and supports inference only. Cloud TPU v2 achieves 180 TeraFLOPS and supports both training and inference. Cloud TPU v2 pods, released in alpha in 2018, can achieve 11.5 PetaFLOPS. Cloud TPU v3 achieves 420 TeraFLOPS, also with training and inference support, and Cloud TPU v3 pods can deliver more than 100 PetaFLOPS of computing power. That is a world-class supercomputer for tensor operations!
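To make the "row * column" idea concrete, here is a minimal pure-Python sketch of a matrix product written as a grid of independent dot products. It is only an illustration of the arithmetic, not of how TPU hardware is programmed; the helper names `dot` and `matmul` are hypothetical and chosen for this example. Because every output cell depends on just one row and one column, all cells can in principle be computed at the same time, which is the property a systolic multiplier exploits.

```python
def dot(row, col):
    """Multiply-accumulate over one row and one column."""
    acc = 0
    for a, b in zip(row, col):
        acc += a * b
    return acc


def matmul(A, B):
    """Compute the matrix product of A and B as independent row * column dot products.

    A and B are lists of lists (row-major). Each output cell C[i][j]
    depends only on row i of A and column j of B, so the cells could be
    computed in parallel; this sketch computes them one after another.
    """
    if A and len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    cols = list(zip(*B))  # transpose B to get its columns
    return [[dot(row, col) for col in cols] for row in A]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
C = matmul(A, B)
print(C)  # [[19, 22], [43, 50]]
```

In this sequential version, a product of two n x n matrices takes n * n dot products of length n. Dedicated hardware that performs many of these multiply-accumulate steps at once is what shortens the running time.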