Four generations of TPUs, plus Edge TPU
As discussed, TPUs are domain-specific processors expressly optimized for matrix operations. Recall that the basic operation of matrix multiplication is a dot product between a row of one matrix and a column of the other. For instance, given a matrix multiplication Y = WX, computing Y[i, 0] is:

Y[i, 0] = sum over k of W[i, k] * X[k, 0]
The sequential implementation of this operation is time-consuming for large matrices: a brute-force computation has a time complexity of O(n³) for n × n matrices, so it is not feasible for large-scale workloads.
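To make the cubic cost concrete, here is a minimal sketch of the brute-force triple-loop multiplication in Python (the function name and small example matrices are illustrative, not from the original text):

```python
import numpy as np

def naive_matmul(w, x):
    """Brute-force matrix multiplication.

    Each output element y[i, j] is the dot product of row i of w
    with column j of x -- the core operation a TPU accelerates.
    For n x n inputs the three nested loops give O(n^3) work.
    """
    n, k = w.shape
    k2, m = x.shape
    assert k == k2, "inner dimensions must match"
    y = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                y[i, j] += w[i, p] * x[p, j]
    return y

w = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([[5.0, 6.0], [7.0, 8.0]])
print(naive_matmul(w, x))  # same result as np.dot(w, x)
```

A TPU does not reduce the number of multiply-accumulate operations; it performs vast numbers of them in parallel in hardware, which is why this workload maps so well onto it.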
First generation TPU
The first-generation TPU (TPU v1) was announced at Google I/O in May 2016. TPU v1 [1] supports matrix multiplication using 8-bit integer arithmetic. TPU v1 is specialized for deep learning inference, but it does not support training; training requires floating-point operations, as discussed in the following paragraphs.
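The idea behind 8-bit inference is that trained floating-point weights can be mapped to small integers with little loss of accuracy. The following sketch shows one common scheme, symmetric per-tensor quantization with int32 accumulation; it is an illustration of the general technique, not TPU v1's exact quantization scheme:

```python
import numpy as np

def quantize_int8(a):
    """Map float values into the int8 range [-127, 127] with a per-tensor scale."""
    scale = np.max(np.abs(a)) / 127.0
    q = np.round(a / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # trained weights (illustrative)
x = rng.standard_normal((4, 4)).astype(np.float32)  # activations (illustrative)

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Multiply in integer arithmetic, accumulating in int32 to avoid overflow,
# then rescale the result back to floating point.
y_int = qw.astype(np.int32) @ qx.astype(np.int32)
y_approx = y_int * (sw * sx)

# The quantized result closely tracks the full-precision product.
print(np.max(np.abs(y_approx - w @ x)))
```

Because the heavy arithmetic happens on 8-bit integers, the hardware needs far less silicon area and energy per multiply than a floating-point unit, which is what makes an inference-only design like TPU v1 so efficient.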
A key function of TPU is the “...