Three generations of TPUs and Edge TPU
As discussed, TPUs are domain-specific processors optimized for matrix operations. Recall that the basic operation of a matrix multiplication is a dot product between a row of one matrix and a column of the other. For instance, given a matrix multiplication Y = X*W, computing Y[i,0] is:
Y[i,0] = X[i,0]*W[0,0] + X[i,1]*W[1,0] + ... + X[i,n]*W[n,0]
The sequential implementation of this operation is time-consuming for large matrices. A brute-force computation has a time complexity of O(n^3) for n x n matrices, so it quickly becomes impractical as n grows.
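To make the cost concrete, here is a minimal pure-Python sketch of the brute-force approach. The function names dot_row_col and matmul_naive are illustrative and are not taken from any library. Each output element is one row-by-column dot product, and computing all of them takes three nested loops.

```python
def dot_row_col(X, W, i, j):
    """Return Y[i][j]: the dot product of row i of X and column j of W."""
    return sum(X[i][k] * W[k][j] for k in range(len(W)))


def matmul_naive(X, W):
    """Brute-force matrix product: O(n^3) multiply-adds for n x n inputs."""
    rows, inner, cols = len(X), len(W), len(W[0])
    if any(len(row) != inner for row in X):
        raise ValueError("inner dimensions of X and W must match")
    # One dot product (inner loop over k) per output element (loops over i, j).
    return [[dot_row_col(X, W, i, j) for j in range(cols)] for i in range(rows)]


X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
Y = matmul_naive(X, W)
print(Y)  # [[19, 22], [43, 50]]
```

For n x n matrices there are n*n output elements and each needs n multiply-add steps, which is where the cubic cost comes from.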
First-generation TPU
The first-generation TPU (TPU v1) was announced in May 2016 at Google I/O. TPU v1 [1] supports matrix multiplication using 8-bit arithmetic. It is specialized for deep learning inference and does not support training, because training requires floating-point operations, as discussed in the following paragraphs.
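As a rough illustration of what 8-bit matrix multiplication means for inference, the sketch below quantizes floating-point matrices to signed 8-bit integers, multiplies them with integer arithmetic, and rescales the result. The symmetric per-matrix scaling and the helper names (quantize, int_matmul, quantized_matmul) are assumptions chosen for simplicity; this is not a description of the quantization scheme TPU v1 actually uses.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers in [-127, 127] with one shared scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for row in values for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in values]
    return q, scale


def int_matmul(A, B):
    """Integer matrix product; Python ints stand in for a wide accumulator."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def quantized_matmul(X, W):
    """Approximate X*W using 8-bit operands and a floating-point rescale."""
    Xq, sx = quantize(X)
    Wq, sw = quantize(W)
    acc = int_matmul(Xq, Wq)
    # Each integer product carries a factor of 1/(sx*sw), so undo it here.
    return [[a * sx * sw for a in row] for row in acc]


X = [[0.5, -1.0], [0.25, 0.75]]
W = [[1.0, 0.5], [-0.5, 2.0]]
exact = [[sum(X[i][k] * W[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
approx = quantized_matmul(X, W)
max_error = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
print(exact, approx, max_error)
```

The result is close to the floating-point product but not identical. That small loss of precision is usually tolerable for inference, whereas training generally needs floating-point arithmetic.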
A key function of TPU is the "systolic...