A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) whose hardware circuits are optimized for the computational requirements of deep neural networks. The TPU uses a Complex Instruction Set Computer (CISC) style instruction set, in which each high-level instruction carries out one of the complex tasks involved in training deep neural networks. The heart of the TPU architecture lies in its systolic arrays, which accelerate matrix operations by passing operands directly from one processing element to the next instead of repeatedly reading them from memory.
The Architecture of the TPU
Image from: https://cloud.google.com/blog/big-data/2017/05/images/149454602921110/tpu-15.png
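To make the systolic-array idea mentioned above more concrete, the following is a minimal software sketch of how such an array can compute a matrix product cycle by cycle. It is an illustration only: it assumes an output-stationary dataflow (each processing element keeps one output value and accumulates into it), which was chosen for simplicity and is not claimed to match the dataflow of the actual TPU hardware. The function name `systolic_matmul` and the register names are hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an M x N output-stationary systolic array computing A @ B.

    A is M x K and B is K x N, both given as lists of lists.
    Returns the product and the number of simulated clock cycles.
    """
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"

    acc = [[0] * N for _ in range(M)]    # one accumulator per processing element
    a_reg = [[0] * N for _ in range(M)]  # A operands, moving left to right
    b_reg = [[0] * N for _ in range(M)]  # B operands, moving top to bottom

    cycles = M + N + K - 2  # time for the last skewed operands to reach the far corner
    for t in range(cycles):
        # Shift A operands one element to the right and feed the left edge.
        # Row i is delayed by i cycles so matching operands meet in each element.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0

        # Shift B operands one element down and feed the top edge.
        # Column j is delayed by j cycles.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0

        # Every processing element performs one multiply-accumulate per cycle.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, cycles)
```

Each operand is read from the input matrices once and then reused as it moves through the array, which is the property that lets a hardware systolic array keep many multiply-accumulate units busy with little memory traffic.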
TensorFlow provides a compiler and software stack that translates the API calls from TensorFlow graphs into TPU instructions. The following block diagram depicts the architecture of TensorFlow models running on top of the TPU stack:
Image from: https://cloud.google.com/blog/big-data...
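The translation step described above can be pictured with a toy sketch that lowers a list of fully connected layers into a flat sequence of CISC-style instructions. The mnemonics follow the key instructions that Google has described publicly for the first-generation TPU; everything else (the function name `lower_dense_layers`, the layer dictionaries, and the operand format) is hypothetical and does not reflect the real TensorFlow compiler or its output.

```python
def lower_dense_layers(layers):
    """Lower a list of dense-layer descriptions to a toy instruction list.

    Each layer is a dict with hypothetical keys "weights" and "activation".
    The result is a list of (mnemonic, operand) tuples.
    """
    program = [("Read_Host_Memory", "input")]  # bring the input batch on chip
    for layer in layers:
        program.append(("Read_Weights", layer["weights"]))    # load the layer's weights
        program.append(("MatrixMultiply", layer["weights"]))  # one instruction per matrix product
        program.append(("Activate", layer["activation"]))     # apply the nonlinearity
    program.append(("Write_Host_Memory", "output"))  # return the results to the host
    return program


program = lower_dense_layers([
    {"weights": "W1", "activation": "relu"},
    {"weights": "W2", "activation": "softmax"},
])
for instruction in program:
    print(instruction)
```

The point of the sketch is that a single high-level instruction stands for an entire matrix product or activation pass, which is what makes the instruction set CISC-like.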