Reducing bits in hardware
A recent study shows that using fewer bits to represent model weights does not cause a significant loss in model test accuracy. Given this observation, we can use fewer bits to represent each weight value inside a DNN model. A simple example is shown here:
Figure 8.13 – Reducing bit representation per model weight
As shown in Figure 8.13, we can reduce the bit representation per weight from 32-bit floating point (FP32) to 16-bit floating point (FP16), halving the storage per weight. We can reduce the bits further by moving from FP16 to 8-bit integer (INT8).
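The FP32 → FP16 → INT8 reduction above can be sketched in NumPy. This is a minimal illustration, not the book's exact method: the FP16 step is a plain dtype cast, and the INT8 step uses a simple symmetric per-tensor linear quantization (a common scheme, assumed here for illustration); the weight values themselves are randomly generated stand-ins.

```python
import numpy as np

# Stand-in FP32 weights for one DNN layer (randomly generated for illustration).
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((4, 4)).astype(np.float32)

# Step 1: FP32 -> FP16, a direct cast that halves the bytes per weight.
weights_fp16 = weights_fp32.astype(np.float16)

# Step 2: FP16 -> INT8 via symmetric linear quantization.
# A single per-tensor scale maps the largest-magnitude weight to 127.
scale = float(np.abs(weights_fp16).max()) / 127.0
weights_int8 = np.clip(
    np.round(weights_fp16.astype(np.float32) / scale), -128, 127
).astype(np.int8)

# Dequantize to measure the approximation error the INT8 step introduced.
dequantized = weights_int8.astype(np.float32) * scale
max_error = float(np.abs(weights_fp32 - dequantized).max())

print(f"Bytes per weight: FP32={weights_fp32.itemsize}, "
      f"FP16={weights_fp16.itemsize}, INT8={weights_int8.itemsize}")
print(f"Max absolute quantization error: {max_error:.4f}")
```

The storage per weight drops from 4 bytes (FP32) to 2 (FP16) to 1 (INT8), a 4x reduction overall, while the dequantized values stay close to the originals, which is why test accuracy is largely preserved.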