Reducing bits in hardware
A recent study shows that using fewer bits to represent model weights does not introduce a significant loss in model test accuracy. Given this observation, we can use fewer bits to represent each weight value inside a DNN model. A simple example is shown here:
As shown in Figure 8.13, we can reduce the bit representation from FP32 to FP16, and then reduce it further by moving from FP16 to INT8.
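The two reductions can be sketched in NumPy. This is a minimal illustration, not a production quantization routine: the weight values are made up, and the INT8 step uses a simple symmetric linear scheme (one scale factor per tensor) as one common choice.

```python
import numpy as np

# Hypothetical FP32 weight tensor from one DNN layer.
weights_fp32 = np.array([0.12, -0.83, 0.47, -0.05, 0.99], dtype=np.float32)

# FP32 -> FP16: a direct cast, halving the storage per weight.
weights_fp16 = weights_fp32.astype(np.float16)

# FP32 -> INT8: symmetric linear quantization. Map the range
# [-max|w|, +max|w|] onto the signed 8-bit range [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to inspect the approximation error INT8 introduces.
weights_dequant = weights_int8.astype(np.float32) * scale
max_error = np.abs(weights_fp32 - weights_dequant).max()
```

Because rounding moves each weight by at most half a quantization step, the per-weight error is bounded by `scale / 2`; the accuracy impact in practice depends on how sensitive the model is to that perturbation.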