Network quantization – reducing the number of bits used for model parameters
If you look at DL model training in detail, you will notice that the model learns to deal with noisy inputs. In other words, the model tries to construct a generalization of the data it is trained on so that it can generate reasonable predictions even when there is some noise in the incoming data. Additionally, after training, the DL model ends up using a particular range of numeric values during inference. Following this line of thought, network quantization aims to use simpler representations for these values.
As shown in Figure 10.1, network quantization, also called model quantization, is the process of remapping the range of numeric values that the model interacts with to a number system that can be represented with fewer bits – for example, using 8 bits instead of 32 bits to represent a float. Such a modification offers an additional advantage in DL model deployment, as edge devices are often missing...
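To make the remapping concrete, the following is a minimal sketch of affine (scale-and-zero-point) quantization from 32-bit floats to 8-bit integers using NumPy. The helper names `quantize_to_int8` and `dequantize` are illustrative only and do not correspond to any particular framework's API; production toolchains (for example, PyTorch or TensorFlow Lite) additionally handle calibration, per-channel scales, and operator fusion.

```python
import numpy as np

def quantize_to_int8(values: np.ndarray):
    """Map a float32 array onto the int8 range [-128, 127] using a
    simple affine (scale + zero-point) scheme. Illustrative sketch only."""
    # Determine the range the float values actually occupy
    min_val, max_val = float(values.min()), float(values.max())

    # Scale maps that range onto the 256 available integer levels;
    # guard against a zero range to avoid division by zero
    scale = max((max_val - min_val) / 255.0, 1e-8)

    # Zero point shifts the mapping so that min_val lands on -128
    zero_point = int(round(-128 - min_val / scale))

    # Remap, round, and clamp to the int8 range
    quantized = np.clip(
        np.round(values / scale) + zero_point, -128, 127
    ).astype(np.int8)
    return quantized, scale, zero_point

def dequantize(quantized: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover approximate float values from the 8-bit representation
    return (quantized.astype(np.float32) - zero_point) * scale

# Example: quantize a small tensor standing in for model weights
weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_to_int8(weights)
print("max absolute error:", np.abs(weights - dequantize(q, scale, zp)).max())
```

Because each value now occupies 1 byte instead of 4, the stored tensor shrinks to roughly a quarter of its original size, at the cost of the small rounding error printed above.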