Converting a full model to an integer quantization model
This strategy requires TensorFlow 2.3. It is suitable for environments where compute resources are severely constrained, or where the compute node operates only in integer mode, such as edge devices or TPUs. As a result, all parameters are converted to an int8 representation. The quantization process tries to use int8 representation for every op (operation); when this is not possible, the op is left at its original precision (in other words, float32).
This quantization strategy requires some representative data. This data reflects the range of values the model typically expects as input. In other words, we need to provide some training or validation data to the integer quantization process; a subset of the data already used for training or validation works well. Around 100 samples are usually recommended. We are going to use 80 samples...
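As a sketch of how this looks in code, the snippet below converts a small stand-in Keras model with the `tf.lite.TFLiteConverter` API, supplying 80 representative samples through a generator. The model architecture and the random `rep_data` array are placeholders for illustration; in practice you would use your trained model and real training or validation samples.

```python
import numpy as np
import tensorflow as tf

# A tiny stand-in model; substitute your trained model here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Hypothetical representative data: 80 samples shaped like the model input.
rep_data = np.random.rand(80, 4).astype(np.float32)

def representative_dataset():
    # Yield one sample at a time, batched to shape (1, 4).
    for sample in rep_data:
        yield [sample[np.newaxis, :]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Ops lacking an int8 kernel fall back to float32 precision.
tflite_model = converter.convert()
```

The result, `tflite_model`, is a serialized flatbuffer (a `bytes` object) that can be written to a `.tflite` file and deployed to the target device.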