Converting a full model to a reduced hybrid quantization model
In the previous section, we converted a full model into a reduced float16 TFLite model and demonstrated its scoring and evaluation processes. Now we will try the second type of supported quantization: a hybrid approach.
Hybrid quantization optimizes the model by converting the weights to 8-bit integers while keeping the biases and activations as 32-bit floats. Because inference then mixes integer and floating-point computations, this approach is known as hybrid quantization. It is intended as a trade-off between accuracy and optimization.
Only one line of the conversion code needs to change for hybrid quantization, as explained below. In the previous section, this is how we quantized the full model to a reduced float16 TFLite model:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
For hybrid quantization, we simply omit the supported_types line, keeping only the default optimization flag.
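As a minimal sketch of the complete hybrid conversion, assuming the same trained Keras model (referred to here as model) from the earlier sections and a hypothetical output filename:

import tensorflow as tf

# Assumes `model` is the trained Keras model from the earlier sections.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Hybrid quantization: set only the default optimization flag and do not
# specify target_spec.supported_types, so the weights are quantized to
# 8-bit integers while biases and activations remain 32-bit floats.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the quantized model to disk (hypothetical filename).
with open('model_hybrid.tflite', 'wb') as f:
    f.write(tflite_model)

Scoring and evaluation then proceed exactly as with the float16 model from the previous section, using the TFLite interpreter.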