Quantizing the model with the TensorFlow Lite converter
The TensorFlow model produced in the previous recipe is well suited for sharing or resuming training sessions. However, the model cannot be deployed on a microcontroller because of its high memory requirements, which are mainly due to the following:
- The weights are stored in floating-point format
- It keeps information that is not required for inference
Since our target device has computational and memory constraints, it is crucial to transform the trained model into something more compact.
This recipe will teach you how to convert the trained model into a lightweight format with the help of TensorFlow Lite and post-training integer 8-bit quantization.
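To make the workflow concrete, here is a minimal sketch of the conversion step with the TensorFlow Lite converter and post-training integer 8-bit quantization. It assumes the trained Keras model from the previous recipe is available as `model` and that `x_train` holds representative input samples; these names are placeholders and may differ in your project.

```python
import numpy as np
import tensorflow as tf

# Assumption: `model` is the trained tf.keras model and `x_train` is a
# NumPy array of training samples used to calibrate the quantization ranges.
def representative_dataset():
    # Yield a few hundred samples so the converter can estimate
    # the value ranges needed for integer quantization.
    for sample in x_train[:200]:
        yield [np.expand_dims(sample, axis=0).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full integer quantization of both weights and activations.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The representative dataset is what lets the converter observe typical activation ranges, so both weights and activations can be stored and computed as 8-bit integers.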
Getting ready
TensorFlow Lite and post-training integer 8-bit quantization are the main ingredients that make the trained model suitable for inference on devices with limited memory and computational capabilities...
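As a quick illustration of what 8-bit quantization does, the snippet below maps a single floating-point value to int8 with an affine scheme; the scale and zero point are made-up numbers chosen only for this example.

```python
import numpy as np

# Illustrative only: hypothetical scale and zero point for one tensor.
scale, zero_point = 0.05, -3

x = 1.27                                                     # original float value
q = int(np.clip(round(x / scale) + zero_point, -128, 127))   # quantized int8 value
x_back = (q - zero_point) * scale                            # dequantized value

print(q, x_back)  # 22 1.25 -> the small gap from 1.27 is the quantization error
```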