Summary
In this chapter, we learned to optimize a trained model by making it smaller and more compact, which gives us more flexibility when deploying models under various hardware or resource constraints. Optimization matters most when deploying to resource-constrained environments such as edge devices with limited compute, memory, or power. We achieved model optimization by means of quantization, reducing the model's footprint by changing the data types used to store its weights, biases, and activations.
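To make the footprint reduction concrete, here is an illustrative sketch (not the TensorFlow Lite implementation itself) of affine int8 quantization: each float32 value is mapped to an 8-bit integer via a scale and zero point, shrinking per-value storage from 4 bytes to 1 at the cost of a small rounding error.

```python
import numpy as np

# Illustrative only: simulate quantizing a float32 weight tensor to int8.
weights = np.random.randn(256).astype(np.float32)

# Derive scale and zero point from the observed value range,
# mapping [w_min, w_max] onto the 256 levels of int8 [-128, 127].
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0
zero_point = np.round(-w_min / scale) - 128

# Quantize to int8, then dequantize to measure the reconstruction error.
q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
deq = (q.astype(np.float32) - zero_point) * scale

print(f"float32 size:  {weights.nbytes} bytes")  # 4 bytes per value
print(f"int8 size:     {q.nbytes} bytes")        # 1 byte per value (4x smaller)
print(f"max abs error: {np.abs(weights - deq).max():.4f}")
```

The rounding error is bounded by half the scale, which is why quantization usually costs only a small amount of accuracy while cutting storage by 4x.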
We learned about three quantization strategies: reduced float16 quantization, hybrid quantization, and integer quantization. Of these three strategies, integer quantization currently requires an upgrade to TensorFlow 2.3.
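The three strategies can be sketched with the TFLiteConverter API as follows. This is a hedged illustration: the tiny Keras model and the representative-data generator are stand-ins for your own trained model and calibration data.

```python
import numpy as np
import tensorflow as tf

# Stand-in for a trained model; substitute your own.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# 1. Reduced float16 quantization: weights stored as float16.
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.target_spec.supported_types = [tf.float16]
fp16_model = conv.convert()

# 2. Hybrid (dynamic range) quantization: int8 weights,
#    activations computed in float.
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
hybrid_model = conv.convert()

# 3. Full integer quantization: a representative dataset lets the
#    converter calibrate activation ranges; integer-only input/output
#    requires TensorFlow 2.3 or later.
def representative_data():
    for _ in range(10):
        yield [np.random.randn(1, 8).astype(np.float32)]

conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_data
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
conv.inference_input_type = tf.int8
conv.inference_output_type = tf.int8
int8_model = conv.convert()
```

Each `convert()` call returns a serialized TFLite flatbuffer that you would write to a `.tflite` file for deployment.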
Choosing a quantization strategy depends on factors such as the target hardware's compute and memory resources, model size limits, and acceptable accuracy loss. Furthermore, you have to consider whether or not the target hardware requires integer-only ops...