Performance of models
Performance is important for both the training and the deployment of deep learning models. Training can take a long time when the data is large or the model architecture is big. The resulting models may also be large, which makes them difficult to use on mobile devices where RAM is constrained, and longer computation times translate into higher infrastructure costs. Inference time is critical in video applications. For these reasons, in this section we will look at techniques to improve performance. Reducing model complexity is an easy option, but it lowers accuracy. Here, we will focus on methods that improve performance with an insignificant drop in accuracy. In the next section, we will discuss the option of quantization.
Quantizing the models
The weights of deep learning models are stored as 32-bit floating-point values. When the weights are quantized to 8-bit, the decrease in accuracy is small enough that it is usually not noticeable in deployment...
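To make the idea concrete, the following is a minimal sketch of linear (affine) 8-bit quantization of a weight array using NumPy. It is an illustration of the general technique, not the exact scheme used by any particular framework; the function names and the random weight matrix are assumptions for demonstration.

```python
import numpy as np


def quantize_to_uint8(weights):
    """Linearly map 32-bit float weights to 8-bit integers.

    Returns the quantized values plus the (scale, min) pair needed to
    recover approximate float values at inference time.
    """
    w_min, w_max = weights.min(), weights.max()
    # Spread the float range over the 256 available integer levels.
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    quantized = np.round((weights - w_min) / scale).astype(np.uint8)
    return quantized, scale, w_min


def dequantize(quantized, scale, w_min):
    """Recover approximate float weights from the 8-bit representation."""
    return quantized.astype(np.float32) * scale + w_min


# Hypothetical example: quantize a random weight matrix and check the error.
weights = np.random.randn(256, 256).astype(np.float32)
q, scale, w_min = quantize_to_uint8(weights)
recovered = dequantize(q, scale, w_min)
print("max absolute error:", np.abs(weights - recovered).max())
```

Storing the weights as 8-bit integers instead of 32-bit floats reduces their memory footprint by a factor of four, while the reconstruction error stays small because the weights of a trained layer typically fall within a narrow range.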