Chapter 7: Model Optimization
In this chapter, we will learn about model optimization through a technique known as quantization. This matters because, even though capacity such as compute and memory is less of a concern in a cloud environment, latency and throughput always affect the quality and quantity of the model's output. Optimizing a model to reduce latency and maximize throughput can therefore help reduce compute cost. In an edge environment, many of the constraints relate to resources such as memory, compute, power consumption, and bandwidth.
You will learn how to make your model as lean and mean as possible, with acceptable or negligible changes in its accuracy. In other words, we will reduce the model size so that the model can run on less power and fewer compute resources without overly impacting its performance. We are going to take a look at recent advances and...