Optimizing for AI/ML at the edge
Serving ML models at the edge refers to running your models directly on user devices such as smartphones or IoT devices. The term “edge” comes from traditional network architecture terminology: the core of the network sits in the network owner’s data centers, while the edge is where user devices connect to the network. Running models and other systems at the edge can provide benefits such as lower latency, increased privacy, and reduced server costs. However, edge devices usually have limited computing power, memory, and energy budgets, so we often need to adapt our models before they can run efficiently on those devices. There are several techniques for optimizing models to run at the edge, and we will discuss them in this section.
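To give a taste of the kind of adaptation involved, here is a minimal, illustrative sketch of post-training weight quantization, one common way to shrink a model’s memory footprint for edge devices. The function names are hypothetical and the sketch uses plain NumPy rather than any particular deployment framework:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a single scale factor.

    This is symmetric per-tensor quantization: the largest absolute
    weight is mapped to 127, and everything else scales linearly.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: a 256x256 float32 weight matrix shrinks 4x when stored as int8.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)                            # 4x smaller in memory
print(float(np.abs(w - dequantize(q, scale)).max()))   # small rounding error
```

Real edge-deployment toolchains apply far more sophisticated versions of this idea (per-channel scales, calibration data, quantization-aware training), but the core trade-off is the same: a smaller, cheaper representation in exchange for a bounded loss of precision.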
Model optimization
Let’s start by discussing the measures we can take to optimize our models so that they run well at the edge.
Model selection
First...