Two-Phase Model Serving
In this chapter, we will discuss the two-phase prediction pattern, in which we deploy two different models. The bigger, more complex model is deployed on a server, while a lightweight model is available on the client device. In most cases, the clients are edge devices whose network connectivity may fluctuate. When network access is poor, an edge device can use the lightweight model to get predictions for basic use cases. For broader and more accurate predictions, the device calls the API of the model deployed on the server. We will discuss how to serve models in this scenario, where edge devices operate under unstable network conditions.
We will cover the following topics in this chapter:
- Introducing two-phase model serving
- Exploring two-phase model serving techniques
- Use cases of two-phase model serving
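
Before going into the details, the routing idea behind the two-phase prediction pattern can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a definitive implementation: the names `local_predict`, `two_phase_predict`, `unreachable_server`, and `healthy_server` are hypothetical, the on-device "model" is a toy threshold rule, and the server call is simulated with plain functions instead of a real HTTP request.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]


def local_predict(features: Features) -> str:
    """Hypothetical lightweight on-device model: a toy threshold rule."""
    return "positive" if sum(features) > 0 else "negative"


def two_phase_predict(
    features: Features,
    remote_predict: Callable[[Features], str],
    fallback: Callable[[Features], str] = local_predict,
) -> Tuple[str, str]:
    """Try the server model first; use the on-device model if the network fails.

    Returns the prediction together with the phase that produced it.
    """
    try:
        return remote_predict(features), "server"
    except (ConnectionError, TimeoutError):
        # Assumption: network problems surface as these built-in exceptions.
        return fallback(features), "edge"


def unreachable_server(features: Features) -> str:
    """Stand-in for a server call made while the network is down."""
    raise ConnectionError("network unavailable")


def healthy_server(features: Features) -> str:
    """Stand-in for the bigger server model, which can return a finer-grained label."""
    return "strongly positive" if sum(features) > 1 else "neutral"


if __name__ == "__main__":
    print(two_phase_predict([0.9, 0.8], healthy_server))
    print(two_phase_predict([0.9, 0.8], unreachable_server))
```

In this sketch, the caller always receives a prediction, and the second element of the returned tuple tells it whether the answer came from the server model or from the lightweight model on the device.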