Introducing two-phase model serving
In this section, we will discuss the basic concepts of the two-phase model serving pattern.
The two-phase model serving pattern deploys two models for prediction. A small model, called the phase one model, is deployed on the edge device. A large model, called the phase two model, is deployed on a distributed server in the cloud, because it usually exceeds the memory limit of the edge device and can’t be deployed there. The phase two model is updated frequently to provide the most accurate predictions.
Two-phase model serving matters most when the overall system includes edge devices that must keep producing predictions regardless of network conditions.
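To make the idea concrete, here is a minimal Python sketch of how an edge device might combine the two models. It is only an illustration under stated assumptions, not a definitive implementation: the names `phase_one_predict`, `two_phase_predict`, `online_cloud_model`, `offline_cloud_model`, and `CloudUnavailable` are hypothetical, and both models are replaced by trivial threshold rules so that the example runs without any machine learning library or network access.

```python
from typing import Callable, Sequence, Tuple


class CloudUnavailable(Exception):
    """Hypothetical error raised when the phase two server cannot be reached."""


def phase_one_predict(features: Sequence[float]) -> str:
    # Stand-in for the small model stored on the edge device (assumption:
    # a simple threshold rule instead of a real trained model).
    return "positive" if sum(features) > 0 else "negative"


def two_phase_predict(
    features: Sequence[float],
    phase_two_predict: Callable[[Sequence[float]], str],
) -> Tuple[str, str]:
    # The phase one prediction is computed locally, so it is always available.
    local_prediction = phase_one_predict(features)
    try:
        # Prefer the larger phase two model when the server is reachable.
        return phase_two_predict(features), "phase_two"
    except CloudUnavailable:
        # Offline: fall back to the prediction made on the device.
        return local_prediction, "phase_one"


def online_cloud_model(features: Sequence[float]) -> str:
    # Stand-in for a successful call to the large model in the cloud.
    return "positive" if sum(features) > 0.5 else "negative"


def offline_cloud_model(features: Sequence[float]) -> str:
    # Stand-in for a call that fails because the device has no connection.
    raise CloudUnavailable("no network connection")


if __name__ == "__main__":
    print(two_phase_predict([0.2, 0.1], online_cloud_model))
    print(two_phase_predict([0.2, 0.1], offline_cloud_model))
```

The function returns both the prediction and the name of the phase that produced it, so the caller can tell whether the result came from the small on-device model or from the larger model in the cloud.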
The phase one model is used for making predictions for two main reasons:
- To provide predictions if the device is offline. These...