Summary
In this chapter, we discussed two-phase model serving: what it is and why it is needed. We looked at different combinations of phase one and phase two models. The phase one model can be created by quantizing the phase two model, in which case only a single model needs to be trained, or it can be trained separately from the phase two model. We illustrated both techniques with basic examples throughout the chapter and also discussed some examples of two-phase model serving.
In the next chapter, we will discuss the pipeline pattern. We will learn how ML pipelines are created, how the stages in a pipeline are interconnected to serve the model, and how pipeline execution is scheduled.