Summary
In this chapter, we discussed the online model serving pattern. We defined online model serving and looked at its different modes. We saw that, in online model serving, the model is kept up to date with newly available data.
We then looked at some example cases where online model serving is essential. In those cases, the model cannot tolerate delays in adapting to new data, and its purpose may be compromised if it is not updated regularly as new data arrives.
We then discussed some of the challenges in online model serving and introduced some ideas for addressing them. We concluded the chapter with a dummy end-to-end example of online model serving.
In the next chapter, we will talk about two-phase model serving, where two parallel versions of a model (one heavy, one light) are used to enable serving on devices with limited memory and limited network connectivity.