Exploring different modes of serving ML models
In this section, we will consider how a model can be served so that users (both humans and machines) can consume the ML service efficiently. Model serving is a critical area: an ML system must succeed here to deliver its business impact, because any lag or bug at this stage directly affects users and can be costly. Robustness, availability, and convenience are key factors to keep in mind while serving ML models. Broadly, an ML model can be served in one of two ways: in batch mode, where predictions are computed for a whole dataset at once, or in on-demand mode, where a prediction is returned in response to an individual query. In on-demand mode, a model can be served to either a machine or a human user. Here is an example of serving a model to a user:
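Before looking at the scenario in detail, the two serving modes can be sketched in code. The following is a minimal, illustrative sketch (not a production setup): it assumes a Flask HTTP endpoint for on-demand serving and a plain Python function for batch serving, and the `predict` stand-in, endpoint path, and payload field names are all hypothetical choices for this example.

```python
# Illustrative sketch of batch vs. on-demand model serving.
# The model here is a trivial stand-in (mean of the features);
# in practice this would be a trained model's inference call.
from flask import Flask, request, jsonify

app = Flask(__name__)


def predict(features):
    # Stand-in for model inference on a single feature vector
    return sum(features) / len(features)


def batch_predict(dataset):
    # Batch mode: score an entire collection of records at once,
    # typically on a schedule, with results stored for later use
    return [predict(features) for features in dataset]


@app.route("/predict", methods=["POST"])
def serve_prediction():
    # On-demand mode: one prediction per incoming request
    payload = request.get_json()
    score = predict(payload["features"])
    return jsonify({"prediction": score})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In on-demand mode, a caller (a human-facing application or another machine) would POST a JSON payload such as `{"features": [1.0, 2.0, 3.0]}` to `/predict` and receive a prediction back, whereas `batch_predict` would run over a full dataset offline.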
In a typical scenario (in on-demand mode), a model is served as a service for users to consume, as shown in Figure 12.2. Then, an external application...