Ensemble Model Serving Pattern
In this chapter, we will discuss the ensemble model serving pattern. In the ensemble pattern, we combine the outputs of multiple models before serving a response to the client. Combining responses from multiple sources is needed in many scenarios. For example, we might run separate models on the audio and the video of a video file and then combine their outputs to generate the final inference about the video. We can also combine the outputs of multiple similar models to make inferences with higher confidence. We will discuss some of these cases in this chapter and explore a dummy end-to-end example of how we can combine multiple models to generate the final response.
At a high level, we are going to cover the following main topics in this chapter:
- Introducing the ensemble pattern
- Using ensemble pattern techniques
- End-to-end dummy example of serving the model
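Before we dive in, the idea of combining the outputs of multiple similar models can be illustrated with a minimal sketch. It assumes three hypothetical models (`model_a`, `model_b`, and `model_c`, which are stand-ins and not part of any library) that each return class probabilities for the same input, and it averages those probabilities before choosing the final label. Averaging is only one possible way to combine outputs and is used here purely for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for trained models: each takes a feature vector and
# returns class probabilities. The hard-coded values are for illustration only.
Model = Callable[[Sequence[float]], List[float]]


def model_a(features: Sequence[float]) -> List[float]:
    return [0.7, 0.3]


def model_b(features: Sequence[float]) -> List[float]:
    return [0.6, 0.4]


def model_c(features: Sequence[float]) -> List[float]:
    return [0.4, 0.6]


def ensemble_predict(
    models: Sequence[Model], features: Sequence[float]
) -> Tuple[int, List[float]]:
    """Average the class probabilities from every model and pick the top class."""
    per_model = [model(features) for model in models]
    # zip(*per_model) groups the probabilities of each class across models.
    averaged = [mean(class_probs) for class_probs in zip(*per_model)]
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged


if __name__ == "__main__":
    label, probabilities = ensemble_predict([model_a, model_b, model_c], [1.0, 2.0])
    print(label, probabilities)
```

Here `model_c` disagrees with the other two models, but the averaged probabilities still favor class 0, so the combined response served to the client reflects the view of the whole ensemble rather than any single model.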