Managing deployed models on Vertex AI
When we deploy an ML model to an endpoint, we associate it with physical compute resources so that it can serve online predictions at low latency. Depending on our requirements, we might want to deploy multiple models to a single endpoint, or a single model to multiple endpoints. Let's look at these two scenarios.
Multiple models – single endpoint
Suppose we already have a model deployed to an endpoint in production and we have found some promising ideas for improving it. Suppose further that we have trained an improved model that we want to deploy, but without making any sudden changes to our application. In this situation, we can deploy the new model to the existing endpoint and start serving a small percentage of the traffic with it. If everything looks good, we can gradually increase its share until it is serving 100% of the traffic.
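This gradual rollout can be sketched with the `google-cloud-aiplatform` Python SDK. The following is a minimal, hypothetical sketch, not a production recipe: the project, region, endpoint and model IDs are placeholders, `machine_type` is an assumed choice, and the `canary_split` helper is our own illustration of how a traffic split maps deployed-model IDs to percentages.

```python
def canary_split(current_id: str, new_id: str, new_pct: int) -> dict:
    """Build a traffic-split mapping that sends new_pct% of requests
    to the new deployed model and the rest to the current one.
    (Helper defined here for illustration; not part of the SDK.)"""
    if not 0 <= new_pct <= 100:
        raise ValueError("new_pct must be between 0 and 100")
    return {current_id: 100 - new_pct, new_id: new_pct}


def deploy_canary(endpoint_id: str, model_id: str, new_pct: int = 5) -> None:
    """Deploy a new model to an existing endpoint with a small traffic share."""
    # Requires the google-cloud-aiplatform package and valid GCP credentials.
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders
    endpoint = aiplatform.Endpoint(endpoint_id)
    model = aiplatform.Model(model_id)

    # Deploy alongside the existing model: the new model receives new_pct%
    # of the traffic, and the previously deployed model keeps the rest.
    model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",  # assumed machine type
        traffic_percentage=new_pct,
    )
```

Once monitoring confirms that the new model behaves well, we can repeatedly shift the split toward it (for example, building updated percentages with a helper like `canary_split`) until it serves all traffic, at which point the old model can be undeployed from the endpoint.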