Deploying a vision model to a Vertex AI endpoint
In the previous section, we completed our experiment of training a TF-based vision model to identify defects from product images. We now have a trained model that can identify defective or broken bangle images. To make this model usable in downstream applications, we need to deploy it to an endpoint so that we can query that endpoint, getting outputs for new input images on demand. There are certain things that are important to consider while deploying a model, such as expected traffic, expected latency, and expected cost. Based on these factors, we can choose the best infrastructure to deploy our models. If there are strict low-latency requirements, we can deploy our model to machines with accelerators (such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs)). Conversely, if we don't need online or on-demand predictions, we don't need to deploy our model to an endpoint at all. Offline batch...
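As a rough sketch of how this looks in code, the snippet below uploads an exported TF SavedModel and deploys it with the `google-cloud-aiplatform` SDK, picking a GPU-backed machine only when low latency is required. The project ID, artifact URI, display name, and machine/accelerator choices are illustrative assumptions, not values from our experiment.

```python
# Sketch only: deploying a trained TF vision model to a Vertex AI endpoint.
# All resource names below (display name, machine types, container images)
# are placeholder assumptions to be adapted to your project.

def deploy_vision_model(project: str, location: str, artifact_uri: str,
                        low_latency: bool = False):
    """Upload a SavedModel from GCS and deploy it to a Vertex AI endpoint.

    When low_latency is True, attach a GPU accelerator and a GPU serving
    image; otherwise serve on a CPU-only machine to keep costs down.
    """
    from google.cloud import aiplatform  # pip install google-cloud-aiplatform

    aiplatform.init(project=project, location=location)

    serving_image = (
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-12:latest"
        if low_latency
        else "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    )

    model = aiplatform.Model.upload(
        display_name="bangle-defect-detector",   # placeholder name
        artifact_uri=artifact_uri,               # GCS path to the SavedModel
        serving_container_image_uri=serving_image,
    )

    deploy_kwargs = {"machine_type": "n1-standard-4"}
    if low_latency:
        deploy_kwargs.update(
            machine_type="n1-standard-8",
            accelerator_type="NVIDIA_TESLA_T4",
            accelerator_count=1,
        )

    # model.deploy() creates an endpoint (if none is given) and returns it;
    # the endpoint can then be queried with endpoint.predict(instances=[...]).
    endpoint = model.deploy(**deploy_kwargs)
    return endpoint
```

Once deployed, new images can be sent to `endpoint.predict()` for on-demand classification; for purely offline workloads, a batch prediction job avoids the cost of a standing endpoint.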