Getting predictions on Vertex AI
In this section, we will learn how to get predictions from our ML models on Vertex AI. Depending on the use case, prediction requests fall into two types: online predictions (real time) and batch predictions. Online predictions are synchronous requests made to a model endpoint; they suit applications that request outputs via an API call and must update information for end users in near real time. For example, the Google Maps API serves near real-time traffic updates and relies on online prediction requests. Batch predictions, on the other hand, are asynchronous requests. If our use case only requires batch prediction, we might not need to deploy the model to an endpoint at all, because the Vertex AI batch prediction service can also run batch predictions directly from a saved model stored in a GCS location, without an endpoint ever being created. Batch predictions are suitable...
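To make the contrast concrete, the two request shapes can be sketched as plain JSON payloads: a synchronous `:predict` body for an online endpoint, and an asynchronous batch prediction job that reads input from GCS and writes results back to GCS with no endpoint involved. The project, model ID, bucket paths, and feature names below are placeholders, not values from this example:

```python
import json

# Hypothetical placeholders -- substitute your own project, region, and bucket.
PROJECT = "my-project"
REGION = "us-central1"

# Online prediction: a synchronous request body sent to the endpoint's
# :predict method; the response comes back in the same API call.
online_request = {
    "instances": [{"feature_a": 1.2, "feature_b": 0.7}],  # one dict per input row
}

# Batch prediction: an asynchronous job specification. The model is read
# from the registry and input/output live in GCS -- no endpoint is needed.
batch_job = {
    "displayName": "demo-batch-job",
    "model": f"projects/{PROJECT}/locations/{REGION}/models/1234567890",
    "inputConfig": {
        "instancesFormat": "jsonl",
        "gcsSource": {"uris": ["gs://my-bucket/input/data.jsonl"]},
    },
    "outputConfig": {
        "predictionsFormat": "jsonl",
        "gcsDestination": {"outputUriPrefix": "gs://my-bucket/output/"},
    },
}

# Both payloads are ordinary JSON and can be serialized for the REST API.
print(json.dumps(online_request))
print(json.dumps(batch_job, indent=2))
```

The key operational difference is visible in the payloads themselves: the online body carries the instances inline and expects an immediate answer, while the batch job only points at GCS locations and is polled later for completion.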