Generating predictions using the recently trained model
Once the AutoML model is trained, you can generate predictions using one of the following two methods:
- Batch predictions: As the name suggests, batch predictions are asynchronous predictions generated for a batch of inputs. Use this method when a real-time response is unnecessary and you want a single request to process many data instances. In Vertex AI, a batch prediction request can be submitted directly against a model residing in the Vertex AI Model Registry, with no need to deploy it to an endpoint (see the first sketch after this list).
- Online predictions: If you need real-time inference – for example, when responding to application input – use the Vertex AI online prediction option. To use online prediction, you must first deploy the model to an endpoint. This step provisions infrastructure resources and deploys a prediction-serving mechanism backed by the specified model, enabling it to serve predictions with low latency (see the second sketch after this list).
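As a minimal sketch of the batch path, the following uses the Vertex AI Python SDK (`google-cloud-aiplatform`). The project ID, region, model resource name, and Cloud Storage URIs are placeholders you would replace with your own, and the input is assumed to be a JSONL file in a Cloud Storage bucket:

```python
from google.cloud import aiplatform

# Placeholder project and region -- substitute your own values.
aiplatform.init(project="my-project", location="us-central1")

# Reference the AutoML model directly from the Vertex AI Model Registry;
# no endpoint deployment is needed for batch prediction.
model = aiplatform.Model(
    model_name="projects/my-project/locations/us-central1/models/1234567890"
)

# Submit an asynchronous batch prediction job over a JSONL input file.
batch_job = model.batch_predict(
    job_display_name="automl-batch-predictions",
    gcs_source="gs://my-bucket/batch_inputs.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch_outputs/",
    sync=False,  # return immediately; the job runs in the background
)

batch_job.wait()  # optionally block until the job completes
print(batch_job.state)
```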
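For the online path, here is a similarly hedged sketch of deploying the same model to an endpoint and requesting a low-latency prediction. The machine type, display name, and instance payload are assumptions; the actual instance schema depends on the AutoML data type (a hypothetical tabular payload is shown):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    model_name="projects/my-project/locations/us-central1/models/1234567890"
)

# Deploying provisions serving infrastructure behind an endpoint.
endpoint = model.deploy(
    deployed_model_display_name="automl-online-model",
    machine_type="n1-standard-4",  # assumption: adjust per model type and load
    min_replica_count=1,
    max_replica_count=1,
)

# Synchronous, low-latency prediction for a single instance.
response = endpoint.predict(instances=[{"feature_a": "value", "feature_b": "3.2"}])
print(response.predictions)
```

Because the deployed endpoint keeps compute resources provisioned, it accrues charges even when idle; when it is no longer needed, you can undeploy the model with `endpoint.undeploy_all()`.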