Creating a batch inference pipeline
In Chapter 11, Working with Pipelines, you learned how to create pipelines that orchestrate multiple steps. These pipelines can be invoked through a REST API, similar to the real-time endpoint that you created in the previous section. One key difference is that a real-time endpoint keeps its infrastructure constantly on, waiting for a request to arrive, whereas a published pipeline spins up its compute cluster only after the pipeline has been triggered.
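As a reminder of how such a trigger looks, the following sketch builds the REST request for a published pipeline run. The endpoint URL, token, experiment name, and parameter names are all hypothetical placeholders; in practice you would read the endpoint from the published pipeline object and obtain the token from Azure Active Directory:

```python
import json

# Hypothetical values: the real endpoint comes from the published pipeline
# object, and the token from an Azure Active Directory authentication flow.
rest_endpoint = "<published-pipeline-rest-endpoint>"
aad_token = "<AAD-access-token>"

headers = {"Authorization": f"Bearer {aad_token}"}

# The body names the experiment under which the run will be logged and can
# override any PipelineParameter values the pipeline defines.
payload = {
    "ExperimentName": "loans-batch-inference",
    "ParameterAssignments": {"input_dataset": "pending-loans"},
}

print(json.dumps(payload, indent=2))
# To actually trigger the run, you would POST this payload, for example:
# requests.post(rest_endpoint, headers=headers, json=payload)
```

The POST itself is left commented out so the sketch can run without a live workspace; the important part is the shape of the headers and body.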
You could use these pipelines to orchestrate batch inference on top of data residing in a dataset. For example, imagine that you have just trained the loans
model you have been using in this chapter. You want to run the model against all of the pending loan requests and store the results so that you can implement an email campaign targeting the customers whose loans might get rejected. The easiest approach is to create a single PythonScriptStep
that will process each record...
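The core of such a step could look like the following sketch. The record fields and the rejection rule are hypothetical stand-ins; the real loans model trained earlier in the chapter would replace the toy threshold check:

```python
# Illustrative body of a PythonScriptStep that scores pending loan requests.
# The fields and the approval rule below are hypothetical stand-ins for the
# loans model trained earlier in the chapter.

def score_record(record):
    """Return the record enriched with a rejection flag."""
    # Toy rule: flag requests asking for more than 5x the yearly income.
    likely_rejected = record["loan_amount"] > 5 * record["yearly_income"]
    return {**record, "likely_rejected": likely_rejected}

def score_batch(records):
    """Score every pending request, one record at a time."""
    return [score_record(r) for r in records]

if __name__ == "__main__":
    # In the pipeline, these records would come from the input dataset.
    pending = [
        {"id": 1, "loan_amount": 10_000, "yearly_income": 40_000},
        {"id": 2, "loan_amount": 300_000, "yearly_income": 35_000},
    ]
    results = score_batch(pending)
    # The flagged IDs are the customers to target with the email campaign.
    rejected_ids = [r["id"] for r in results if r["likely_rejected"]]
    print(rejected_ids)
```

Looping over records in a single script is simple, but as the next part of the chapter discusses, it processes the dataset sequentially on one node.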