Model serving and inferencing
Model serving and inferencing is the most important step of the entire ML life cycle. This is where the models that have been built are deployed to business applications so that inferences can be drawn from them. Model serving and inferencing can happen in two ways: offline, using batch processing, or online, in real time.
Offline model inferencing
Offline model inferencing is the process of generating predictions from an ML model using batch processing. Batch inference jobs run periodically on a recurring schedule, producing predictions on a fresh set of data each time. These predictions are then stored in a database or on the data lake, where they are consumed by business applications offline, in an asynchronous way. Examples of batch inferencing include data-driven customer segmentation used by the marketing teams at an organization, or a retailer predicting customer lifetime value. These use cases do not demand real-time predictions; inferences generated on a recurring schedule and consumed asynchronously are sufficient.
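To make this concrete, the following is a minimal sketch of what such a batch inference job might look like in Python, assuming a scikit-learn-style model persisted with joblib and batch data stored as Parquet files on the data lake. The run_batch_inference helper and all file paths are hypothetical, chosen only for illustration.

import joblib
import pandas as pd

def run_batch_inference(model_path: str, input_path: str, output_path: str) -> None:
    """Score a fresh batch of records and persist the predictions."""
    # Load the previously trained model (assumed persisted with joblib).
    model = joblib.load(model_path)

    # Read the new batch of feature data produced since the last run.
    features = pd.read_parquet(input_path)

    # Generate predictions for the entire batch in one pass.
    features["prediction"] = model.predict(features)

    # Write the scored records back to the data lake, where downstream
    # business applications can consume them asynchronously.
    features.to_parquet(output_path, index=False)

if __name__ == "__main__":
    # In practice, a scheduler (such as cron or a workflow orchestrator)
    # would invoke this job on a recurring schedule.
    run_batch_inference(
        model_path="models/clv_model.joblib",        # hypothetical paths
        input_path="lake/customers/latest.parquet",
        output_path="lake/predictions/clv.parquet",
    )

Because the job reads a batch, scores it, and writes the results in one pass, it can be scheduled at whatever cadence the business applications need, and the consuming systems never interact with the model directly, only with the stored predictions.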