Understanding patterns of DL inference pipelines
As model development moves on to implementing an inference pipeline for production use, it is important to understand that a well-tuned, well-trained DL model is only half of a successful business AI strategy. The other half consists of deploying, serving, monitoring, and continuously improving the model once it is in production. Designing and implementing a DL inference pipeline is the first step in that second half. Until now, the model has been trained, tuned, and tested on curated offline datasets; in production, it needs to handle predictions in two ways:
- Batch inference: This usually requires scheduled or ad hoc execution of an inference pipeline on an offline batch of observational data. The turnaround time for prediction results is typically daily, weekly, or on some other schedule.
- Online inference: This usually requires a web service for real-time execution of an inference...
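To make the contrast between the two patterns concrete, here is a minimal sketch in plain Python. All names in it (`predict`, `run_batch_inference`, `handle_online_request`) are hypothetical, and the keyword-based `predict` function is only a stand-in for a trained DL model. In a real deployment, the online handler would typically sit behind a web service and the batch job would be triggered by a scheduler; neither is shown here.

```python
from typing import Dict, Iterator, List, Sequence


def predict(texts: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a trained DL model (assumption, not a real model).

    Labels a text "positive" if it contains the word "good", else "negative".
    """
    return ["positive" if "good" in t.lower() else "negative" for t in texts]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batch_inference(records: Sequence[str], batch_size: int = 2) -> List[str]:
    """Batch pattern: score a whole offline dataset, one chunk at a time.

    This is the kind of function a scheduled or ad hoc job might call.
    """
    results: List[str] = []
    for chunk in chunked(records, batch_size):
        results.extend(predict(chunk))
    return results


def handle_online_request(payload: Dict[str, str]) -> Dict[str, str]:
    """Online pattern: score a single request and return the result immediately.

    A web framework would normally call this once per incoming request.
    """
    label = predict([payload["text"]])[0]
    return {"prediction": label}


if __name__ == "__main__":
    # Batch: many records, results collected after the whole run.
    print(run_batch_inference(["Good movie", "bad plot", "so good"]))
    # Online: one record in, one prediction out.
    print(handle_online_request({"text": "A good film"}))
```

Both paths call the same `predict` function. What differs is how the input arrives (a stored dataset versus a single request) and how quickly the caller expects a result.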