In real-world production scenarios, we typically encounter two situations:
- Running inference in real time, or in online mode
- Running inference in batch, or in offline mode
To illustrate this, consider a recommender system embedded in a web/mobile app. Real-time inference is appropriate when you want to personalize item suggestions based on live in-app activity. That activity, such as items the user browsed or items left in the shopping cart without checking out, can be sent as input to an online recommender system.
On the other hand, if you want to present item suggestions to customers even before they engage with your web/mobile app, you can feed their historical consumption behavior to an offline recommender system and precompute item suggestions for your entire customer base...
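The contrast between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: `score_items` is a hypothetical stand-in for a real recommender model, and the catalog and user features are made up for the example.

```python
def score_items(user_features: dict) -> list[str]:
    """Toy stand-in for a recommender model: rank a fixed catalog,
    putting items matching the user's recent activity first."""
    catalog = ["laptop", "mouse", "keyboard", "monitor", "headset"]
    recent = set(user_features.get("browsed", []))
    # False (item in recent activity) sorts before True; sort is stable,
    # so catalog order is preserved within each group.
    return sorted(catalog, key=lambda item: item not in recent)

def recommend_online(user_features: dict, top_k: int = 3) -> list[str]:
    """Online mode: one request, one response, triggered by live
    in-app activity (e.g. behind a web endpoint)."""
    return score_items(user_features)[:top_k]

def recommend_batch(all_users: dict, top_k: int = 3) -> dict:
    """Offline mode: score the whole customer base in one scheduled
    batch job (e.g. nightly) and store the results for later serving."""
    return {uid: score_items(feats)[:top_k] for uid, feats in all_users.items()}

# Online: called per request with fresh in-app activity.
print(recommend_online({"browsed": ["mouse", "monitor"]}))

# Offline: called once over historical data for every customer.
users = {
    "u1": {"browsed": ["laptop"]},
    "u2": {"browsed": []},
}
print(recommend_batch(users))
```

The key operational difference is not the model but the trigger and the scope: the online path scores one user on demand with the freshest signals, while the batch path trades freshness for the ability to cover every customer ahead of time.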