There are two main modes of using machine learning models:
- Batch predictions: In this mode, you load a batch of data records at a set interval, for example every night or every month, and make predictions for all of them at once. Latency is usually not an issue here, so you can afford to put your training and prediction code into a single batch job. One exception is when the job runs so frequently that there is not enough time to retrain the model on every run. In that case, it makes sense to train the model once, store it somewhere, and load it each time new batch predictions are to be made (see the first sketch after this list).
- Online predictions: In this mode, your model is usually deployed behind an Application Programming Interface (API). The API is typically called with a single data record at a time, and it is expected to make a prediction for that record and return it immediately. Having low latency is critical here (see the second sketch after this list).
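
The first sketch below illustrates the train-once, predict-many batch pattern described above, assuming scikit-learn and joblib; the file path, column names, and function names are hypothetical placeholders, not a prescribed layout.

```python
# A minimal batch-prediction sketch using scikit-learn and joblib.
# MODEL_PATH, the "label" column, and the function names are hypothetical.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

MODEL_PATH = "model_store/model.joblib"  # hypothetical storage location

def train_and_store(training_frame: pd.DataFrame) -> None:
    """Train once and persist the model so later batch jobs can reuse it."""
    features = training_frame.drop(columns=["label"])
    model = LogisticRegression()
    model.fit(features, training_frame["label"])
    joblib.dump(model, MODEL_PATH)

def nightly_batch_job(new_records: pd.DataFrame) -> pd.DataFrame:
    """Load the stored model and score a whole batch of new records."""
    model = joblib.load(MODEL_PATH)
    scored = new_records.copy()
    scored["prediction"] = model.predict(new_records)
    return scored
```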
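The second sketch shows one common way to serve online predictions, assuming FastAPI and the stored model from the batch sketch; the endpoint path and feature fields are hypothetical, and any web framework with similar request handling would work.

```python
# A minimal online-prediction sketch using FastAPI.
# The model path, endpoint name, and feature fields are hypothetical.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model_store/model.joblib")  # load once at startup, not per request

class Record(BaseModel):
    # Hypothetical feature fields for a single data record.
    feature_a: float
    feature_b: float

@app.post("/predict")
def predict(record: Record) -> dict:
    """Score a single record and return the prediction immediately."""
    prediction = model.predict([[record.feature_a, record.feature_b]])
    return {"prediction": prediction.tolist()[0]}
```

Note that the model is loaded once when the process starts rather than on every request; reloading it per call would add avoidable latency, which matters far more in this mode than in batch jobs.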