Training at scale
When we introduced Ray in Chapter 6, Scaling Up, we mentioned use cases where the data or processing time requirements were such that using a very scalable parallel computing framework made sense. What was not made explicit is that sometimes these requirements come from the fact that we actually want to train many models, not just one model on a large amount of data or one model more quickly. This is what we will do here.
The retail forecasting example we described in Chapter 1, Introduction to ML Engineering uses a data set with several different retail stores in it. Rather than creating one model that could have a store number or identifier as a feature, a better strategy would perhaps be to train a forecasting model for each individual store. This is likely to give better accuracy as the features of the data at the store level which may give some predictive power will not be averaged out by the model looking at a combination of all the stores together. This...