Validation strategies for datasets with multiple time series
All the strategies we have seen so far are just as valid for datasets with multiple time series, such as the London Smart Meters dataset we have been working with in this book. The insights we discussed in the last section carry over as well. Implementing these strategies can be slightly tricky, though, because the scikit-learn classes we discussed assume a single time series, sorted in temporal order. If the dataset contains multiple time series, the resulting splits will be haphazard and will not respect the temporal order within each series.
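To make the problem concrete, here is a minimal sketch that applies scikit-learn's `TimeSeriesSplit` to a toy dataset in which three series are stacked one after another. The column names (`series_id`, `timestamp`, `y`) and the meter labels are assumptions made for illustration and are not taken from the London Smart Meters dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Toy long-format frame: 3 hypothetical meters, 4 daily readings each,
# stacked meter by meter (all of A, then all of B, then all of C).
timestamps = pd.date_range("2014-01-01", periods=4, freq="D")
df = pd.concat(
    [
        pd.DataFrame({"series_id": sid, "timestamp": timestamps, "y": np.arange(4.0)})
        for sid in ["A", "B", "C"]
    ],
    ignore_index=True,
)

# TimeSeriesSplit works on row positions only; it knows nothing about series_id.
splitter = TimeSeriesSplit(n_splits=2)
train_idx, val_idx = list(splitter.split(df))[-1]  # inspect the last fold

train_ids = set(df.loc[train_idx, "series_id"])
val_ids = set(df.loc[val_idx, "series_id"])
print("series in train:", sorted(train_ids))
print("series in validation:", sorted(val_ids))

# The validation fold starts earlier in time than the training fold ends,
# so the split is not a temporal split at all.
print(df.loc[val_idx, "timestamp"].min() < df.loc[train_idx, "timestamp"].max())
```

In this toy setup, the last fold trains on the complete history of two meters and validates on the complete history of the third, which is a split across series rather than across time.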
There are a couple of options we can adopt for datasets with multiple time series:
- We can loop over the different time series, use the methods we discussed to do the train-validation split for each one, and then concatenate the resulting sets across all the time series. This works, but it is not very efficient.
- We can write some code and design the validation...
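The first option, looping over each series and concatenating the pieces, can be sketched as follows. This is only an illustration under stated assumptions: the helper `split_per_series` and the column names `series_id` and `timestamp` are hypothetical, and a simple holdout of the last few observations stands in for whichever validation strategy is actually chosen.

```python
import pandas as pd


def split_per_series(df, id_col="series_id", time_col="timestamp", n_val=2):
    """Hold out the last n_val observations of every series for validation.

    Hypothetical helper for illustration; id_col and time_col are assumed names.
    """
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    train_parts, val_parts = [], []
    # Loop over the series one at a time, splitting each in temporal order.
    for _, group in df.groupby(id_col, sort=False):
        group = group.sort_values(time_col)
        train_parts.append(group.iloc[:-n_val])
        val_parts.append(group.iloc[-n_val:])
    # Concatenate the per-series pieces back into two long-format frames.
    return pd.concat(train_parts), pd.concat(val_parts)


# Toy data: two hypothetical meters with five daily readings each.
timestamps = pd.date_range("2014-01-01", periods=5, freq="D")
df = pd.concat(
    [
        pd.DataFrame({"series_id": sid, "timestamp": timestamps, "y": range(5)})
        for sid in ["A", "B"]
    ],
    ignore_index=True,
)

train_df, val_df = split_per_series(df, n_val=2)
print(len(train_df), len(val_df))
```

Every series contributes its own training and validation rows, so the temporal order is preserved within each series. The Python-level loop over groups is what makes this approach slow when there are many series.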