Resampling a multivariate time series
This recipe revisits the topic of resampling, but focuses on multivariate time series. We’ll explain why resampling can be tricky for multivariate time series: different variables may need different summary statistics.
Getting ready
When resampling a multivariate time series, you may need to apply a different summary statistic to each variable. For example, you may want to sum the solar radiation observed each hour to get a sense of how much power you could generate. For wind speed, however, taking the average instead of the sum is more sensible, because this variable is not cumulative.
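To make the difference concrete, here is a minimal sketch using a hypothetical DataFrame named `hourly` with invented values for two days of hourly readings. It resamples both columns to daily frequency with the sum and with the mean, so you can see which statistic yields a meaningful number for each variable.

```python
import pandas as pd

# Hypothetical hourly observations over two days (48 rows); the numbers are
# invented for illustration only.
index = pd.date_range("2024-01-01", periods=48, freq="h")
hourly = pd.DataFrame(
    {
        "Incoming Solar": [10.0] * 24 + [20.0] * 24,  # cumulative quantity
        "Wind Speed": [3.0] * 24 + [5.0] * 24,        # not cumulative
    },
    index=index,
)

daily_sum = hourly.resample("D").sum()
daily_mean = hourly.resample("D").mean()

print(daily_sum["Incoming Solar"].tolist())  # [240.0, 480.0]  daily total: meaningful
print(daily_sum["Wind Speed"].tolist())      # [72.0, 120.0]   not a speed at all
print(daily_mean["Wind Speed"].tolist())     # [3.0, 5.0]      typical speed per day
```

Summing 24 hourly wind-speed readings gives a number that has no physical interpretation, whereas the daily mean stays on the original scale.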
How to do it…
We can create a Python dictionary that specifies which statistic should be applied to each variable. Then, we pass this dictionary to the agg() method, as follows:
# Map each column name to the statistic used to summarize it
stat_by_variable = {
    'Incoming Solar': 'sum',
    'Wind Dir': 'mean',
    'Snow Depth': 'sum',
    'Wind Speed': 'mean',
    'Dewpoint': 'mean',
    'Precipitation': 'sum',
    'Vapor Pressure': 'mean',
    'Relative Humidity': 'mean',
    'Air Temp': 'max',
}

# Resample to daily frequency, applying each column's own statistic
data_daily = data.resample('D').agg(stat_by_variable)
This aggregates the time series to a daily frequency, using a different summary statistic for each variable. For example, the solar radiation observed during each day is summed, while for the air temperature variable (Air Temp), we take the maximum value observed each day.
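The code above relies on the `data` DataFrame used in earlier recipes. As a self-contained sketch, the following builds a hypothetical stand-in for `data` with only three of the variables and synthetic hourly values, then applies the same dictionary-based aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: three days of hourly readings for three of
# the recipe's variables. Values are generated, not real observations.
index = pd.date_range("2024-01-01", periods=72, freq="h")
data = pd.DataFrame(
    {
        "Incoming Solar": np.ones(72),           # 1 unit per hour
        "Wind Speed": np.tile([2.0, 4.0], 36),   # alternates 2, 4, 2, 4, ...
        "Air Temp": np.arange(72, dtype=float),  # rises by 1 every hour
    },
    index=index,
)

# One summary statistic per column, mirroring the recipe's dictionary
stat_by_variable = {
    "Incoming Solar": "sum",
    "Wind Speed": "mean",
    "Air Temp": "max",
}

data_daily = data.resample("D").agg(stat_by_variable)
print(data_daily)
```

Each day holds 24 hourly rows, so Incoming Solar sums to 24.0 per day, Wind Speed averages to 3.0, and Air Temp reports the largest hourly value of each day: 23.0, 47.0, and 71.0.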
How it works…
Using a dictionary to specify the summary statistics lets us change the frequency of the time series while summarizing each variable in the way that suits it. Note that if you wanted to apply the mean to all variables, you would not need a dictionary. A simpler way would be to run data.resample('D').mean().
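Here is a small sketch of that equivalence, using a hypothetical two-column DataFrame with invented values. It checks that resample('D').mean() matches a dictionary mapping every column to 'mean', and it also shows a side effect of the dictionary form: columns that are not listed in the dictionary are left out of the result.

```python
import pandas as pd

# Hypothetical two-column frame of hourly readings (values invented)
index = pd.date_range("2024-01-01", periods=48, freq="h")
data = pd.DataFrame(
    {
        "Wind Speed": [1.0] * 24 + [3.0] * 24,
        "Dewpoint": [-2.0] * 24 + [0.0] * 24,
    },
    index=index,
)

# Same statistic for every column: no dictionary needed
uniform = data.resample("D").mean()

# The dictionary form, with "mean" for every column, gives the same values
via_dict = data.resample("D").agg({"Wind Speed": "mean", "Dewpoint": "mean"})
print(uniform.equals(via_dict))  # True

# Columns missing from the dictionary are dropped from the output
partial = data.resample("D").agg({"Wind Speed": "mean"})
print(list(partial.columns))  # ['Wind Speed']
```

So when you use the dictionary form, make sure every column you want to keep has an entry.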