Upsampling time series data
In upsampling, the frequency of the time series is increased. As a result, we have more sample points than data points. One of the main questions is how to account for the entries in the series where we have no measurement.
Let's start with hourly data for a single day:
>>> rng = pd.date_range('4/29/2015 8:00', periods=10, freq='H')
>>> ts = pd.Series(np.random.randint(0, 100, len(rng)), index=rng)
>>> ts.head()
2015-04-29 08:00:00    30
2015-04-29 09:00:00    27
2015-04-29 10:00:00    54
2015-04-29 11:00:00     9
2015-04-29 12:00:00    48
Freq: H, dtype: int64
If we upsample to data points taken every 15 minutes, our time series will be extended with NaN values:
>>> ts.resample('15min').asfreq().head()
2015-04-29 08:00:00    30.0
2015-04-29 08:15:00     NaN
2015-04-29 08:30:00     NaN
2015-04-29 08:45:00     NaN
2015-04-29 09:00:00    27.0
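To make the "more sample points than data points" remark concrete, here is a minimal sketch that counts the slots before and after upsampling. It assumes a recent pandas release, where resample returns a resampler object and asfreq() materializes the new index. The fixed values and the names upsampled, n_original, n_upsampled and n_missing are ours, chosen only to keep the result reproducible.

```python
import numpy as np
import pandas as pd

# Fixed values (not random) so the counts below are reproducible.
rng = pd.date_range('2015-04-29 08:00', periods=10, freq='60min')
ts = pd.Series(np.arange(10) * 10, index=rng)

# Upsample to 15-minute slots without filling anything in.
upsampled = ts.resample('15min').asfreq()

n_original = len(ts)                      # 10 hourly measurements
n_upsampled = len(upsampled)              # 9 one-hour gaps * 4 slots + 1 endpoint
n_missing = int(upsampled.isna().sum())   # slots that have no measurement

print(n_original, n_upsampled, n_missing)
```

Ten measurements spread over nine one-hour gaps become 37 slots, of which 27 carry no measurement.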
There are various ways to deal with missing values, which can be controlled by the fill_method...
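As a rough illustration of the most common fill strategies, the sketch below fills the gaps in three ways: carrying the last measurement forward, pulling the next measurement backward, and interpolating linearly. It assumes a recent pandas release, where the strategy is chosen by calling a method on the resampler rather than by passing a keyword to resample. The three-point series and the names forward, backward and linear are ours, used only for illustration.

```python
import pandas as pd

# Three hourly measurements, taken from the first values shown above.
rng = pd.date_range('2015-04-29 08:00', periods=3, freq='60min')
ts = pd.Series([30.0, 27.0, 54.0], index=rng)

# Forward fill: each new slot repeats the last known measurement.
forward = ts.resample('15min').ffill()

# Backward fill: each new slot takes the next known measurement.
backward = ts.resample('15min').bfill()

# Linear interpolation: new slots lie on the line between neighbours.
linear = ts.resample('15min').asfreq().interpolate(method='linear')

# Compare the three strategies for the first hour.
print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'linear': linear}).head())
```

Which strategy fits depends on the data: forward filling suits values that stay valid until the next reading, while interpolation suits quantities that change smoothly between measurements.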