Resampling a time series
Time series resampling is the process of changing the frequency of a time series, for example, from hourly to daily. It is a common preprocessing step in time series analysis, and this recipe shows how to do it with pandas.
Getting ready
Changing the frequency of a time series is a common preprocessing step before analysis. For example, the time series used in the preceding recipes has an hourly granularity, yet our goal may be to study daily variations. In such cases, we can resample the data to a different frequency. Resampling is also an effective way of handling irregular time series, that is, those whose observations are recorded at irregularly spaced points in time.
How to do it…
We’ll go over two different scenarios where resampling a time series may be useful: when changing the sampling frequency and when dealing with irregular time series.
The following code resamples the time series into a daily granularity:
series_daily = series.resample('D').sum()
The daily granularity is specified by passing 'D' to the resample() method. The values that fall within each day are then added together using the sum() method.
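Because series comes from an earlier recipe, the following minimal sketch uses a small synthetic hourly series instead (an assumption made purely for illustration) to show what resample('D').sum() does: each hourly value is assigned to the calendar day it falls on, and the values in each daily bin are summed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an hourly series: 48 hourly readings covering
# 2023-01-01 and 2023-01-02, with values 0, 1, ..., 47
hourly_index = pd.date_range('2023-01-01', periods=48, freq='h')
hourly = pd.Series(np.arange(48, dtype=float), index=hourly_index)

# Assign each reading to its calendar day and add up the readings per day
daily = hourly.resample('D').sum()
print(daily)
```

With these values, the first day sums to 276.0 (0 + 1 + ... + 23) and the second to 852.0 (24 + 25 + ... + 47).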
Most time series analysis methods work under the assumption that the time series is regular; in other words, that it is collected at regularly spaced time intervals (for example, every day). But some time series are naturally irregular. For instance, the sales of a retail product occur at arbitrary timestamps as customers arrive at a store.
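A quick way to check whether a series is regular is pandas' pd.infer_freq() function, which returns a frequency alias when the timestamps are evenly spaced and None otherwise. The sketch below assumes two small, made-up indexes: one daily and one with arbitrary sale times.

```python
import pandas as pd

# Regular index: one timestamp per day
regular_index = pd.date_range('2023-01-01', periods=5, freq='D')

# Irregular index: hypothetical sale timestamps at arbitrary times
irregular_index = pd.DatetimeIndex([
    '2023-01-01 09:12', '2023-01-01 15:40', '2023-01-03 11:05',
    '2023-01-03 11:20', '2023-01-06 18:55',
])

# infer_freq returns an alias such as 'D' for constant spacing, else None
regular_freq = pd.infer_freq(regular_index)
irregular_freq = pd.infer_freq(irregular_index)
print(regular_freq, irregular_freq)
```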
Let us simulate sale events with the following code:
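import numpy as np
import pandas as pd

n_sales = 1000
start = pd.Timestamp('2023-01-01 09:00')
end = pd.Timestamp('2023-04-01')
# Number of whole days in the window, plus one
n_days = (end - start).days + 1

# Add uniformly random offsets (in days) to the start time,
# then sort the resulting timestamps chronologically
irregular_series = (pd.to_timedelta(np.random.rand(n_sales) * n_days, unit='D') + start).sort_values()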
The preceding code creates 1,000 sale events at random timestamps from 2023-01-01 09:00 to 2023-04-01. A sample of this series is shown in the following table:
| ID | Timestamp |
|---|---|
| 1 | 2023-01-01 15:18:10 |
| 2 | 2023-01-01 15:28:15 |
| 3 | 2023-01-01 16:31:57 |
| 4 | 2023-01-01 16:52:29 |
| 5 | 2023-01-01 23:01:24 |
| 6 | 2023-01-01 23:44:39 |
Table 1.2: Sample of an irregular time series
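One way to see the irregularity of the simulated data is to look at the gaps between consecutive events. The sketch below repeats the simulation with NumPy's seeded default_rng generator (swapped in for np.random.rand only so that the run is repeatable), sorts the timestamps, and computes the time elapsed between neighbouring events with diff(). Unlike in a regular series, these gaps vary from one event to the next.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make this sketch repeatable
n_sales = 1000
start = pd.Timestamp('2023-01-01 09:00')
end = pd.Timestamp('2023-04-01')
n_days = (end - start).days + 1

# Random event times, sorted chronologically
offsets = pd.to_timedelta(rng.random(n_sales) * n_days, unit='D')
irregular_series = (offsets + start).sort_values()

# Time elapsed between consecutive sale events (first entry is NaT, so drop it)
gaps = irregular_series.to_series().diff().dropna()
print(gaps.describe())
```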
An irregular time series can be converted into a regular one by resampling. In the case of sales, we will count how many sales occurred each day:
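# A series of zeros indexed by the irregular sale timestamps
ts_sales = pd.Series(0, index=irregular_series)
# Number of sale events per calendar day
tot_sales = ts_sales.resample('D').count()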
First, we create a time series of zeros indexed by the irregular timestamps (ts_sales). Then, we resample this series to a daily frequency ('D') and use the count() method to count how many observations occur each day. The reconstructed time series, tot_sales, can be used for other tasks, such as forecasting daily sales.
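To see what count() returns for a day without any events, here is a minimal sketch on five hypothetical sale timestamps that skip 2023-01-02 entirely.

```python
import pandas as pd

# Hypothetical sales: two on Jan 1, none on Jan 2, three on Jan 3
sale_times = pd.DatetimeIndex([
    '2023-01-01 10:15', '2023-01-01 17:40',
    '2023-01-03 09:05', '2023-01-03 12:30', '2023-01-03 19:55',
])

ts_sales = pd.Series(0, index=sale_times)
# Empty daily bins still appear in the result, with a count of 0
tot_sales = ts_sales.resample('D').count()
print(tot_sales)
```

Because ts_sales contains no missing values, resample('D').size() would give the same result here; the difference is that count() skips NaN values, whereas size() counts every row in the bin.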
How it works…
A sample of the resampled solar radiation time series (series_daily) is shown in the following table:
| Datetime | Incoming Solar |
|---|---|
| 2007-10-01 | 1381.5 |
| 2007-10-02 | 3953.2 |
| 2007-10-03 | 3098.1 |
| 2007-10-04 | 2213.9 |
Table 1.3: Solar radiation time series after resampling
Resampling is a cornerstone preprocessing step in time series analysis. This technique can be used to change a time series into a different granularity or to convert an irregular time series into a regular one.
The choice of summary statistic is an important decision. In the first case, we used the sum() method to add up the hourly solar radiation values observed each day. In the case of the irregular time series, we used the count() method to count how many events occurred in each period. Yet, you can use other summary statistics according to your needs. For example, using mean() would represent each period by the average of its values.
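The sketch below, again on a synthetic hourly series assumed only for illustration, swaps sum() for other aggregations: mean() and max() on the resampler, and agg() to compute several statistics in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: values 0..47 over two days
hourly_index = pd.date_range('2023-01-01', periods=48, freq='h')
hourly = pd.Series(np.arange(48, dtype=float), index=hourly_index)

daily_mean = hourly.resample('D').mean()  # average value per day
daily_max = hourly.resample('D').max()    # largest value per day

# Several statistics at once: one column per aggregation
daily_stats = hourly.resample('D').agg(['sum', 'mean', 'max'])
print(daily_stats)
```

The choice depends on the quantity: summing suits totals such as radiation or sales over a day, whereas the mean suits levels such as temperature.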
There’s more…
In this recipe, we resampled to a daily granularity, but pandas supports many other frequencies. A full list of the available options is given here: https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects.
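For instance, the same resampling call accepts aliases such as 'W' (weekly, with weeks ending on Sunday by default) and 'MS' (monthly, with each bin labelled by the month start). The following sketch applies both to a hypothetical series of daily sales counts.

```python
import pandas as pd

# Hypothetical daily sales counts: one sale per day throughout January 2023
daily_index = pd.date_range('2023-01-01', '2023-01-31', freq='D')
daily_sales = pd.Series(1, index=daily_index)

weekly_sales = daily_sales.resample('W').sum()    # weeks ending on Sunday
monthly_sales = daily_sales.resample('MS').sum()  # bins labelled by month start
print(weekly_sales)
print(monthly_sales)
```

Because 2023-01-01 is a Sunday, the first weekly bin contains only that day, and the last bin (labelled 2023-02-05) contains just January 30 and 31.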