You're reading from Deep Learning for Time Series Cookbook Use PyTorch and Python recipes for forecasting, classification, and anomaly detection

Product type Paperback

Published in Mar 2024

Publisher Packt

ISBN-13 9781805129233

Length 274 pages

Edition 1st Edition

Languages

Python

Tools

PyTorch

Concepts

Data Governance

Authors (2):

Luís Roque

Vitor Cerqueira

Dealing with missing values

In this recipe, we’ll cover how to impute missing values in time series. We’ll discuss different imputation methods and the factors to consider when choosing among them, and we’ll work through an example using pandas.

Getting ready

Missing values are an issue that plagues all kinds of data, including time series. Observations are often unavailable for various reasons, such as sensor failure or annotation errors. In such cases, data imputation can be used to overcome this problem. Data imputation works by assigning a value based on some rule, such as the mean or some predefined value.

How to do it…

We start by simulating missing data. The following code removes 60% of observations from a sample of two years of the solar radiation time series:

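import numpy as np

# series_daily is assumed to be already loaded as a date-indexed pandas Series
sample_with_nan = series_daily.head(365 * 2).copy()
size_na = int(0.6 * len(sample_with_nan))
# Draw distinct integer positions to turn into missing values
idx = np.random.choice(a=range(len(sample_with_nan)),
                       size=size_na,
                       replace=False)
# Assign by position; plain [] with integers is ambiguous on a date index
sample_with_nan.iloc[idx] = np.nan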

We use the np.random.choice() function from numpy to draw a random set of positions in the time series, without replacement. The observations at these positions are then set to a missing value (np.nan).
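The snippet above relies on series_daily, which is not defined in this recipe. A minimal self-contained sketch follows, assuming a synthetic stand-in series (the values, dates, and seed are hypothetical and are not the solar radiation data) and a seeded generator so that the same positions are removed on every run:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for series_daily: two years of synthetic daily values
rng = np.random.default_rng(seed=1)  # seeded so the result is reproducible
index = pd.date_range("2020-01-01", periods=365 * 2, freq="D")
series_daily = pd.Series(
    rng.normal(loc=200.0, scale=30.0, size=len(index)), index=index
)

sample_with_nan = series_daily.copy()
size_na = int(0.6 * len(sample_with_nan))

# Draw distinct integer positions and set them to NaN by position
idx = rng.choice(len(sample_with_nan), size=size_na, replace=False)
sample_with_nan.iloc[idx] = np.nan

# Count how many observations are now missing
n_missing = int(sample_with_nan.isna().sum())
```

Because the positions are drawn without replacement, n_missing equals size_na.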

In datasets without temporal order, it is common to impute missing values using a measure of central tendency, such as the mean or median. This can be done as follows:

average_value = sample_with_nan.mean()
imp_mean = sample_with_nan.fillna(average_value)
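The median works the same way as the mean here. The sketch below uses a small hypothetical series (toy, not part of the recipe's data) with one extreme value, to illustrate how the two statistics can give very different fill values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with two gaps and one extreme observation
toy = pd.Series([1.0, np.nan, 3.0, np.nan, 100.0])

# mean() and median() skip NaN by default, so they use 1, 3, and 100
imp_mean = toy.fillna(toy.mean())      # gaps filled with 104 / 3
imp_median = toy.fillna(toy.median())  # gaps filled with 3.0
```

In both cases every gap receives the same constant, regardless of where it falls in the series.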

Time series imputation must take into account the temporal nature of observations. This means that the assigned value should follow the dynamics of the series. A more common approach in time series is to impute missing data with the last known observation. This approach is implemented in the ffill() method:

imp_ffill = sample_with_nan.ffill()

Another, less common, approach that uses the order of observations is bfill():

imp_bfill = sample_with_nan.bfill()

The bfill() method imputes missing data with the next available observation in the dataset.
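To make the difference concrete, here is a small sketch on a hypothetical six-day series (toy is an illustrative name, not part of the recipe). It also shows an edge case: ffill() cannot fill a gap at the start of the series, and bfill() cannot fill one at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with gaps at the start, middle, and end
toy = pd.Series(
    [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

# Forward fill: each gap takes the last known value
toy_ffill = toy.ffill()  # NaN, 1, 1, 1, 4, 4

# Backward fill: each gap takes the next available value
toy_bfill = toy.bfill()  # 1, 1, 4, 4, 4, NaN

# Chaining both fills the leading gap that ffill leaves behind
toy_both = toy.ffill().bfill()  # 1, 1, 1, 1, 4, 4
```

Chaining the two calls is one possible way to handle leading gaps; whether that is appropriate depends on whether using a later observation is acceptable for the task.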

How it works…

The following figure shows the reconstructed time series after imputation with each method:

Figure 1.2: Imputing missing data with different strategies

The mean imputation approach misses the time series dynamics because it assigns the same constant to every gap. Both ffill and bfill lead to a reconstructed time series with dynamics similar to those of the original. Usually, ffill is preferable because it respects the temporal order of observations, whereas bfill uses future information to alter (impute) the past.
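The visual comparison can be complemented with a number. The following is a rough sketch, not the book's evaluation: it assumes a synthetic seasonal series (hypothetical values standing in for the solar radiation data) and measures the mean absolute error of each method on the observations that were removed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
index = pd.date_range("2020-01-01", periods=365 * 2, freq="D")
t = np.arange(len(index))

# Hypothetical series: yearly seasonal cycle plus noise
original = pd.Series(
    200 + 100 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 10, len(t)),
    index=index,
)

# Remove 60% of the observations at random positions
sample_with_nan = original.copy()
idx = rng.choice(len(original), size=int(0.6 * len(original)), replace=False)
sample_with_nan.iloc[idx] = np.nan

candidates = {
    "mean": sample_with_nan.fillna(sample_with_nan.mean()),
    "ffill": sample_with_nan.ffill(),
    "bfill": sample_with_nan.bfill(),
}

# Score only the removed positions; gaps a method could not fill
# (for example, leading NaNs after ffill) are skipped by mean()
removed = sample_with_nan.isna()
mae = {
    name: (imputed[removed] - original[removed]).abs().mean()
    for name, imputed in candidates.items()
}
```

On a strongly seasonal series like this one, the order-aware methods should score well below mean imputation. The size of the gap depends on the data, so results on real series may differ.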

There’s more…

The imputation process can also be constrained, for example, by limiting the number of consecutive observations that are imputed. You can learn more about this in the documentation pages of these functions, for example, https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ffill.html.
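As a brief illustration of such a constraint, the sketch below uses the limit parameter of ffill() on a hypothetical toy series, so that only the first missing value in each gap is filled:

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with a gap of three consecutive missing values
toy = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Fill at most one consecutive NaN after each known observation
limited = toy.ffill(limit=1)  # 1, 1, NaN, NaN, 5
```

Capping the fill length in this way avoids carrying a stale value across a long gap; the remaining missing values can then be handled separately.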