Forecasting CO2 time series

In this recipe, we will test two well-known models, ARIMA and SARIMA, as well as a signal-decomposition approach using Facebook's Prophet library, and compare how well each of them forecasts our time series of CO2 concentrations.

Getting ready

In order to prepare for this recipe, we'll install libraries and download a dataset.

We'll use the statsmodels package and prophet:

pip install statsmodels fbprophet

We will analyze the CO2 concentration data in this recipe. You can see how the data is loaded in the accompanying notebook on GitHub, or in the scikit-learn Gaussian process regression (GPR) example on the Mauna Loa CO2 data: https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_co2.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-co2-py

This dataset is one of the earliest continuous records of atmospheric CO2. As we will see later, the data follows a sinusoidal pattern, with the CO2 concentration rising in winter and falling in summer, owing to the reduced quantity of plants and vegetation during the winter season:

X,y = load_mauna_loa_atmospheric_co2()

The dataset contains the average CO2 concentration measured at Mauna Loa Observatory in Hawaii from 1958 to 2001. We will model the CO2 concentration as a function of time.
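The loader itself is defined in the linked example and in the accompanying notebook. As a rough sketch, an equivalent pair of X (decimal year) and y (CO2 in ppm) arrays could be built from the statsmodels CO2 dataset, which we also use later in this recipe; the function body below is an assumption, not the original implementation:

import statsmodels.api as sm

def load_mauna_loa_atmospheric_co2():
    co2 = sm.datasets.co2.load_pandas().data['co2']
    co2 = co2.fillna(co2.interpolate())   # fill gaps in the weekly record
    monthly = co2.resample('MS').mean()   # aggregate to monthly averages
    # express each date as a decimal year, as in the scikit-learn example
    X = (monthly.index.year + (monthly.index.month - 1) / 12).to_numpy().reshape(-1, 1)
    y = monthly.to_numpy()
    return X, y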

How to do it...

Now we'll get to forecasting our time series of CO2 data. We'll first explore the dataset, and then we'll apply the ARIMA and SARIMA techniques.

  1. Let's have a look at the time series:
import pandas as pd
import seaborn as sns

df_CO2 = pd.DataFrame(data=X, columns=['Year'])
df_CO2['CO2 in ppm'] = y
lm = sns.lmplot(x='Year', y='CO2 in ppm', data=df_CO2, height=4, aspect=4)
fig = lm.fig
fig.suptitle('CO2 conc. mauna_loa 1958-2001', fontsize=12)

Here's the graph:

The following code performs a seasonal decomposition of the CO2 time series, which shows a clear seasonal variation in the CO2 concentration that can be traced back to the biology. First, we load and preprocess the data:

import statsmodels.api as stm

d = stm.datasets.co2.load_pandas()
co2 = d.data
co2.head()
y = co2['co2']
y = y.fillna(
    y.interpolate()
)  # fill missing values by interpolation

Now that we have preprocessed the data, let's go ahead with the decomposition:

import matplotlib.pyplot as plt
from pylab import rcParams

rcParams['figure.figsize'] = 11, 9
result = stm.tsa.seasonal_decompose(y, model='additive')
pd.plotting.register_matplotlib_converters()
result.plot()
plt.show()

Here, we see the decomposition: the observed time series, its trend, seasonal components, and what remains unexplained, the residual element:

Now, let's analyze the time series.

Analyzing time series using ARIMA and SARIMA

We will fit ARIMA and SARIMA models to the dataset.

We'll define our two models and apply them to each point in the test dataset: we iteratively fit a model on all the points seen so far and predict the next point as a one-step-ahead forecast.

  1. First, we split the data:
# taking a 90/10 split between training and testing:
future = int(len(y) * 0.9)
print('number of train samples: %d test samples %d' % (
    future, len(y) - future)
)
train, test = y[:future], y[future:]

This leaves us with 468 samples for training and 53 for testing.

  2. Next, we define the models:
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

def get_arima_model(history, order=(5, 1, 0)):
    return ARIMA(history, order=order)

def get_sarima_model(
    history,
    order=(5, 1, 1),
    seasonal_order=(0, 1, 1, 4)
):
    return SARIMAX(
        history,
        order=order,
        enforce_stationarity=True,
        enforce_invertibility=False,
        seasonal_order=seasonal_order
    )
  3. Then we train the models:
from sklearn.metrics import mean_squared_error

def apply_model(train, test, model_fun=get_arima_model):
    '''we just roll with the model and apply it to successive
    time steps
    '''
    history = list(train)
    predictions = []
    for t in test:
        model = model_fun(history).fit(disp=0)
        output = model.forecast()
        predictions.append(output[0])
        history.append(t)
    error = mean_squared_error(test, predictions)
    print('Test MSE: %.3f' % error)
    # print(model.summary().tables[1])
    return predictions, error

predictions_arima, error_arima = apply_model(train, test)
predictions_sarima, error_sarima = apply_model(
    train, test, get_sarima_model
)

We get an MSE in the test of 0.554 and 0.405 for ARIMA and SARIMA models, respectively. Let's see how the models fit graphically:
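The comparison plot isn't reproduced in the code above; a minimal sketch of how it could be drawn, assuming matplotlib and the variables defined in the previous steps, looks like this:

import matplotlib.pyplot as plt

# compare the one-step-ahead forecasts against the held-out observations
plt.figure(figsize=(12, 6))
plt.plot(test.index, test.values, label='observed')
plt.plot(test.index, predictions_arima, label='ARIMA forecast')
plt.plot(test.index, predictions_sarima, label='SARIMA forecast')
plt.xlabel('Year')
plt.ylabel('CO2 in ppm')
plt.legend()
plt.show()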

We could do a parameter exploration using the Akaike information criterion (AIC), which expresses the quality of the model relative to the number of parameters in the model. The model object returned by the fit function in statsmodels includes the AIC, so we could do a grid search over a range of parameters, and then select the model that minimizes the AIC.
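For example, a small grid search over SARIMA orders could look like the following sketch; the parameter ranges are illustrative assumptions, and a full search can take a while to run:

import itertools
from statsmodels.tsa.statespace.sarimax import SARIMAX

best_aic, best_order, best_seasonal = float('inf'), None, None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    for P, D, Q in itertools.product(range(2), range(2), range(2)):
        try:
            fit = SARIMAX(
                train, order=(p, d, q), seasonal_order=(P, D, Q, 4),
                enforce_stationarity=False, enforce_invertibility=False
            ).fit(disp=0)
        except Exception:
            continue  # some parameter combinations fail to converge
        if fit.aic < best_aic:
            best_aic = fit.aic
            best_order, best_seasonal = (p, d, q), (P, D, Q, 4)

print('best AIC %.1f for order %s and seasonal order %s' % (
    best_aic, best_order, best_seasonal))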

How it works...

Time series data is a collection of observations x(t), where each data point is recorded at time t. In most cases, time is a discrete variable, that is, $t = 1, 2, 3, \ldots$. We are looking at forecasting, which is the task of predicting future values based on the previous observations in the time series.

In order to explain the models that we've used, ARIMA and SARIMA, we'll have to go step by step, and explain each in turn:

  • Autoregression (AR)
  • Moving Average (MA)
  • Autoregressive Moving Average (ARMA)
  • Autoregressive Integrated Moving Average (ARIMA)
  • Seasonal Autoregressive Integrated Moving Average (SARIMA)

ARIMA and SARIMA are based on the ARMA model, which is an autoregressive moving average model. Let's briefly go through some of the basics.

ARMA is a linear model, defined in two parts. First, the autoregressive linear model:

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$

Here, $\varphi_1, \ldots, \varphi_p$ are the parameters, $c$ is a constant, $\varepsilon_t$ is white noise, and $p$ is the order of the model (or the window size of the linear model). The second part of ARMA is the moving average, and this is again a linear regression, but of non-observable, lagged error terms, defined as follows:

$$X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$

Here, $q$ is the order of the moving average, $\theta_1, \ldots, \theta_q$ are the parameters, and $\mu$ is the expectation or the mean of the time series $X_t$. The ARMA(p, q) model is then the composite of both of these models, AR(p) and MA(q):

$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$
The fitting procedure is a bit involved, particularly because of the MA part. You can read up on the Box-Jenkins method on Wikipedia if you are interested: https://en.wikipedia.org/wiki/Box%E2%80%93Jenkins_method
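To get a feeling for what such a process looks like before fitting anything, we can simulate one with statsmodels; the coefficients below are arbitrary illustrative choices:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ARMA(2, 1): statsmodels expects the lag-0 coefficient first and the
# AR coefficients with their signs flipped relative to the equations above
ar = np.array([1, -0.6, -0.2])  # phi_1 = 0.6, phi_2 = 0.2
ma = np.array([1, 0.4])         # theta_1 = 0.4
simulated = ArmaProcess(ar, ma).generate_sample(nsample=500)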

There are a few limitations to note, however. The time series has to have the following properties:

  • Stationary: Basically, mean and covariance across observations have to be constant over time.
  • Nonperiodic: Although bigger values for p and q could be used to model seasonality, it is not part of the model.
  • Linear: Each value $X_t$ can be modeled as a linear combination of previous values and error terms.
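A common way to check the first of these assumptions, stationarity, is the augmented Dickey-Fuller test provided by statsmodels; this check is not part of the recipe itself, but a quick sketch looks like this:

from statsmodels.tsa.stattools import adfuller

# null hypothesis: the series has a unit root (that is, it is non-stationary)
adf_stat, p_value, *_ = adfuller(y.dropna())
print('ADF statistic: %.3f, p-value: %.3f' % (adf_stat, p_value))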

There are different extensions of ARMA to address the first two limitations, and that's where ARIMA and SARIMA come in.

ARIMA (p, d, q) stands for autoregressive integrated moving average. It comes with three parameters:

  • p: The number of autoregressive terms (autoregression)
  • d: The number of non-seasonal differences needed for stationarity (integration)
  • q: The number of lagged forecast errors (moving average)

The integration refers to differencing. In order to stabilize the mean, we can take the difference between consecutive observations. This can also remove a trend or eliminate seasonality. It can be written as follows:

$$y'_t = y_t - y_{t-1}$$

This differencing can be repeated several times, and the number of times it is applied is what the parameter d of ARIMA describes. Please note that ARIMA can handle drifts and non-stationary time series. However, it is still unable to handle seasonality.
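In pandas, this kind of differencing is a one-liner; as a quick illustration (not part of the recipe):

# first-order difference (d = 1) and a seasonal difference at lag 12
y_diff = y.diff().dropna()
y_seasonal_diff = y.diff(12).dropna()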

SARIMA stands for seasonal ARIMA, and is an extension of ARIMA in that it also takes into account the seasonality of the data.

SARIMA(p, d, q)(P, D, Q)m contains the non-seasonal parameters of ARIMA and additional seasonal parameters. The uppercase letters P, D, and Q denote the seasonal counterparts of the autoregressive, differencing, and moving average terms, and m is the number of periods in each season. Often this is the number of periods in a year; for example, m=4 stands for a quarterly seasonal effect. In that case, D refers to seasonal differencing between observations $X_t$ and $X_{t-m}$, and P and Q refer to linear models with backshifts of m.
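As an illustration, for monthly data with a yearly cycle, a SARIMA(1, 1, 1)(1, 1, 1)12 model could be set up in statsmodels as follows; the orders here are just an example:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# seasonal_order = (P, D, Q, m); m = 12 for a yearly cycle in monthly data
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=0)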

In Python, the statsmodels library provides a method to perform the decomposition of the signal based on the seasonality of the data.

There's more...

Prophet is a library provided by Facebook for forecasting time series data. It is based on an additive model that fits non-linear trend and seasonality components. The library works best when the data has strong seasonal effects and several seasons of historical data are available.
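The DataFrame df_CO2_fb used below is not constructed in this excerpt; Prophet expects a 'ds' date column and a 'y' value column, so a minimal sketch based on the CO2 series from earlier could look like this:

import pandas as pd
import statsmodels.api as stm

co2 = stm.datasets.co2.load_pandas().data['co2']
co2 = co2.fillna(co2.interpolate())
df_CO2_fb = pd.DataFrame({'y': co2})
df_CO2_fb.index.name = 'ds'  # becomes the 'ds' column after reset_index()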

Let's see how we can use it:

from fbprophet import Prophet

train_df = df_CO2_fb['1958':'1997']
test_df = df_CO2_fb['1998':'2001']
train_df = train_df.reset_index()
test_df = test_df.reset_index()
Co2_model = Prophet(interval_width=0.95)

Co2_model.fit(train_df)
train_forecast = Co2_model.predict(train_df)
test_forecast = Co2_model.predict(test_df)
fut = Co2_model.make_future_dataframe(periods=12, freq='M')
forecast_df = Co2_model.predict(fut)
Co2_model.plot(forecast_df)

Here are our model forecasts:

We get a decomposition similar to the one we produced earlier with statsmodels, namely, the trend and the seasonal components:
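This component plot can be produced with Prophet's plot_components method:

# plot the fitted trend and seasonal components of the forecast
Co2_model.plot_components(forecast_df)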

The yearly variation nicely shows the rise and fall of the CO2 concentration according to the seasons. The trend clearly goes up over time, which could be worrisome if you think about global warming.

See also

We've used the following libraries in this recipe:

  • statsmodels: https://www.statsmodels.org/
  • Prophet: https://facebook.github.io/prophet/

There are many more interesting libraries relating to time series that are worth exploring.
