Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning for Time-Series with Python

You're reading from   Machine Learning for Time-Series with Python Forecast, predict, and detect anomalies with state-of-the-art machine learning methods

Arrow left icon
Product type Paperback
Published in Oct 2021
Publisher Packt
ISBN-13 9781801819626
Length 370 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Ben Auffarth Ben Auffarth
Author Profile Icon Ben Auffarth
Ben Auffarth
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Introduction to Time-Series with Python 2. Time-Series Analysis with Python FREE CHAPTER 3. Preprocessing Time-Series 4. Introduction to Machine Learning for Time-Series 5. Forecasting with Moving Averages and Autoregressive Models 6. Unsupervised Methods for Time-Series 7. Machine Learning Models for Time-Series 8. Online Learning for Time-Series 9. Probabilistic Models for Time-Series 10. Deep Learning for Time-Series 11. Reinforcement Learning for Time-Series 12. Multivariate Forecasting 13. Other Books You May Enjoy
14. Index

Change point detection

A common problem with time-series is changes in the behavior of the observed system. Generally speaking, a change point signals an abrupt and significant transition between states in the process generating the series. For example, the trend can suddenly change, and a change point can signal where the trend of the series changes. This is well known under the guise of technical chart pattern analysis in trading.

This list captures some applications for Change point detection (CPD):

  • Speech recognition: Detection of word and sentence boundaries
  • Image analysis: Surveillance on video footage of closed-circuit television
  • Fitness: Segmenting human activities over time based on data from motion sensors from smart devices such as watches or phones
  • Finance: Identifying changes to trend patterns that could indicate changes from bear to bull markets, or the other way around

As an example for the importance of CPD, consider the stock market. Time-series data that describes the evolution of a market, such as stock prices, follows trends – it either rises, falls, or doesn't change significantly (stagnation).

When a stock rises, the investor wants to buy the stock. Otherwise, when the stock is falling, the investors doesn't want to keep the stock, but rather to sell it. Not changing the position will cause a loss of book value – in the best case, this will cause a problem with liquidity.

For investors, it is therefore key to know, when the market changes from rising to falling or the other way around. Recognizing these changes can make the difference between winning or losing.

In forecasting, special events like Black Friday, Christmas, an election, a press release, or changes in regulation can cause short-term (perhaps then classed as an anomaly) or long-term change to the trend or to the level of the series. This will inevitably lead to strange predictions from traditional models.

A particularly interesting challenge with CPD algorithms is detecting these inflection points in real time. This means detecting a change point as soon as it arrives (or, at the very least, before the next change point occurs).

We can distinguish online and offline methods for CPD, where online refers to processing on the fly, dealing with each new data point as it becomes available. On the other hand, offline algorithms can work on the whole time-series at once. We'll deal more with online processing in Chapter 8, Online Learning for Time-Series.

CPD is related to segmentation, edge detection, event detection, and anomaly detection, and similar techniques can be applied to all these applications. CPD can be viewed as very much like anomaly detection, since one way to identify change points is by anomaly scores from an anomaly detection algorithm.

From this perspective, change points are identical to highly anomalous points, and anything above a certain threshold corresponds to a change. In the same way as anomaly detection, CPD can be defined as the problem of hypothesis testing between two alternatives, the null hypothesis being "no change occurs," and the alternative hypothesis of "a change occurs."

CPD algorithms are composed of three components: cost functions, search methods, and constraints. We'll go through these in turn. Cost functions are distance functions that can be applied to a subsection of the time-series (multivariate or univariate).

An example for a cost function is least absolute deviation (LAD), which is an estimator of a shift in the central point (mean, median, and mode) of a distribution defined as follows:

In this definition l is an index to a subsection in the time-series x, and is the central point of x.

The search function then iterates over the time-series to detect change points. This can be done approximately, such as in window-based detection, bottom-up methods, or binary segmentation, or it can be exhaustive as in the case of dynamic programming or Pruned Exact Linear Time (Pelt).

Pelt (Gachomo Dorcas Wambui and others, 2015) relies on pruning heuristics, and has a computational cost that is linear to the number of points of the time-series, . Dynamic programming methods have a much higher computational cost of , where n is the maximum number of expected change points.

Finally, the constraint can come into play as a penalty in the search algorithm. This penalty term can encode a cost budget or knowledge of the number of change points that we would expect to find.

It is notoriously difficult to evaluate the performance of CPD algorithms, because of the lack of benchmark datasets. Only very recently (2020) Gerrit van den Burg and Christopher Williams from the Alan Turing Institute and the University of Edinburgh published a benchmark consisting of 37 time-series from sources such as the World Bank, EuroStat, U.S. Census Bureau, GapMinder, and Wikipedia. Their benchmark is available on GitHub, and they mention change point annotations for datasets centered around the financial crisis of 2007-08, legislation on seat belts in the U.K., the Montreal Protocol regulating chlorofluorocarbon emissions, or the regulation of automated phone calls in the U.S.

In the same paper ("An Evaluation of Change Point Detection Algorithm"), the authors evaluated a whole range of methods for CPD. They note that their "zero" baseline method, which assumes no change points all, outperforms many of the other methods, according to F1-measure and a cluster overlap measure based on the Jaccard index. This is because of the small proportion of change points in the dataset, and the high number of false positives the methods return. They concluded that binary segmentation and Bayesian online CPD are among the best methods across the time-series.

Binary segmentation ("On Tests for Detecting Change in Mean" by Ashish Sen and Muni S. Srivastava, 1975) falls into the category of window-based CPD. Binary segmentation is a greedy algorithm that minimizes the sum of costs the most as defined like this:

is the found change point and c() is a cost function similar to LAD, which we saw earlier in this section. The general idea is that when two subsequences are highly dissimilar, this indicates a change point.

Binary segmentation is sequential in the sense that the change point is detected first on the complete time-series, then again on the two sub-sequences before and after the change point. This explains its low complexity of , where T is the length of the time-series. This computational cost makes it scalable to larger datasets.

This table presents an overview of methods for CPD:

Library

Implementations

Maintainer

Greykite

CPD via adaptive lasso

LinkedIn

ruptures

Offline CPD: binary segmentation, dynamic programming, Pelt, window-based

Charles Truong

Bayesian Changepoint Detection

Bayesian CPD

Johannes Kulick

banpei

Singular spectrum transformation

Hirofumi Tsuruta

changepy

Pelt algorithm

Rui Gil

onlineRPCA

Online Moving Window Robust Principal Component Analysis

Wei Xiao

Figure 6.7: CPD methods in Python

We've omitted Facebook's Prophet library since it's not a dedicated CPD package.

The chart below illustrates the popularity of CPD methods over time.

change_point_detection-star_history.png

Figure 6.8: Star history of CPD methods

LinkedIn's Greykite has been seeing a meteoric rise in GitHub stars since its release. Also ruptures has seen a huge increase in popularity.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image