Change point detection
A common problem with time-series is a change in the behavior of the observed system. Generally speaking, a change point signals an abrupt and significant transition between states of the process generating the series. For example, a change point can mark where the trend of a series suddenly shifts. This is well known under the guise of technical chart pattern analysis in trading.
This list captures some applications of change point detection (CPD):
- Speech recognition: Detection of word and sentence boundaries
- Image analysis: Surveillance on video footage of closed-circuit television
- Fitness: Segmenting human activities over time based on data from motion sensors from smart devices such as watches or phones
- Finance: Identifying changes to trend patterns that could indicate changes from bear to bull markets, or the other way around
As an example of the importance of CPD, consider the stock market. Time-series data that describes the evolution of a market, such as stock prices, follows trends – it either rises, falls, or doesn't change significantly (stagnation).
When a stock rises, the investor wants to buy it. Conversely, when the stock is falling, the investor doesn't want to keep it, but rather sell it. Not changing the position will cause a loss of book value – in the best case, this will cause a problem with liquidity.
For investors, it is therefore key to know when the market changes from rising to falling, or the other way around. Recognizing these changes can make the difference between winning and losing.
In forecasting, special events like Black Friday, Christmas, an election, a press release, or changes in regulation can cause short-term (perhaps then classed as an anomaly) or long-term change to the trend or to the level of the series. This will inevitably lead to strange predictions from traditional models.
A particularly interesting challenge with CPD algorithms is detecting these inflection points in real time. This means detecting a change point as soon as it arrives (or, at the very least, before the next change point occurs).
We can distinguish online and offline methods for CPD, where online refers to processing on the fly, dealing with each new data point as it becomes available. On the other hand, offline algorithms can work on the whole time-series at once. We'll deal more with online processing in Chapter 8, Online Learning for Time-Series.
CPD is related to segmentation, edge detection, event detection, and anomaly detection, and similar techniques can be applied to all these applications. CPD can be viewed as very much like anomaly detection, since one way to identify change points is by anomaly scores from an anomaly detection algorithm.
From this perspective, change points are identical to highly anomalous points, and anything above a certain threshold corresponds to a change. In the same way as anomaly detection, CPD can be defined as the problem of hypothesis testing between two alternatives, the null hypothesis being "no change occurs," and the alternative hypothesis of "a change occurs."
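To make the thresholding idea concrete, here is a minimal sketch of turning anomaly scores into change points. The function name and the z-score-of-differences heuristic are our own illustrations, not a particular library's API:

```python
import numpy as np

def change_points_from_scores(scores, threshold):
    """Flag indices whose anomaly score exceeds the threshold as change points."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy example: z-scores of the absolute first differences of a series
# containing a single level shift at index 50
series = np.concatenate([np.zeros(50), np.full(50, 5.0)])
diffs = np.abs(np.diff(series))
z = (diffs - diffs.mean()) / diffs.std()
print(change_points_from_scores(z, threshold=3.0))  # -> [49]
```

The change point shows up at index 49 of the differenced series, that is, at the boundary between the last old-regime point and the first new-regime point.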
CPD algorithms are composed of three components: cost functions, search methods, and constraints. We'll go through these in turn. Cost functions are distance functions that can be applied to a subsection of the time-series (multivariate or univariate).
An example of a cost function is least absolute deviation (LAD), which is an estimator of a shift in the central point (mean, median, or mode) of a distribution, defined as follows:

$$c(x_I) = \sum_{t \in I} \lvert x_t - \bar{x} \rvert$$

In this definition, $I$ is an index set for a subsection of the time-series $x$, and $\bar{x}$ is the central point of that subsection.
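As a sketch, the LAD cost of a segment can be computed with NumPy in a few lines, using the median as the central point (the function name `lad_cost` is our own, not a library API):

```python
import numpy as np

def lad_cost(segment):
    """Least absolute deviation cost: the sum of absolute deviations
    of a segment from its median (the segment's central point)."""
    segment = np.asarray(segment, dtype=float)
    return float(np.abs(segment - np.median(segment)).sum())

# A homogeneous segment has a low cost...
print(lad_cost([1.0, 1.1, 0.9, 1.0]))           # small value
# ...while a segment that straddles a change point has a high cost
print(lad_cost([1.0, 1.0, 1.0, 9.0, 9.0, 9.0]))
```

A segment that straddles a change point mixes two distributions, so its points lie far from any single central value, and the cost is high.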
The search method then iterates over the time-series to detect change points. This can be done approximately, as in window-based detection, bottom-up methods, or binary segmentation, or exhaustively, as in dynamic programming or Pruned Exact Linear Time (Pelt).
Pelt (Gachomo Dorcas Wambui and others, 2015) relies on pruning heuristics and has a computational cost that is linear in the number of points of the time-series, $O(T)$. Dynamic programming methods have a much higher computational cost of $O(nT^2)$, where $n$ is the maximum number of expected change points and $T$ is the length of the series.
Finally, the constraint can come into play as a penalty in the search algorithm. This penalty term can encode a cost budget or knowledge of the number of change points that we would expect to find.
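Putting the three components together, here is a toy penalized search for a single change point. It is a brute-force sweep over split positions rather than any particular library's algorithm, and the function names are illustrative; the penalty plays the role of the constraint, suppressing splits whose cost saving is too small:

```python
import numpy as np

def lad_cost(segment):
    """LAD cost: sum of absolute deviations from the segment median."""
    segment = np.asarray(segment, dtype=float)
    return float(np.abs(segment - np.median(segment)).sum())

def best_split(series, penalty=0.0):
    """Return the split index that minimizes the penalized total cost,
    or None when keeping the series whole is cheaper."""
    series = np.asarray(series, dtype=float)
    no_split = lad_cost(series)                 # cost of "no change point"
    best_t, best_cost = None, no_split
    for t in range(2, len(series) - 1):         # candidate change points
        cost = lad_cost(series[:t]) + lad_cost(series[t:]) + penalty
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

series = [0.0] * 20 + [5.0] * 20
print(best_split(series, penalty=1.0))    # -> 20 (split pays off)
print(best_split(series, penalty=500.0))  # -> None (penalty too high)
```

With a small penalty, the level shift at index 20 is found; with a penalty larger than the cost saving, the search concludes that no change point is worth reporting.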
It is notoriously difficult to evaluate the performance of CPD algorithms because of the lack of benchmark datasets. Only very recently (2020) did Gerrit van den Burg and Christopher Williams from the Alan Turing Institute and the University of Edinburgh publish a benchmark consisting of 37 time-series from sources such as the World Bank, EuroStat, the U.S. Census Bureau, GapMinder, and Wikipedia. Their benchmark is available on GitHub, and it includes change point annotations for datasets centered around the financial crisis of 2007-08, legislation on seat belts in the U.K., the Montreal Protocol regulating chlorofluorocarbon emissions, and the regulation of automated phone calls in the U.S.
In the same paper ("An Evaluation of Change Point Detection Algorithms"), the authors evaluated a whole range of CPD methods. They note that their "zero" baseline, which assumes no change points at all, outperforms many of the other methods according to the F1-measure and a cluster overlap measure based on the Jaccard index. This is because of the small proportion of change points in the datasets and the high number of false positives the methods return. They concluded that binary segmentation and Bayesian online CPD are among the best methods across the time-series.
Binary segmentation ("On Tests for Detecting Change in Mean" by Ashish Sen and Muni S. Srivastava, 1975) falls into the category of window-based CPD. Binary segmentation is a greedy algorithm that finds the split that reduces the sum of costs the most, defined like this:

$$\hat{t} = \operatorname*{arg\,min}_{1 \le t < T} \; c(x_{1..t}) + c(x_{t+1..T})$$

$\hat{t}$ is the found change point and $c(\cdot)$ is a cost function, such as the LAD we saw earlier in this section. The general idea is that when the two subsequences on either side of a point are highly dissimilar, this indicates a change point.
Binary segmentation is sequential in the sense that a change point is detected first on the complete time-series, and the search is then repeated on the two sub-sequences before and after that point. This explains its low complexity of $O(T \log T)$, where $T$ is the length of the time-series. This computational cost makes it scalable to larger datasets.
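The recursion can be sketched in a few lines of Python, using an LAD cost as defined earlier in this section. Function names are our own, and real implementations such as those in ruptures are considerably more efficient:

```python
import numpy as np

def lad_cost(segment):
    """LAD cost: sum of absolute deviations from the segment median."""
    segment = np.asarray(segment, dtype=float)
    return float(np.abs(segment - np.median(segment)).sum())

def binary_segmentation(series, min_size=2, penalty=1.0, offset=0):
    """Recursively split the series at the point that most reduces the
    total cost, stopping when no split improves on the penalty."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    if n < 2 * min_size:
        return []
    whole = lad_cost(series)
    best_t, best_gain = None, penalty
    for t in range(min_size, n - min_size + 1):
        gain = whole - (lad_cost(series[:t]) + lad_cost(series[t:]))
        if gain > best_gain:
            best_t, best_gain = t, gain
    if best_t is None:
        return []
    # Recurse on the sub-sequences before and after the change point
    left = binary_segmentation(series[:best_t], min_size, penalty, offset)
    right = binary_segmentation(series[best_t:], min_size, penalty,
                                offset + best_t)
    return sorted(left + [offset + best_t] + right)

signal = np.concatenate([np.zeros(30), np.full(30, 4.0), np.full(30, 8.0)])
print(binary_segmentation(signal))  # -> [30, 60]
```

The first pass finds the most prominent split, and each half is then searched independently, which is where the greedy, sequential character of the algorithm comes from.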
This table presents an overview of methods for CPD:
| Library | Implementations | Maintainer |
| --- | --- | --- |
| Greykite | CPD via adaptive lasso | LinkedIn |
| ruptures | Offline CPD: binary segmentation, dynamic programming, Pelt, window-based | Charles Truong |
| Bayesian Changepoint Detection | Bayesian CPD | Johannes Kulick |
| banpei | Singular spectrum transformation | Hirofumi Tsuruta |
| changepy | Pelt algorithm | Rui Gil |
| onlineRPCA | Online Moving Window Robust Principal Component Analysis | Wei Xiao |
Figure 6.7: CPD methods in Python
We've omitted Facebook's Prophet library since it's not a dedicated CPD package.
The chart below illustrates the popularity of CPD methods over time.
Figure 6.8: Star history of CPD methods
LinkedIn's Greykite has seen a meteoric rise in GitHub stars since its release. ruptures has also seen a huge increase in popularity.