Change detection

In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes.

Yearly volume of the Nile river at Aswan, an example of time series data commonly used in change detection. Dotted line denotes a detected change point.[1]

Specific applications, like step detection and edge detection, may be concerned with changes in the mean, variance, correlation, or spectral density of the process. More generally, change detection also includes the detection of anomalous behavior, known as anomaly detection.

Introduction

A time series measures the progression of one or more quantities over time. For instance, the figure above shows the yearly volume of water in the Nile river between 1870 and 1970. Change point detection is concerned with identifying whether, and if so when, the behavior of the series changes significantly. In the Nile river example, the volume of water changes significantly after a dam was built in the river. Importantly, anomalous observations that differ from the ongoing behavior of the time series are not generally considered change points as long as the series returns to its previous behavior afterwards.

Mathematically, we can describe a time series as an ordered sequence of observations $(x_1, x_2, \ldots)$. We can write the joint distribution of a subset $x_{a:b} = (x_a, x_{a+1}, \ldots, x_b)$ of the time series as $p(x_{a:b})$. If the goal is to determine whether a change point occurred at a time $\tau$ in a finite time series of length $T$, then we really ask whether $p(x_{1:\tau})$ equals $p(x_{\tau+1:T})$. This problem can be generalized to the case of more than one change point.
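
As a concrete instance of this question, consider a hypothetical series of length $T$ whose observations are independent and Gaussian, with a mean that may shift once after a time $\tau$. The Gaussian model, the independence, and the symbols $\mu_0$, $\mu_1$ and $\sigma$ are assumptions made here for illustration only; the general problem does not require them.

```latex
% Illustrative single change in mean (an assumed Gaussian model)
x_t \sim \mathcal{N}(\mu_0, \sigma^2) \quad \text{for } t = 1, \dots, \tau, \qquad
x_t \sim \mathcal{N}(\mu_1, \sigma^2) \quad \text{for } t = \tau + 1, \dots, T.

% "No change point" corresponds to the null hypothesis
H_0 : \mu_0 = \mu_1 \qquad \text{versus} \qquad H_1 : \mu_0 \neq \mu_1 .
```

Under this model, the two segments have the same distribution exactly when $\mu_0 = \mu_1$, so asking whether a change occurred reduces to a test on the two means.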

The problem of change point detection can be narrowed down further into more specific problems. In offline change point detection it is assumed that a sequence of length $T$ is available and the goal is to identify whether any change point(s) occurred in the series. This is an example of post hoc analysis and is often approached using hypothesis testing methods. By contrast, online change point detection is concerned with detecting change points in an incoming data stream.

Online change detection

Using the sequential analysis ("online") approach, any change test must trade off competing metrics, commonly the false alarm rate, the misdetection rate, and the detection delay.

In a Bayes change-detection problem, a prior distribution is available for the change time.
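
One way to make the Bayesian setting concrete is the following minimal sketch. It assumes a geometric prior on the change time (with a hypothetical per-step change probability `rho`) and Gaussian observations with known pre-change mean `mu0`, post-change mean `mu1` and standard deviation `sigma`. None of these choices come from the text above; they are illustrative assumptions. The function recursively updates the posterior probability that the change has already occurred, and an alarm could be raised once that probability exceeds a chosen threshold.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_change_probabilities(xs, mu0, mu1, sigma, rho):
    """Posterior probability, after each observation, that the change has occurred.

    Assumes (for illustration) a geometric prior: at every step the change
    happens with probability rho if it has not happened yet.
    """
    p = 0.0  # probability that the change has occurred before any data is seen
    history = []
    for x in xs:
        # Predict: the change may have happened at this step.
        prior = p + (1.0 - p) * rho
        # Update with the likelihood of x under the post- and pre-change models.
        post_change = prior * gaussian_pdf(x, mu1, sigma)
        pre_change = (1.0 - prior) * gaussian_pdf(x, mu0, sigma)
        p = post_change / (post_change + pre_change)
        history.append(p)
    return history


# Hypothetical data: the mean jumps from 0 to 3 after the tenth observation.
data = [0.0] * 10 + [3.0] * 10
probs = posterior_change_probabilities(data, mu0=0.0, mu1=3.0, sigma=1.0, rho=0.05)
# Index of the first observation at which the posterior exceeds 0.99, if any.
first_alarm = next((t for t, p in enumerate(probs) if p > 0.99), None)
```

A larger `rho` makes the detector quicker to declare a change but more prone to false alarms, which is the trade-off described above.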

Online change detection is also done using streaming algorithms.

Minimax change detection

In minimax change detection, the objective is to minimize the expected detection delay for some worst-case change-time distribution, subject to a cost or constraint on false alarms.

A key technique for minimax change detection is the CUSUM procedure.
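
The following is a minimal sketch of a one-sided CUSUM test, under the assumption that the observations are Gaussian with known standard deviation `sigma`, known pre-change mean `mu0` and known post-change mean `mu1`. The function name, the parameters and the threshold value are illustrative choices, not part of any particular library. The statistic accumulates log-likelihood ratios, is reset to zero whenever it would become negative, and raises an alarm when it exceeds the threshold.

```python
def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the 0-based index of the first alarm, or None if no alarm is raised."""
    s = 0.0
    for t, x in enumerate(xs):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence for a change, never letting it drop below zero.
        s = max(0.0, s + llr)
        if s > threshold:
            return t
    return None


# Hypothetical noise-free data: the mean jumps from 0 to 2 at index 20.
data = [0.0] * 20 + [2.0] * 20
print(cusum_alarm(data, mu0=0.0, mu1=2.0, sigma=1.0, threshold=5.0))  # prints 22
```

Raising the threshold lowers the false alarm rate at the cost of a longer detection delay; in this example the alarm comes on the third post-change observation.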

Offline change detection

Basseville (1993, Section 2.6) discusses offline change-in-mean detection with hypothesis testing based on the works of Page[2] and Picard[3] and maximum-likelihood estimation of the change time, related to two-phase regression. Other approaches employ clustering based on maximum likelihood estimation, or use optimization to infer the number and times of changes.[4]
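
As an illustration of maximum-likelihood estimation of a change time, the sketch below handles the simplest case: exactly one change in the mean, with independent Gaussian noise of constant variance. Under those assumptions (which are made here for the example and are not stated in the sources above), the maximum-likelihood split is the one that minimizes the total squared error around the two segment means. The function name is hypothetical. Note that it only estimates where a change is most likely; deciding whether a change occurred at all would need a separate test.

```python
def estimate_change_point(xs):
    """Return k such that splitting into xs[:k] and xs[k:] minimizes the
    total within-segment squared error (one change in mean assumed)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(xs)
    best_k, best_gain = None, -1.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]
        right_sum = total - left_sum
        # Minimizing the squared error is equivalent to maximizing
        # sum(segment_sum**2 / segment_length) over the two segments.
        gain = left_sum**2 / k + right_sum**2 / (n - k)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


# Hypothetical data: the mean shifts from about 1 to about 5 after five points.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1]
print(estimate_change_point(data))  # prints 5
```

Because every candidate split is scored against the whole series, this kind of estimate requires the complete data in advance.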

"Offline" approaches cannot be used on streaming data, because they compare against statistics of the complete time series and therefore cannot react to changes in real time. In return, they often provide more accurate estimates of the change time and magnitude.

Applications of change detection

Change detection tests are often used in manufacturing (quality control), intrusion detection, spam filtering, website tracking, and medical diagnostics.

Linguistic change detection

Linguistic change detection refers to the ability to detect word-level changes across multiple presentations of the same sentence. Researchers have found that the amount of semantic overlap (i.e., relatedness) between the changed word and the new word influences the ease with which such a detection is made (Sturt, Sanford, Stewart, & Dawydiak, 2004). Additional research has found that focusing one's attention on the word that will be changed during the initial reading of the original sentence can improve detection. This was shown using italicized text to focus attention, whereby the word that will be changing is italicized in the original sentence (Sanford, Sanford, Molle, & Emmott, 2006), as well as using clefting constructions such as "It was the tree that needed water." (Kennette, Wurm, & Van Havermaet, 2010). These change-detection phenomena appear to be robust, even occurring cross-linguistically when bilinguals read the original sentence in their native language and the changed sentence in their second language (Kennette, Wurm, & Van Havermaet, 2010). More recently, researchers have detected word-level changes in semantics across time by computationally analyzing temporal corpora using change point detection (for example, the word "gay" has acquired a new meaning over time).[5]

References

  1. van den Burg, Gerrit J. J.; Williams, Christopher K. I. (May 26, 2020). "An Evaluation of Change Point Detection Algorithms". arXiv:2003.06222 [stat.ML].
  2. Page, E. S. (June 1957). "On problems in which a change in a parameter occurs at an unknown point". Biometrika. 44 (1/2): 248–252. doi:10.1093/biomet/44.1-2.248. JSTOR 2333258.
  3. Picard, Dominique (1985). "Testing and estimating change-points in time series". Advances in Applied Probability. 17 (4): 841–867. doi:10.2307/1427090. JSTOR 1427090.
  4. Yao, Yi-Ching (1988-02-01). "Estimating the number of change-points via Schwarz' criterion". Statistics & Probability Letters. 6 (3): 181–189. doi:10.1016/0167-7152(88)90118-6. ISSN 0167-7152.
  5. Kulkarni, Vivek; Al-Rfou, Rami; Perozzi, Bryan; Skiena, Steven (2015). "Statistically Significant Detection of Linguistic Change". WWW '15 Proceedings of the 24th International Conference on World Wide Web: 625–635. arXiv:1411.3315. doi:10.1145/2736277.2741627. ISBN 9781450334693. S2CID 9298083.
