Changepoint detection in network measurements. Allen B. Downey.


SLIDE 1

Changepoint detection in network measurements Allen B. Downey

SLIDE 2

Fundamental problem:

Predict next value in time series.

Applications:

  • Protocol parameters (timeouts).
  • Resource selection, scheduling.
  • User feedback.

SLIDE 3

Two kinds of prediction:

Single value prediction.
Predictive distribution:

  • Summary stats.
  • Intervals.
  • P(error > thresh)
  • E[cost(error)]

SLIDE 4

If we assume stationarity, life is good.

Accumulate data indefinitely. Predictive distribution = observed distribution.

SLIDE 5

Non-stationary models:

Trends + noise. Level + changepoint + noise.

SLIDE 6

Network performance:

Some trends (accumulating queue). Many abrupt changepoints.

  • Beginning and end of transfers.
  • Routing changes.
  • Hardware failure, replacement.

SLIDE 7

Prediction with known changepoints:

Use data back to the latest changepoint. Less accurate immediately after.

SLIDE 8

Prediction with probabilistic changepoints. P(i) = prob of a changepoint after point i. Example:

150 data points. P(50) = 0.7 P(100) = 0.5

How do we generate a predictive distribution?

SLIDE 9

Two steps:

Derive P(i+) = prob that i is the last changepoint. Compute a weighted mix going back to each i.

Example:

P(50) = 0.7

P(100) = 0.5

P(50+) = 0.35

P(100+) = 0.5

Plus 0.15 chance of no changepoint.
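The weights above can be derived with a small sketch (mine, not from the talk), under the assumption that the candidate probabilities P(i) are independent: P(i+) is P(i) times the probability that no later candidate is a changepoint.

```python
# Sketch (my assumption, not stated in the talk): treat the candidate
# changepoint probabilities P(i) as independent. Then
#   P(i+) = P(i) * product over j > i of (1 - P(j))
# and the leftover mass is the chance of no changepoint at all.

def last_changepoint_probs(p):
    """p maps candidate index -> P(changepoint after that point)."""
    idxs = sorted(p)
    out = {}
    for i in idxs:
        prob_no_later = 1.0
        for j in idxs:
            if j > i:
                prob_no_later *= 1.0 - p[j]
        out[i] = p[i] * prob_no_later
    p_none = 1.0
    for i in idxs:
        p_none *= 1.0 - p[i]       # no changepoint anywhere
    return out, p_none

weights, p_none = last_changepoint_probs({50: 0.7, 100: 0.5})
print(weights)   # {50: 0.35, 100: 0.5}, matching the slide
print(p_none)    # ≈ 0.15
```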

SLIDE 10

Predictive distribution = 0.50 · edf(100, 150) ⊕ 0.35 · edf(50, 150) ⊕ 0.15 · edf(0, 150), where edf(a, b) is the empirical distribution of the data from point a to point b.
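The weighted mix can be sketched as a sum of empirical CDFs, each using data back to one candidate last changepoint. `mixture_cdf` is my name; the weights are the P(i+) values from the previous slide.

```python
# Sketch: evaluate the mixture predictive distribution at a point x.
# Each component is the empirical CDF of the data from one candidate
# last changepoint to the present; weights come from P(i+).

def mixture_cdf(data, weights, x):
    """weights: {start_index: weight}; each component uses data[start:]."""
    total = 0.0
    for start, w in weights.items():
        segment = data[start:]
        frac = sum(1 for v in segment if v <= x) / len(segment)
        total += w * frac
    return total

# Tiny deterministic example (not the slide's data):
print(mixture_cdf([1, 2, 3, 4], {0: 0.5, 2: 0.5}, 2))  # 0.25
```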

SLIDE 11

So how do we generate the probabilities P(i+)? Three steps:

Bayes’ theorem. Simple case: we know there is 1 changepoint. General case: unknown # of changepoints.

SLIDE 12

Bayes’ theorem (diachronic interpretation):

P(H|E) = P(E|H) P(H) / P(E)

H is a hypothesis, E is a body of evidence. P(H|E): posterior. P(H): prior. P(E|H) is usually easy to compute. P(E) is often not.

SLIDE 13

Unless we have a suite of exclusive hypotheses:

P(E) = Σ_{Hi ∈ S} P(E|Hi) P(Hi)

In that case life is good.
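The update over a suite can be sketched in a few lines (`bayes_update` and the hypotheses 'A' and 'B' are mine, for illustration):

```python
# Sketch of the update on slides 12-13: with a suite of exclusive
# hypotheses, P(E) is just the normalizing sum over the suite.

def bayes_update(priors, likelihoods):
    """priors: {H: P(H)}; likelihoods: {H: P(E|H)} -> posterior {H: P(H|E)}."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(unnorm.values())              # P(E) = sum of P(E|Hi) P(Hi)
    return {h: v / p_e for h, v in unnorm.items()}

post = bayes_update({'A': 0.5, 'B': 0.5}, {'A': 0.8, 'B': 0.2})
print(post['A'])  # 0.8
```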

SLIDE 14

If we know there is exactly one changepoint in an interval...

...then the P(i) are exclusive hypotheses, and all we need is P(E|i).

Which is pretty much a solved problem.
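A minimal sketch of that solved problem, under my own assumptions (Gaussian data with known σ, segment means estimated by averages, uniform prior over locations):

```python
# Sketch: posterior over the location of a single known changepoint.
# Assumptions (mine): Gaussian data, known sigma, uniform prior.
import math

def changepoint_posterior(data, sigma=1.0):
    n = len(data)
    logliks = []
    for i in range(1, n):                # changepoint between i-1 and i
        ll = 0.0
        for seg in (data[:i], data[i:]):
            mu = sum(seg) / len(seg)     # segment mean
            ll += sum(-(x - mu) ** 2 / (2 * sigma ** 2) for x in seg)
        logliks.append(ll)
    m = max(logliks)                     # exp-normalize for stability
    w = [math.exp(l - m) for l in logliks]
    z = sum(w)
    return [v / z for v in w]            # P(i | E) for i = 1 .. n-1

post = changepoint_posterior([0, 0, 0, 0, 5, 5, 5, 5])
print(post.index(max(post)) + 1)  # most likely split: 4
```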

SLIDE 15

What if the # of changepoints is unknown?

P(i) are no longer exclusive. But the P(i+) are. And we can write a system of equations for P(i+).

SLIDE 16

P(i+) = P(i+|⊘) P(⊘) + Σ_{j<i} P(i+|j++) P(j++)

P(j++) is the prob that the second-to-last changepoint is at j.

P(i+|j++) reduces to the simple problem. P(⊘) is the prob that we have not seen two changepoints. P(i+|⊘) reduces to the simple problem (plus).

Great, so what’s P(j++)?

SLIDE 17

P(i++) = Σ_{k>i} P(i++|k+) P(k+)

P(i++|k+) is just P(i+) computed at time k. So we can solve for P(i+) in terms of P(i++), and P(i++) in terms of P(i+). Calling Dr. Jacobi!
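The "Dr. Jacobi" joke refers to solving the coupled equations by fixed-point iteration. A generic Jacobi-style sketch (not the paper's actual update; the matrix and vector here are made up): solve x = b + Mx by repeated substitution.

```python
# Generic Jacobi-style fixed-point iteration: iterate x <- b + M x.
# Converges when the iteration is a contraction (spectral radius of M < 1).

def jacobi(M, b, iters=100):
    n = len(b)
    x = b[:]                             # initial guess
    for _ in range(iters):
        x = [b[i] + sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# Made-up example: x = [1, 1] + [[0, 0.5], [0.25, 0]] x  ->  x = [12/7, 10/7]
x = jacobi([[0.0, 0.5], [0.25, 0.0]], [1.0, 1.0])
print(x)
```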

SLIDE 18

Implementation:

Need to keep n²/2 previous values, and n²/2 summary statistics. An update takes n² work. But we only have to go back two changepoints, so we can keep n small.

SLIDE 19
[Figure: synthetic series with two changepoints. Top: data x[i] over time 0–150; bottom: cumulative probability curves P(i+) and P(i++). µ = −0.5, 0.5, 0.0; σ = 1.0; P(⊘) = 0.04.]

SLIDE 20

[Figure: annual flow of the Nile (10⁹ m³), 1880–1960; cumulative probability curves P33(i+), P66(i+), P99(i+).]

The ubiquitous Nile dataset. Change in 1898. Estimated probs can be mercurial.

SLIDE 21
[Figure: synthetic data over index 0–100; cumulative probability curves P(i+) and P(i++).]

Can also detect change in variance. µ = 1, 0, 0; σ = 1, 1, 0.5. Estimated P(i+) is good. Estimated P(i++) less certain.

SLIDE 22

Qualitative behavior seems good. Quantitative tests:

  • Compare to GLR for online alarm problem.
  • Test predictive distribution with synthetic data,
  • ... and with real data.

SLIDE 23

Online alarm problem:

Observe process in real time. µ0 and σ known. τ and µ1 unknown. Raise alarm when f(data) > thresh. Minimize delay. Minimize false alarm rate.

SLIDE 24

GLR = generalized likelihood ratio.

Compute decision function g_k. E[g_k] = 0 before the changepoint, ... increases after. Alarm when g_k > h. GLR is optimal when µ1 is known.
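A hedged sketch of g_k for a mean shift with µ0 and σ known and µ1 unknown, using the standard form from the changepoint literature (the talk does not give the formula): g_k = max over j ≤ k of S_jk² / (2σ²(k−j+1)), where S_jk sums (x_i − µ0) over i = j..k.

```python
# Sketch of a GLR online alarm (standard mean-shift form, not necessarily
# the exact variant used in the talk): alarm at the first k where g_k > h.

def glr_alarm(data, mu0, sigma, h):
    """Return the first index where g_k exceeds threshold h, else None."""
    for k in range(len(data)):
        s = 0.0
        g = 0.0
        # scan candidate changepoints j = k, k-1, ..., 0
        for j in range(k, -1, -1):
            s += data[j] - mu0
            g = max(g, s * s / (2 * sigma ** 2 * (k - j + 1)))
        if g > h:
            return k
    return None

# Made-up data: mean 0 for 20 points, then a shift to mean 3.
print(glr_alarm([0.0] * 20 + [3.0] * 20, 0.0, 1.0, 5.0))  # 21
```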

SLIDE 25

CPP = change point probability.

P(changepoint) = Σ_{i=0}^{k} P(i+)

Alarm when P(changepoint) > thresh.
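The CPP rule can be sketched directly, given the current P(i+) estimates at time k (the example values below are made up):

```python
# Sketch of the CPP alarm rule: sum the current P(i+) estimates for
# i = 0..k and alarm when the total probability exceeds the threshold.

def p_changepoint(p_last, k):
    """p_last[i] = current estimate of P(i+); returns P(changepoint by k)."""
    return sum(p_last[:k + 1])

p_last = [0.1, 0.2, 0.4, 0.1]          # made-up P(i+) values
print(p_changepoint(p_last, 2) > 0.5)  # True: alarm at k = 2
```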

SLIDE 26

[Figure: mean delay vs. false alarm probability for GLR and CPP. µ = 0, 1; σ = 1; τ ∼ Exp(0.01).]

Goodness = lower mean delay for the same false alarm rate.

SLIDE 27

[Figure: mean delay vs. sigma for GLR and CPP, each at a 5% false alarm rate.]

Fix false alarm rate = 5%. Vary σ. CPP does well with small S/N.

SLIDE 28

So it works on a simple problem. Future work:

Other changepoint problems (location, tracking, prediction). Other data distributions (lognormal). Testing robustness (real data).

SLIDE 29

Good news:

Very general framework. Seems to work. Many possible applications.

SLIDE 30

Bad news:

Still some wrinkles to iron out. n² space and time may be fatal. May be overkill for the original application.

SLIDE 31

More at

http://allendowney.com/changepoint

Or email downey@allendowney.com
