Changepoint detection in network measurements, Allen B. Downey

  1. Changepoint detection in network measurements. Allen B. Downey

  2. Fundamental problem:
     - Predict next value in time series.
     Applications:
     - Protocol parameters (timeouts).
     - Resource selection, scheduling.
     - User feedback.

  3. Two kinds of prediction:
     - Single value prediction.
     - Predictive distribution.
       • Summary stats.
       • Intervals.
       • P(error > thresh)
       • E[cost(error)]

  4. If we assume stationarity, life is good.
     - Accumulate data indefinitely.
     - Predictive distribution = observed distribution.
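
As a minimal sketch of "predictive distribution = observed distribution", the snippet below builds an empirical CDF from past observations and reads a tail probability from it. The function names and the sample round-trip times are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Return F, where F(x) is the fraction of observations <= x."""
    xs = sorted(data)
    n = len(xs)

    def cdf(x):
        return bisect_right(xs, x) / n

    return cdf

def prob_exceeds(data, thresh):
    """Estimate P(next value > thresh) from the observed distribution."""
    return 1.0 - empirical_cdf(data)(thresh)

# Made-up round-trip times in milliseconds (illustrative only).
rtts = [102, 98, 110, 95, 130, 101, 99, 105]
p_late = prob_exceeds(rtts, 104)  # fraction of past RTTs above 104 ms
```

Under stationarity every new observation can simply be appended to the data, which is why accumulating indefinitely is safe in this case.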

  5. Non-stationary models:
     - Trends + noise.
     - Level + changepoint + noise.

  6. Network performance:
     - Some trends (accumulating queue).
     - Many abrupt changepoints.
       • Beginning and end of transfers.
       • Routing changes.
       • Hardware failure, replacement.

  7. Prediction with known changepoints:
     - Use data back to the latest changepoint.
     - Less accurate immediately after.

  8. Prediction with probabilistic changepoints.
     P(i) = prob of a changepoint after point i
     Example:
     - 150 data points.
     - P(50) = 0.7
     - P(100) = 0.5
     How do we generate a predictive distribution?

  9. Two steps:
     - Derive P(i+) = prob that i is the last changepoint.
     - Compute weighted mix going back to each i.
     Example:
     - P(50) = 0.7, P(100) = 0.5
     - P(50+) = 0.7 × (1 − 0.5) = 0.35, P(100+) = 0.5
     - Plus a (1 − 0.7) × (1 − 0.5) = 0.15 chance of no changepoint.
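
The arithmetic in this example can be sketched in a few lines. The sketch assumes that changepoints at different indices are independent, which is what the numbers on the slide imply; the function name is hypothetical.

```python
def last_changepoint_probs(p):
    """Convert P(i) into P(i+).

    p maps index i -> P(i), the probability of a changepoint after point i.
    Assumes changepoints at different indices are independent.
    Returns (dict i -> P(i+), probability of no changepoint at all).
    """
    p_last = {}
    p_none_later = 1.0  # probability that no changepoint lies after index i
    for i in sorted(p, reverse=True):
        p_last[i] = p[i] * p_none_later
        p_none_later *= 1.0 - p[i]
    return p_last, p_none_later

p_last, p_none = last_changepoint_probs({50: 0.7, 100: 0.5})
# p_last[100] is 0.5, p_last[50] is about 0.35, p_none is about 0.15
```

The P(i+) values plus the no-changepoint probability sum to 1, so they can serve directly as mixture weights.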

  10. Predictive distribution =
      0.50 · edf(100, 150) ⊕ 0.35 · edf(50, 150) ⊕ 0.15 · edf(0, 150)
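
A minimal sketch of this weighted mix follows. It assumes that edf(a, b) denotes the empirical distribution function of points a up to b (0-based, end exclusive) and that ⊕ denotes a probability mixture; both readings are interpretations of the slide, and the function names and test data are hypothetical.

```python
from bisect import bisect_right

def edf(data, start, end):
    """Empirical distribution function of data[start:end]."""
    xs = sorted(data[start:end])
    return lambda x: bisect_right(xs, x) / len(xs)

def predictive_cdf(data, weights):
    """Mixture of EDFs, each going back to a candidate last changepoint.

    weights maps a start index to the probability that the current
    regime began there (start 0 stands for "no changepoint").
    """
    n = len(data)
    parts = [(w, edf(data, start, n)) for start, w in weights.items()]
    return lambda x: sum(w * f(x) for w, f in parts)

# Illustrative series with level shifts at 50 and 100.
data = [0.0] * 50 + [1.0] * 50 + [2.0] * 50
F = predictive_cdf(data, {100: 0.50, 50: 0.35, 0: 0.15})
```

Because the weights sum to 1, F is itself a valid CDF: it rises from 0 to 1 and gives most weight to the most recent segment.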

  11. So how do we generate the probabilities P(i+)? Three steps:
      - Bayes' theorem.
      - Simple case: we know there is 1 changepoint.
      - General case: unknown # of changepoints.

  12. Bayes' theorem (diachronic interpretation)
      $$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
      - H is a hypothesis, E is a body of evidence.
      - P(H|E): posterior
      - P(H): prior
      - P(E|H) is usually easy to compute.
      - P(E) is often not.

  13. Unless we have a suite of exclusive hypotheses.
      $$P(E) = \sum_{H_i \in S} P(E \mid H_i)\,P(H_i)$$
      In that case life is good.
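
A minimal sketch of the update over a suite of hypotheses is shown here. Note that the total-probability sum equals P(E) only when the hypotheses are exhaustive as well as mutually exclusive. The function name and the numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayesian update over a suite of exclusive, exhaustive hypotheses.

    priors[h] is P(h); likelihoods[h] is P(E | h).
    Returns a dict h -> P(h | E).
    """
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(unnorm.values())  # P(E) by the law of total probability
    return {h: v / p_e for h, v in unnorm.items()}

# Illustrative numbers only.
post = posterior({"A": 0.5, "B": 0.5}, {"A": 0.75, "B": 0.5})
# post["A"] is 0.6 and post["B"] is 0.4
```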

  14. - If we know there is exactly one changepoint in an interval...
      - ...then the P(i) are exclusive hypotheses,
      - and all we need is P(E|i).
      Which is pretty much a solved problem.
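
One way to sketch the single-changepoint case is below. It is not necessarily the method used in the talk: it assumes Gaussian noise with known sigma, a uniform prior over changepoint locations, and plug-in segment means (a profile likelihood rather than a full marginal likelihood). The function name is hypothetical.

```python
import math

def single_changepoint_posterior(data, sigma=1.0):
    """Posterior over the location of exactly one changepoint.

    Entry k of the result is the probability that the change falls
    between data[k] and data[k + 1]. Assumes Gaussian noise with known
    sigma, a uniform prior, and plug-in means for each segment.
    """
    n = len(data)
    log_like = []
    for i in range(1, n):  # candidate split: data[:i] | data[i:]
        ll = 0.0
        for seg in (data[:i], data[i:]):
            mu = sum(seg) / len(seg)
            ll += sum(-0.5 * ((x - mu) / sigma) ** 2 for x in seg)
        log_like.append(ll)
    m = max(log_like)  # subtract the max before exponentiating for stability
    weights = [math.exp(ll - m) for ll in log_like]
    total = sum(weights)
    return [w / total for w in weights]

post = single_changepoint_posterior([0.0] * 10 + [5.0] * 10)
# post peaks at index 9, the boundary between the two levels
```

Normalizing by the sum of the weights is exactly the total-probability step: because the candidate locations are exclusive, P(E) never has to be computed separately.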

  15. What if the # of changepoints is unknown?
      - P(i) are no longer exclusive.
      - But the P(i+) are.
      - And we can write a system of equations for P(i+).

  16. $$P(i{+}) = P(i{+} \mid \oslash)\,P(\oslash) + \sum_{j<i} P(i{+} \mid j{+}{+})\,P(j{+}{+})$$
      - P(j++) is the prob that the second-to-last changepoint is at j.
      - P(i+|j++) reduces to the simple problem.
      - P(⊘) is the prob that we have not seen two changepoints.
      - P(i+|⊘) reduces to the simple problem (plus).
      Great, so what's P(j++)?

  17. $$P(i{+}{+}) = \sum_{k>i} P(i{+}{+} \mid k{+})\,P(k{+})$$
      - P(i++|k+) is just P(i+) computed at time k.
      - So we can solve for P(i+) in terms of P(i++).
      - And P(i++) in terms of P(i+).
      - Calling Dr. Jacobi!

  18. Implementation:
      - Need to keep n^2/2 previous values.
      - And n^2/2 summary statistics.
      - And it takes n^2 work to do an update.
      - But, we only have to go back two changepoints,
      - ...so we can keep n small.

  19. - Synthetic series with two changepoints.
      - µ = −0.5, 0.5, 0.0
      - σ = 1.0
      - P(⊘) = 0.04
      [Figure: top panel, data x[i] vs. time; bottom panel, cumulative probability of P(i+) and P(i++) vs. time, 0 to 150.]

  20. - The ubiquitous Nile dataset.
      - Change in 1898.
      - Estimated probs can be mercurial.
      [Figure: top panel, annual flow (10^9 m^3) vs. time, 1880 to 1960; bottom panel, cumulative probability of P33(i+), P66(i+), and P99(i+) vs. time.]

  21. - Can also detect change in variance.
      - µ = 1, 0, 0
      - σ = 1, 1, 0.5
      - Estimated P(i+) is good.
      - Estimated P(i++) less certain.
      [Figure: top panel, data vs. index; bottom panel, cumulative probability of P(i+) and P(i++) vs. index, 0 to 100.]

  22. - Qualitative behavior seems good.
      - Quantitative tests:
        • Compare to GLR for online alarm problem.
        • Test predictive distribution with synthetic data,
        • ... and with real data.

  23. Online alarm problem:
      - Observe process in real time.
      - µ_0 and σ known.
      - τ and µ_1 unknown.
      - Raise alarm when f(data) > thresh.
      - Minimize delay.
      - Minimize false alarm rate.

  24. GLR = generalized likelihood ratio.
      - Compute decision function g_k.
      - E[g_k] = 0 before the changepoint,
      - ... increases after.
      - Alarm when g_k > h.
      - GLR is optimal when µ_1 is known.
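
The slides do not give the formula for g_k. As a hedged sketch, the code below uses one standard textbook form of the GLR statistic for a change in the mean of Gaussian data with known µ_0 and σ and unknown µ_1: g_k = max over j of (sum of (y_i − µ_0) for i = j..k)^2 / (2 σ^2 (k − j + 1)). The function names and the threshold value are hypothetical.

```python
def glr_statistic(data, mu0=0.0, sigma=1.0):
    """GLR decision function g_k for a mean shift, with k = len(data).

    Maximizes over every candidate change time j by scanning backwards
    from the newest observation and accumulating the tail sum.
    """
    best = 0.0
    tail_sum = 0.0
    for count, y in enumerate(reversed(data), start=1):
        tail_sum += y - mu0
        best = max(best, tail_sum ** 2 / (2.0 * sigma ** 2 * count))
    return best

def glr_alarm_time(stream, h, mu0=0.0, sigma=1.0):
    """Index of the first observation at which g_k > h, or None."""
    seen = []
    for k, y in enumerate(stream):
        seen.append(y)
        if glr_statistic(seen, mu0, sigma) > h:
            return k
    return None

alarm = glr_alarm_time([0, 0, 2, 2, 2], h=3.0)  # alarm is 3
```

This naive version recomputes the scan at every step, so it costs O(k) per observation; that is enough for illustration but not tuned for long streams.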

  25. CPP = change point probability
      $$P(\text{changepoint}) = \sum_{i=0}^{k} P(i{+})$$
      - Alarm when P(changepoint) > thresh.
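
The CPP alarm rule can be sketched as follows, assuming the P(i+) estimates are already available at each time step. How they are estimated is the subject of the earlier slides and is not reproduced here; the function name and input values are hypothetical.

```python
def cpp_alarm_time(p_last_history, thresh):
    """First time k at which the summed P(i+) exceeds thresh, or None.

    p_last_history[k] is the list of P(i+) values for i = 0..k as
    estimated at time k (hypothetical inputs for this sketch).
    """
    for k, p_last in enumerate(p_last_history):
        if sum(p_last) > thresh:
            return k
    return None

# Made-up estimates: the total probability mass jumps at time 2.
history = [[0.01], [0.01, 0.02], [0.05, 0.30, 0.40]]
alarm = cpp_alarm_time(history, thresh=0.5)  # alarm is 2
```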

  26. - µ = 0, 1
      - σ = 1
      - τ ∼ Exp(0.01)
      - Goodness = lower mean delay for same false alarm rate.
      [Figure: mean delay vs. false alarm probability (0.0 to 0.2) for GLR and CPP.]
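
The two quantities being compared can be computed from a batch of trials as sketched below. The sketch assumes an alarm raised before the true change time τ counts as a false alarm and that delay is measured only for alarms at or after τ; the slides do not spell out these conventions, and the function name and trial data are hypothetical.

```python
def score_alarms(trials):
    """Summarize detector performance over repeated trials.

    trials is a list of (alarm_time, change_time) pairs; alarm_time is
    None when no alarm was raised. Returns (false alarm probability,
    mean detection delay). Missed detections count toward neither.
    """
    false_alarms = sum(1 for a, t in trials if a is not None and a < t)
    delays = [a - t for a, t in trials if a is not None and a >= t]
    fa_prob = false_alarms / len(trials)
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    return fa_prob, mean_delay

# Made-up trials: one false alarm, two detections, one miss.
fa_prob, mean_delay = score_alarms([(10, 20), (25, 20), (31, 30), (None, 40)])
# fa_prob is 0.25 and mean_delay is 3.0
```

Sweeping the alarm threshold and plotting mean delay against false alarm probability gives one curve per detector, which is the kind of comparison this slide makes.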

  27. - Fix false alarm rate = 5%.
      - Vary σ.
      - CPP does well with small S/N.
      [Figure: mean delay vs. sigma (0.0 to 1.5) for GLR and CPP, both at a 5% false alarm rate.]

  28. So it works on a simple problem. Future work:
      - Other changepoint problems (location, tracking, prediction).
      - Other data distributions (lognormal).
      - Testing robustness (real data).

  29. Good news:
      - Very general framework.
      - Seems to work.
      - Many possible applications.

  30. Bad news:
      - Still some wrinkles to iron.
      - n^2 space and time may be fatal.
      - May be overkill for original application.

  31. - More at http://allendowney.com/changepoint
      - Or email downey@allendowney.com
