
Berkeley/Stanford Recovery-oriented Computing Course Lecture, October 25, 2001



  1. Title slide and slides 1-5

     When bad things happen to good systems: detecting and diagnosing problems
     Kimberly Keeton
     HPL Storage and Content Distribution
     Berkeley/Stanford Recovery-oriented Computing Course Lecture
     October 18, 2001
     Hewlett-Packard Laboratories, Storage & Content Distribution, 2001-10-ROC-Lecture
     [Figure: time-series plot, x-axis 0 to 30, y-axis -2 to 3]

     Problem definition
     • Detection: determining that a problem has occurred or will occur
     • Diagnosis: determining the root cause of the problem
     • "Problem" can be broadly defined
       – Performance-related, availability-related, security-related
     • Fields to draw from:
       – System administration, operating systems, network management, intrusion detection
     • Techniques borrowed from:
       – Statistics, database data mining, AI machine learning

     Outline
     • Problem definition
     • Detection techniques
       – Challenges
       – Change point detection
       – Time series analysis
       – Predictive detection
       – Data mining/machine learning algorithms
     • Diagnosis techniques
     • Additional related work
     • Summary

     Challenges in detecting problems
     • Many types of faults
       – Persistent increase, gradual change, abrupt change, single spike
     • Time-varying property of observed system behavior
       – Trends and seasonality (i.e., cyclic behavior)
     • Distinguishing between the "good," the "bad" and the "ugly"
     • Detecting problems fast enough to minimize service disruption
     • Catching false positives vs. neglecting true positives

     Change point detection algorithms [Hellerstein98]
     • Basic idea:
       – Determine when process parameters have changed
       – Declare change point if I/O response time is "more likely" to have come from a distribution with a different mean
     [Figure: time-series plot, y-axis -3 to 6]

     Maximum likelihood ratio
     • Let $Y_1, Y_2, \ldots, Y_T$ be i.i.d. random variables
     • Let $f(Y_i, \theta)$ be the probability distribution function (pdf) of the random variables, where $\theta$ is the only parameter in the pdf
     • Let $f(\theta_0)$ and $f(\theta_1)$ be different distributions
     • Likelihood ratio:

       $$\frac{\prod_{i=1}^{T} f(Y_i, \theta_1)}{\prod_{i=1}^{T} f(Y_i, \theta_0)}$$

     • Large ratio => more likely that $Y_1, Y_2, \ldots, Y_T$ come from $f(\theta_1)$
     • Ex: maximum likelihood ratio detection rules, such as cumulative sum (CUMSUM)
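As a rough illustration of the likelihood ratio, the sketch below assumes normal observations with a known standard deviation and two candidate means, `theta0` and `theta1`. It works with the log of the ratio, so products become sums, and then applies the ratio sequentially in the spirit of a cumulative sum rule. The function names, the `sigma` parameter, and the log-threshold `log_c` are illustrative assumptions, not taken from the lecture.

```python
import math


def normal_logpdf(y, mean, sigma):
    """Log of the normal pdf with the given mean and standard deviation."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - mean) ** 2 / (2.0 * sigma ** 2))


def log_likelihood_ratio(samples, theta0, theta1, sigma=1.0):
    """Log of prod f(Y_i, theta1) / prod f(Y_i, theta0) for normal data.

    A large positive value means the samples are more likely to have
    come from the distribution with mean theta1.
    """
    return sum(normal_logpdf(y, theta1, sigma) - normal_logpdf(y, theta0, sigma)
               for y in samples)


def first_alarm(samples, theta0, theta1, log_c, sigma=1.0):
    """Return the first 1-based index n at which

        max over 1 <= k <= n of sum_{i=k..n} log-ratio(Y_i)  >=  log_c,

    or None if the threshold is never reached. The maximum over k is
    tracked with the recursion g_n = step_n + max(0, g_{n-1}).
    """
    g = None
    for n, y in enumerate(samples, start=1):
        step = log_likelihood_ratio([y], theta0, theta1, sigma)
        g = step if g is None else step + max(0.0, g)
        if g >= log_c:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical data: the mean shifts from 0 to 5 after the fifth sample.
    data = [0.0] * 5 + [5.0] * 5
    print(first_alarm(data, theta0=0.0, theta1=5.0, log_c=10.0))
```

Working in log space avoids numerical underflow when many small densities are multiplied; the threshold then applies to the log of the ratio rather than to the ratio itself.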

  2. Slides 6-11

     Maximum likelihood ratio detection rule
     • Declare a change has occurred at N if the likelihood ratio after the change exceeds a pre-determined threshold level c

       $$N = \inf\left\{ n \ge 1 : \sup_{1 \le k \le n} \frac{\prod_{i=k}^{n} f(Y_i, \theta_1)}{\prod_{i=k}^{n} f(Y_i, \theta_0)} \ge c \right\}$$

     • Ex: CUMSUM rule for normal random variables

       $$N = \inf\left\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} (Y_i - \bar{Y}) \ge c \right\}$$

     CUMSUM example
     [Figure panels: raw data (difficult to detect change); CUMSUM (easier to detect change); CUMSUM confidence level]
     • Confidence level compared with bootstrapping (random permutation of data)
       – Bootstrap: flat cumulative residuals
       – CUMSUM: angle forms at change point

     Change point pros/cons
     • Advantages:
       – Well-established statistical technique
       – Several variants of on-line and off-line algorithms
     • Disadvantages:
       – Focuses on single type of fault: abrupt changes
       – Mostly limited to stationary (non-varying over time) processes
         • Must separately deal with long-term trends and seasonality
       – Some dependence on knowledge of and assumptions of data distributions

     Outline
     • Problem definition
     • Detection techniques
       – Challenges
       – Change point detection
       – Time series analysis
       – Predictive detection
       – Data mining/machine learning algorithms
     • Diagnosis techniques
     • Additional related work
     • Summary

     Time series forecasting algorithms
     • Basic idea:
       – Build model of what you expect next observation to be, and raise alarm if observed and predicted values differ too much
     • Ex: Holt-Winters forecasting [Hoogenboom93, Brutlag00]
       – 3-part model built on exponential smoothing:
       – prediction = baseline + linear trend + seasonal effect
         • $y'_{t+1} = a_t + b_t + c_{t+1-m}$
         • baseline: $a_t = \alpha (y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$
         • linear trend: $b_t = \beta (a_t - a_{t-1}) + (1 - \beta)\, b_{t-1}$
         • seasonal trend: $c_t = \gamma (y_t - a_t) + (1 - \gamma)\, c_{t-m}$
         • where m is period of seasonal cycle

     Holt-Winters measure of deviation
     • Confidence bands to measure deviation in seasonal cycle:
       – predicted deviation: $d_t = \gamma\, |y_t - y'_t| + (1 - \gamma)\, d_{t-m}$
       – confidence band: $(y'_t - \delta \cdot d_{t-m},\; y'_t + \delta \cdot d_{t-m})$
     • Trigger alarm when number of violations exceeds threshold
       – To reduce false alarm rate, measure across moving, fixed-sized window
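The Holt-Winters update equations and confidence bands can be sketched in a few lines. The recurrences below follow the slide formulas; the initialization is an assumption made for this sketch (baseline set to the mean of the first season, zero initial trend, seasonal terms taken from the first season, and a caller-supplied starting deviation `init_dev`). The function names and the windowed alarm helper are likewise illustrative.

```python
def holt_winters_bands(y, m, alpha, beta, gamma, delta, init_dev):
    """Run Holt-Winters with confidence bands over the series y.

    m is the period of the seasonal cycle. Returns a list of tuples
    (t, predicted, lower, upper, violated) for t = m .. len(y) - 1.
    Initialization (an assumption of this sketch): the first season
    seeds the baseline, seasonal terms, and deviations.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    a = sum(y[:m]) / m              # baseline a_t
    b = 0.0                         # linear trend b_t
    c = [v - a for v in y[:m]]      # seasonal terms c_t, indexed by t
    d = [init_dev] * m              # predicted deviations d_t, indexed by t
    results = []
    for t in range(m, len(y)):
        # y'_t = a_{t-1} + b_{t-1} + c_{t-m}
        predicted = a + b + c[t - m]
        lower = predicted - delta * d[t - m]
        upper = predicted + delta * d[t - m]
        violated = y[t] < lower or y[t] > upper
        results.append((t, predicted, lower, upper, violated))
        # Update the three model components and the deviation.
        a_new = alpha * (y[t] - c[t - m]) + (1 - alpha) * (a + b)
        b = beta * (a_new - a) + (1 - beta) * b
        a = a_new
        c.append(gamma * (y[t] - a) + (1 - gamma) * c[t - m])
        d.append(gamma * abs(y[t] - predicted) + (1 - gamma) * d[t - m])
    return results


def windowed_alarms(flags, window, max_violations):
    """Indices where the number of violations in the trailing
    fixed-size window exceeds max_violations."""
    alarms = []
    for end in range(len(flags)):
        start = max(0, end - window + 1)
        if sum(flags[start:end + 1]) > max_violations:
            alarms.append(end)
    return alarms


if __name__ == "__main__":
    # Hypothetical series: period-4 pattern with one injected spike.
    series = [1.0, 2.0, 3.0, 2.0] * 5
    series[13] += 10.0
    for row in holt_winters_bands(series, 4, 0.5, 0.1, 0.1, 2.0, 0.5):
        print(row)
```

Counting violations over a moving window, rather than alarming on each one, trades some detection delay for a lower false alarm rate, which is the point made on the slide.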

  3. Slides 12-17

     Time series forecasting pros/cons
     • Advantages:
       – Well-established statistical technique
       – Considers time-varying properties of data
         • Trends and seasonality (at many levels)
     • Disadvantages:
       – Large number of parameters to tune for algorithm to work correctly
       – Detection of problem after it occurs may imply service disruption

     Holt-Winters example
     [Figure: "1 LU read experiment - faultlu only"; x-axis: Time (minutes), 0 to 80; y-axis: Response time (seconds), -0.005 to 0.035; series: observations, lowerBound, upperBound]
     • Simplified Holt-Winters: exponential smoothing
     • Generally detects 10-minute changes
       – Violations occur when observation falls outside of lower and upper bounds

     Outline
     • Problem definition
     • Detection techniques
       – Challenges
       – Change point detection
       – Time series analysis
       – Predictive detection
       – Data mining/machine learning algorithms
     • Diagnosis techniques
     • Additional related work
     • Summary

     Predictive detection [Hellerstein00]
     • Basic idea:
       – Predict probability of violations of threshold tests in advance, including how long until violation
       – Allows pre-emptive corrective action in advance of service disruption
       – Also allows service providers to give customers advance notice of potential service degradations

     Predictive detection highlights
     • Model both stationary and nonstationary effects
       – Stationary: multi-part model using ANOVA techniques
       – Non-stationary: use auto-correlation and auto-regression to capture short-range dependencies
     • Use observed data and models to predict future transformed values for a prediction horizon
     • Calculate the probability that threshold is violated at each point in the prediction horizon
     • May consider both upper and lower thresholds

     Predictive detection example
     [Figure: two panels, "Metric of interest" and "Transformed metric of interest"; x-axis: Time, t-2 to t+3; y-axis: 0 to 10; series: Data, Threshold]
     • Transform data and thresholds
       – Measured (time-varying) values are transformed into (stationary) values
       – Constant raw threshold also transformed into (time-varying) thresholds
     • Predict future values and probability of threshold violation
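To make the "probability of threshold violation over a prediction horizon" idea concrete, here is a deliberately simplified sketch. It is not the model from [Hellerstein00]: it assumes the transformed metric follows a first-order autoregressive process, AR(1), with Gaussian noise and known parameters `mu`, `phi`, and `sigma`, and it considers only an upper threshold. All names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def violation_probabilities(x_t, mu, phi, sigma, threshold, horizon):
    """P(X_{t+h} > threshold) for h = 1 .. horizon under an assumed
    AR(1) model:  X_{t+1} - mu = phi * (X_t - mu) + noise,
    where noise is normal with mean 0 and standard deviation sigma.

    The h-step forecast is normal with
      mean     = mu + phi**h * (x_t - mu)
      variance = sigma**2 * (1 + phi**2 + ... + phi**(2*(h-1)))
    """
    probabilities = []
    for h in range(1, horizon + 1):
        mean_h = mu + (phi ** h) * (x_t - mu)
        var_h = sigma ** 2 * sum(phi ** (2 * j) for j in range(h))
        z = (threshold - mean_h) / math.sqrt(var_h)
        probabilities.append(1.0 - normal_cdf(z))
    return probabilities


if __name__ == "__main__":
    # Hypothetical values: current reading 4.0, threshold 5.0.
    for h, p in enumerate(violation_probabilities(4.0, 0.0, 0.8, 1.0, 5.0, 3), 1):
        print(f"t+{h}: {p:.4f}")
```

A lower threshold could be handled the same way by evaluating the normal CDF at the lower bound; the list of per-step probabilities also indicates roughly how long until a violation becomes likely.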
