Detecting Outliers under Interval Uncertainty: A New Algorithm Based on Constraint Satisfaction - PowerPoint PPT Presentation


  1. Detecting Outliers under Interval Uncertainty: A New Algorithm Based on Constraint Satisfaction
     Evgeny Dantsin and Alexander Wolpert
     Department of Computer Science, Roosevelt University, Chicago, IL 60605, USA, {edantsin,awolpert}@roosevelt.edu
     Martine Ceberio, Gang Xiang, and Vladik Kreinovich
     Department of Computer Science, University of Texas at El Paso, El Paso, TX 79968, USA, {mceberio,vladik}@cs.utep.edu

  2. 1. Outlier Detection Is Important
     • In many application areas, it is important to detect outliers, i.e., unusual, abnormal values.
     • In medicine: an outlier may mean disease.
     • In geophysics: an outlier may mean a mineral deposit.
     • In structural integrity testing: an outlier may mean a structural fault.
     • Traditional engineering approach to outlier detection:
       – collect measurement results $x_1, \ldots, x_n$ corresponding to normal situations;
       – compute $E \stackrel{\rm def}{=} \frac{1}{n} \cdot \sum_{i=1}^{n} x_i$ and $\sigma \stackrel{\rm def}{=} \sqrt{V}$, where $V \stackrel{\rm def}{=} M - E^2$ and $M \stackrel{\rm def}{=} \frac{1}{n} \cdot \sum_{i=1}^{n} x_i^2$;
       – a value $x$ is classified as an outlier if it is outside the interval $[L, U]$, where $L \stackrel{\rm def}{=} E - k_0 \cdot \sigma$, $U \stackrel{\rm def}{=} E + k_0 \cdot \sigma$, and $k_0 > 1$ is pre-selected (most frequently, $k_0 = 2$, 3, or 6).
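
The traditional k0-sigma rule from this slide can be sketched in a few lines of Python. The function names `k_sigma_interval` and `is_outlier` are illustrative choices, not names from the presentation, and the clamp of the variance at zero is an added guard against floating-point rounding.

```python
import math

def k_sigma_interval(xs, k0=2.0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for the sample xs."""
    n = len(xs)
    E = sum(xs) / n                       # mean
    M = sum(x * x for x in xs) / n        # second moment
    V = M - E * E                         # population variance
    sigma = math.sqrt(max(V, 0.0))        # guard against tiny negative V
    return E - k0 * sigma, E + k0 * sigma

def is_outlier(x, xs, k0=2.0):
    """True if x lies outside the k0-sigma interval of the normal sample xs."""
    L, U = k_sigma_interval(xs, k0)
    return x < L or x > U
```

For example, with normal measurements `[1.0, 3.0]` and `k0 = 2` the interval is `[0, 4]`, so `5.0` is flagged and `3.5` is not.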

  3. 2. Outlier Detection Under Interval Uncertainty
     • In practice: often, we only have intervals $\mathbf{x}_i = [\underline{x}_i, \overline{x}_i]$ of possible values of $x_i$.
     • Example: the value $\widetilde{x}_i$ measured by an instrument with a known upper bound $\Delta_i$ on the measurement error means that $x_i \in [\widetilde{x}_i - \Delta_i, \widetilde{x}_i + \Delta_i]$.
     • Problem: for different values $x_i \in \mathbf{x}_i$, we get different $L$ and $U$.
     • Objective: given $\mathbf{x}_i$ and $k_0$, compute
       $\mathbf{L} \stackrel{\rm def}{=} [\underline{L}, \overline{L}] = \{L(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n\}$;
       $\mathbf{U} \stackrel{\rm def}{=} [\underline{U}, \overline{U}] = \{U(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n\}$.
     • A value $x$ is a possible outlier if it is outside one of the possible $k_0$-sigma intervals $[L, U]$, i.e., if $x \notin [\overline{L}, \underline{U}]$.
     • A value $x$ is a guaranteed outlier if it is outside all possible $k_0$-sigma intervals $[L, U]$, i.e., if $x \notin [\underline{L}, \overline{U}]$.
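
Once the four bounds are known, the two notions of outlier reduce to simple comparisons. The sketch below assumes the bounds have already been computed by some other means; the function name `classify` and the numbers in the usage note are made up for illustration.

```python
def classify(x, L_lower, L_upper, U_lower, U_upper):
    """Return (possible, guaranteed) for a value x.

    L_lower, L_upper: endpoints of the range of L over all x_i in the intervals.
    U_lower, U_upper: endpoints of the range of U over all x_i in the intervals.
    """
    # Outside at least one k0-sigma interval: below the largest L or above the smallest U.
    possible = x < L_upper or x > U_lower
    # Outside every k0-sigma interval: below the smallest L or above the largest U.
    guaranteed = x < L_lower or x > U_upper
    return possible, guaranteed
```

With illustrative bounds L in [-1.5, 0.5] and U in [2.5, 4.5], the value 3.0 is a possible but not a guaranteed outlier, 5.0 is both, and 1.0 is neither.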

  4. 3. Which Approach Is More Reasonable?
     • Situation: our main objective is not to miss an outlier.
       – Example: structural integrity tests.
       – Clarification: we do not want to risk launching a spaceship with a faulty part.
       – Reasonable approach: look for possible outliers.
     • Situation: we want to make sure that the value $x$ is an outlier.
       – Example: planning a surgery.
       – Clarification: we want to make sure that there is a micro-calcification before we start cutting the patient.
       – Reasonable approach: look for guaranteed outliers.

  5. 4. Detecting Outliers Under Interval Uncertainty: What Is Known
     • Case of possible outliers: there exist efficient algorithms for computing $\overline{L}$ and $\underline{U}$.
     • Case of guaranteed outliers: the computation of $\underline{L}$ and $\overline{U}$ is, in general, NP-hard.
     • Technical result: if $1 + (1/k_0)^2 < n$ (e.g., if $k_0 > 1$ and $n \ge 2$), then the maximum $\overline{U}$ of $U$ (and the minimum $\underline{L}$ of $L$) is always attained at a combination of endpoints of $\mathbf{x}_i$.
     • Resulting algorithm: compute $\overline{U}$ and $\underline{L}$ by trying all $2^n$ combinations of $\underline{x}_i$ and $\overline{x}_i$.
     • Specific case: when all measured values $\widetilde{x}_i \stackrel{\rm def}{=} (\underline{x}_i + \overline{x}_i)/2$ are definitely different from each other, in the sense that the "narrowed" intervals
       $\left[\widetilde{x}_i - \frac{1 + \alpha^2}{n} \cdot \Delta_i,\ \widetilde{x}_i + \frac{1 + \alpha^2}{n} \cdot \Delta_i\right]$
       do not intersect, where $\alpha \stackrel{\rm def}{=} 1/k_0$ and $\Delta_i \stackrel{\rm def}{=} (\overline{x}_i - \underline{x}_i)/2$ is the interval's half-width.
     • Good news: in this case, we can compute $\overline{U}$ and $\underline{L}$ in feasible time.
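
The exhaustive algorithm mentioned on this slide can be sketched as follows. It relies on the slide's technical result that the extreme values are attained at endpoint combinations, and it takes exponential time, so it is only practical for small n. The names `k_sigma_bounds` and `endpoint_search` and the representation of intervals as `(lower, upper)` tuples are assumptions made for this illustration.

```python
import itertools
import math

def k_sigma_bounds(xs, k0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for one vector xs."""
    n = len(xs)
    E = sum(xs) / n
    M = sum(x * x for x in xs) / n
    sigma = math.sqrt(max(M - E * E, 0.0))  # clamp rounding noise
    return E - k0 * sigma, E + k0 * sigma

def endpoint_search(intervals, k0):
    """Return (smallest L, largest U) over all 2^n endpoint combinations."""
    L_min, U_max = math.inf, -math.inf
    # itertools.product picks one endpoint from each (lower, upper) pair.
    for xs in itertools.product(*intervals):
        L, U = k_sigma_bounds(xs, k0)
        L_min = min(L_min, L)
        U_max = max(U_max, U)
    return L_min, U_max
```

For the two intervals [0, 1] and [2, 3] with k0 = 2, the search returns (-1.5, 4.5); both extremes come from the endpoint vector (0, 3).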

  6. 5. What We Plan To Do
     • More general case: no two narrowed intervals are proper subsets of one another.
     • In precise terms: neither of the two narrowed intervals is a subset of the interior of the other.
     • Objective: extend known efficient algorithms to this case.
     • Since $L(x_i) = -U(-x_i)$ (negating all $x_i$ negates $E$ and leaves $\sigma$ unchanged), it suffices to be able to compute $\overline{U}$.
     • Main idea: reduce the interval computation problem to the constraint satisfaction problem with the following constraints:
       – for every $i$, if in the maximizing assignment we have $x_i = \underline{x}_i$, then replacing this value with $x_i = \overline{x}_i$ will either decrease $U$ or leave $U$ unchanged;
       – for every $i$, if in the maximizing assignment we have $x_i = \overline{x}_i$, then replacing this value with $x_i = \underline{x}_i$ will either decrease $U$ or leave $U$ unchanged;
       – for every $i$ and $j$, replacing both $x_i$ and $x_j$ with the opposite ends of the corresponding intervals $\mathbf{x}_i$ and $\mathbf{x}_j$ will either decrease $U$ or leave $U$ unchanged.
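
Whether a data set falls into this more general case can be checked directly from the definition of the narrowed intervals on the previous slide. The sketch below is a straightforward quadratic-time check written for clarity, not speed; the function names are hypothetical, and "subset of the interior" is read as strict inequality at both ends.

```python
def narrowed_intervals(intervals, k0):
    """Shrink each (lower, upper) interval around its midpoint by (1 + alpha^2)/n."""
    n = len(intervals)
    alpha = 1.0 / k0
    factor = (1.0 + alpha * alpha) / n
    result = []
    for lo, hi in intervals:
        mid = (lo + hi) / 2.0        # measured value
        delta = (hi - lo) / 2.0      # half-width
        result.append((mid - factor * delta, mid + factor * delta))
    return result

def no_proper_nesting(intervals, k0):
    """True if no narrowed interval lies inside the interior of another."""
    nar = narrowed_intervals(intervals, k0)
    for i, (a, b) in enumerate(nar):
        for j, (c, d) in enumerate(nar):
            if i != j and c < a and b < d:   # [a, b] is inside (c, d)
                return False
    return True
```

For instance, the intervals [0, 1] and [2, 3] pass the check for k0 = 2, while [0, 10] and [4, 6] do not, because the second narrowed interval sits strictly inside the first.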

  7. 6. Algorithm
     • General idea:
       – First, we sort the values $\widetilde{x}_i$ into an increasing sequence.
       – Without losing generality, we can assume that $\widetilde{x}_1 \le \widetilde{x}_2 \le \ldots \le \widetilde{x}_n$.
       – Then, for every $k$ from 0 to $n$, we compute the value $V^{(k)} = M^{(k)} - (E^{(k)})^2$ of the population variance $V$ for the vector $x^{(k)} = (\underline{x}_1, \ldots, \underline{x}_k, \overline{x}_{k+1}, \ldots, \overline{x}_n)$, and we compute $U^{(k)} = E^{(k)} + k_0 \cdot \sqrt{V^{(k)}}$.
       – Finally, we compute $\overline{U}$ as the largest of $n + 1$ values $U^{(0)}, \ldots, U^{(n)}$.
     • Details: how to compute the values $V^{(k)}$
       – First, we explicitly compute $M^{(0)}$, $E^{(0)}$, and $V^{(0)} = M^{(0)} - (E^{(0)})^2$.
       – Once we know the values $M^{(k)}$ and $E^{(k)}$, we can compute
         $M^{(k+1)} = M^{(k)} + \frac{1}{n} \cdot (\underline{x}_{k+1})^2 - \frac{1}{n} \cdot (\overline{x}_{k+1})^2$
         and
         $E^{(k+1)} = E^{(k)} + \frac{1}{n} \cdot \underline{x}_{k+1} - \frac{1}{n} \cdot \overline{x}_{k+1}$.
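
The steps on this slide translate almost line by line into Python. The following is a minimal sketch under the slide's assumptions (no narrowed interval is a proper subset of another); it does not verify that condition. The function names and the `(lower, upper)` tuple representation are choices made for this illustration, and the lower bound uses the identity $L(x_i) = -U(-x_i)$ from the preceding slide.

```python
import math

def upper_bound_U(intervals, k0):
    """Largest U = E + k0*sigma over the n+1 vectors x^(0), ..., x^(n)."""
    n = len(intervals)
    # Sort the intervals by midpoint (the measured values).
    ordered = sorted(intervals, key=lambda iv: (iv[0] + iv[1]) / 2.0)
    # x^(0) uses the upper endpoint of every interval.
    E = sum(hi for _, hi in ordered) / n
    M = sum(hi * hi for _, hi in ordered) / n
    best = E + k0 * math.sqrt(max(M - E * E, 0.0))
    for lo, hi in ordered:
        # Going from x^(k) to x^(k+1): switch one coordinate from upper to lower.
        M += (lo * lo - hi * hi) / n
        E += (lo - hi) / n
        best = max(best, E + k0 * math.sqrt(max(M - E * E, 0.0)))
    return best

def lower_bound_L(intervals, k0):
    """Smallest L, obtained by applying upper_bound_U to the negated intervals."""
    negated = [(-hi, -lo) for lo, hi in intervals]
    return -upper_bound_U(negated, k0)
```

For the intervals [0, 1] and [2, 3] with k0 = 2, the three candidate values are U^(0) = 4, U^(1) = 4.5, and U^(2) = 3, so the function returns 4.5, and `lower_bound_L` returns -1.5. The running sums can accumulate rounding error for large n, which is why the variance is clamped at zero before taking the square root.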

  8. 7. Number of Computation Steps
     • Sorting: requires $O(n \cdot \log(n))$ steps.
     • Computing the initial values $M^{(0)}$, $E^{(0)}$, and $V^{(0)}$ requires linear time $O(n)$.
     • For each $k$ from 0 to $n - 1$, we need a constant number of steps to compute the next values $M^{(k+1)}$, $E^{(k+1)}$, and $V^{(k+1)}$ as
       $M^{(k+1)} = M^{(k)} + \frac{1}{n} \cdot (\underline{x}_{k+1})^2 - \frac{1}{n} \cdot (\overline{x}_{k+1})^2$
       and
       $E^{(k+1)} = E^{(k)} + \frac{1}{n} \cdot \underline{x}_{k+1} - \frac{1}{n} \cdot \overline{x}_{k+1}$.
     • Computing $U^{(k)} = E^{(k)} + k_0 \cdot \sqrt{V^{(k)}}$ also requires a constant number of steps.
     • Finally, finding the largest of $n + 1$ values $U^{(k)}$ requires $O(n)$ steps.
     • Overall: we need $O(n \cdot \log(n)) + O(n) + O(n) + O(n) = O(n \cdot \log(n))$ steps.
     • Comment: if the measurement results $\widetilde{x}_i$ are already sorted, then we only need linear time to compute $\overline{U}$.

  9. 8. Justification of the Algorithm
     • Known: $\overline{U} = \max U$ is attained at a vector $x = (x_1, \ldots, x_n)$ in which each value $x_i$ is equal either to $\underline{x}_i$ or to $\overline{x}_i$.
     • New result: this maximum is attained at one of the vectors $x^{(k)}$ in which all the lower bounds $\underline{x}_i$ precede all the upper bounds $\overline{x}_i$.
     • How we prove it: by reduction to a contradiction.
     • Assume: the maximum is attained at a vector $x$ in which one of the lower bounds follows one of the upper bounds.
     • Notation: let $i$ be the largest index at which an upper bound is immediately followed by a lower bound.
     • Conclusion: in this maximizing vector $x^{\rm opt}$, we have $x_i = \overline{x}_i$ and $x_{i+1} = \underline{x}_{i+1}$.
     • Rest of the proof: since the maximum is attained at $x$, each of the following replacements leads to $\Delta U \le 0$:
       – replacing $x_i$ with $\underline{x}_i$;
       – replacing $x_{i+1}$ with $\overline{x}_{i+1}$; and
       – replacing both.
       We trace these changes $\Delta U$.
     • We then conclude that one of the narrowed intervals is a proper subset of another, which contradicts our assumption.
