
Nonparametric Sequential Change Detection for High-Dimensional Problems - PowerPoint PPT Presentation



1. Nonparametric Sequential Change Detection for High-Dimensional Problems
Yasin Yılmaz, Electrical Engineering, University of South Florida
Allerton 2017

2. Outline: Introduction; Background; ODIT: Online Discrepancy Test; Numerical Results; Conclusion

3. Introduction

4. Anomaly Detection
- Objective: identify patterns that deviate from a nominal behavior.
- Applications: cybersecurity, quality control, fraud detection, fault detection, health care, ...
[Figure: nominal density $f_0(x)$ and anomalous density $f_1(x)$ versus $x$.]

5. Anomaly Detection
- Objective: identify patterns that deviate from a nominal behavior.
- Applications: cybersecurity, quality control, fraud detection, fault detection, health care, ...
- In the literature, statistical outlier detection is typically equated with anomaly detection.
- However, an outlier could be either a nominal tail event or a real anomalous event (e.g., a mean shift).
[Figure: nominal density $f_0(x)$ and mean-shifted anomalous density $f_1(x)$ versus $x$.]

6. Problem Formulation
- Instead of anomaly = outlier, also consider the temporal dimension.
- Proposed model: anomaly = persistent outliers.
- Objective: timely and accurate detection of anomalies in high-dimensional datasets.
- Approach: sequential and nonparametric anomaly detection.
[Figure: two panels of $x(t)$ versus $t$: "Nominal" with an isolated outlier, and "Anomaly after t = 10 with prob. 0.2" with persistent outliers.]

7. Motivating Facts: IoT Security, Smart Grid, ...
- IoT devices: 8.4B in 2017, expected to hit 20B by 2020 [1].
- IoT systems: highly vulnerable, in need of scalable security solutions [2].
- Mirai IoT botnet: largest recorded DDoS attack, with at least 1.1 Tbps bandwidth (Oct. 2016) [2].
- Persirai IoT botnet targets at least 120,000 IP cameras (May 2017) [3].
- A plausible cyberattack against the US grid: 100M people could be left without power, with up to $1 trillion of monetary loss [4].
[1] R. Minerva, A. Biru, and D. Rotondi, "Towards a definition of the Internet of Things (IoT)," IEEE Internet Initiative, no. 1, 2015.
[2] E. Bertino and N. Islam, "Botnets and Internet of Things Security," Computer, vol. 50, no. 2, pp. 76-79, Feb. 2017.
[3] Trend Micro, "Persirai: New Internet of Things (IoT) Botnet Targets IP Cameras," May 9, 2017, available online.
[4] Trevor Maynard and Nick Beecroft, "Business Blackout," Lloyd's Emerging Risk Report, p. 60, May 2015.

8. Motivating Facts: IoT Security, Smart Grid, ...
Challenges:
- Unknown anomalous distribution: parametric methods, as well as signature-based methods (e.g., antivirus), are not feasible.
- High-dimensional problems: even the nominal distribution is difficult to know.
- Nonparametric methods are needed.
- Timely and accurate detection is critical.

9. Background

10. Sequential Change Detection - CUSUM
$$\inf_{T} \; \sup_{\tau} \; \sup_{\{x_1,\dots,x_T\}} \; E_\tau\!\left[\, T - \tau \mid T \ge \tau \,\right] \quad \text{s.t.} \quad E_\infty[T] \ge \beta$$
$$W_t = \max\!\left\{ W_{t-1} + \log\frac{f_1(x_t)}{f_0(x_t)},\; 0 \right\}, \qquad T = \min\{\, t : W_t \ge h \,\}$$
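To make the recursion concrete, here is a minimal sketch of a CUSUM detector, assuming a hypothetical 1-D mean-shift scenario with both densities known (the Gaussian choices for $f_0$ and $f_1$ and the change time are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import norm

def cusum(x, f0, f1, h):
    """CUSUM recursion W_t = max(W_{t-1} + log f1(x_t)/f0(x_t), 0);
    return the stopping time T = min{t : W_t >= h} (1-indexed), or None."""
    W = 0.0
    for t, xt in enumerate(x, start=1):
        W = max(W + np.log(f1(xt) / f0(xt)), 0.0)
        if W >= h:
            return t
    return None

# Hypothetical example: nominal N(0,1), post-change N(1,1), change at t = 100
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
print("alarm at t =", cusum(x, norm(0, 1).pdf, norm(1, 1).pdf, h=5.0))
```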

11. Statistical Outlier Detection
- Needs a statistical description $f_0$ of the nominal (e.g., no-attack) behavior (baseline).
- Determines instances that significantly deviate from the baseline.
- With $f_0$ completely known, $x$ is an outlier if $\int_x^\infty f_0(y)\,\mathrm{d}y < \alpha$ (p-value).
- Equivalently, $x$ is an outlier if $x \notin$ the most compact set of data points under $f_0$ (minimum volume set):
$$\Omega_\alpha = \arg\min_{A} \int_A \mathrm{d}y \quad \text{subject to} \quad \int_A f_0(y)\,\mathrm{d}y \ge 1 - \alpha$$
- Uniformly most powerful test when the anomalous distribution is a linear mixture of $f_0$ and the uniform distribution.
- Coincides with the minimum entropy set, which minimizes the Rényi entropy while satisfying the same false alarm constraint.
[Figure: nominal density $f_0(x)$ versus $x$.]
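As a concrete illustration of the minimum-volume-set test above, here is a minimal sketch assuming a standard 1-D Gaussian nominal density (an illustrative assumption, not from the slides), for which $\Omega_\alpha$ is the central interval $[-z_{1-\alpha/2},\, z_{1-\alpha/2}]$:

```python
from scipy.stats import norm

alpha = 0.05
# For a standard Gaussian f0, the minimum-volume set of coverage 1 - alpha
# is the central interval [-z, z].
z = norm.ppf(1 - alpha / 2)

def is_outlier(x):
    """Flag x as an outlier if it falls outside the minimum-volume set."""
    return abs(x) > z

print(is_outlier(1.5))   # False: inside the nominal bulk
print(is_outlier(3.0))   # True: tail point flagged as outlier
```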

12. Geometric Entropy Minimization (GEM)
- High-dimensional datasets: even if $f_0$ is known, it is computationally very expensive (if not impossible) to determine $\Omega_\alpha$.
- Various methods exist for learning $\Omega_\alpha$.
- GEM is very effective with high-dimensional datasets while asymptotically achieving $\Omega_\alpha$ for $K/N \to 1-\alpha$ as $K, N \to \infty$.
- Training: randomly partition the training set into two parts and form the K-point kNN graph [5]
$$\bar{X}^{N_1}_K = \arg\min_{X^{N_1}_K} L_k\!\left(X^{N_1}_K, X^{N_2}\right), \qquad L_k\!\left(X^{N_1}_K, X^{N_2}\right) = \sum_{i=1}^{K} \sum_{l=k^*}^{k} |e_i(l)|^{\gamma}$$
- Test: a new point $x_t \in \mathbb{R}^d$ is an outlier if $x_t \notin \bar{X}^{N_1+1}_K$, equivalently if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^{\gamma} > L^{(K)}$.
[Figure: 2-D example with training set 1, training set 2, and a test set; total kNN edge lengths $L_1$ and $L_2$ compared to the baseline $L^{(K)}$.]
[5] A. O. Hero III, "Geometric entropy minimization (GEM) for anomaly detection and localization," NIPS, pp. 585-592, 2006.
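A simplified sketch of a GEM-style baseline test follows. Several assumptions are mine, not the slides': the bipartite K-point kNN graph selection is replaced by an empirical $(1-\alpha)$ quantile over all of partition 1, the sum over $l = k^*, \dots, k$ is taken over all $k$ neighbors, and the function names and parameters (`gem_train`, `gem_test`, `alpha`) are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def gem_train(X1, X2, k=5, gamma=1.0, alpha=0.05):
    """Simplified GEM baseline: for each point in partition X1, sum its k
    nearest-neighbor distances to partition X2; the threshold is the
    (1 - alpha) quantile of these sums (a stand-in for L^(K))."""
    tree = cKDTree(X2)
    d, _ = tree.query(X1, k=k)            # shape (N1, k)
    L = np.sum(d ** gamma, axis=1)
    return tree, np.quantile(L, 1 - alpha)

def gem_test(x_t, tree, L_K, k=5, gamma=1.0):
    """Declare x_t an outlier if its total kNN edge length exceeds L_K."""
    d, _ = tree.query(x_t, k=k)
    return np.sum(d ** gamma) > L_K

# Hypothetical usage with 2-D Gaussian nominal data
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(500, 2)), rng.normal(size=(500, 2))
tree, L_K = gem_train(X1, X2)
print(gem_test(np.array([0.1, -0.2]), tree, L_K))   # nominal-looking point
print(gem_test(np.array([4.0, 4.0]), tree, L_K))    # far-away point
```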

13. ODIT: Online Discrepancy Test

14. Online Discrepancy Test (ODIT)
- GEM lacks the temporal aspect.
- In GEM, $x_t$ is an outlier if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^{\gamma} > L^{(K)}$.
- In ODIT, $D_t = L_t - L^{(K)}$ is treated as positive/negative evidence for an anomaly.
- $D_t$ approximates the log-likelihood ratio $\ell_t = \log\frac{p(r(x_t) \mid H_1)}{p(r(x_t) \mid H_0)}$ between $H_1$, claiming $x_t$ is anomalous, and $H_0$, claiming $x_t$ is nominal.
[Figure: ODIT statistic over time versus the detection threshold $h$.]

15. Online Discrepancy Test (ODIT)
- GEM lacks the temporal aspect.
- In GEM, $x_t$ is an outlier if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^{\gamma} > L^{(K)}$.
- In ODIT, $D_t = L_t - L^{(K)}$ is treated as positive/negative evidence for an anomaly.
- $D_t$ approximates the log-likelihood ratio $\ell_t = \log\frac{p(r(x_t) \mid H_1)}{p(r(x_t) \mid H_0)}$ between $H_1$, claiming $x_t$ is anomalous, and $H_0$, claiming $x_t$ is nominal.
- Assuming independence, $\sum_{t=1}^{T} D_t$ gives the aggregate anomaly evidence up to time $T$ (as $\sum_{t=1}^{T} \ell_t$, the sufficient statistic for optimum detection).
- Similar to CUSUM (the optimum minimax sequential change detector), ODIT decides using the rule below (see the sketch that follows):
$$T_d = \min\{\, t : s_t \ge h \,\}, \qquad s_t = \max\{\, s_{t-1} + D_t,\; 0 \,\}$$
[Figure: ODIT statistic over time versus the detection threshold $h$.]
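A minimal sketch of the ODIT recursion, built on the simplified GEM baseline from the earlier sketch (the quantile-based stand-in for $L^{(K)}$, the threshold value, and the data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def odit(stream, X1, X2, h, k=5, gamma=1.0, alpha=0.05):
    """ODIT sketch: estimate the baseline L_K from training partitions X1, X2,
    then accumulate D_t = L_t - L_K in s_t = max(s_{t-1} + D_t, 0) and stop
    at T_d = min{t : s_t >= h}."""
    tree = cKDTree(X2)
    L_train = np.sum(tree.query(X1, k=k)[0] ** gamma, axis=1)
    L_K = np.quantile(L_train, 1 - alpha)      # stand-in for L^(K)
    s = 0.0
    for t, x_t in enumerate(stream, start=1):
        L_t = np.sum(tree.query(x_t, k=k)[0] ** gamma)
        s = max(s + (L_t - L_K), 0.0)
        if s >= h:
            return t                           # declared detection time T_d
    return None

# Hypothetical usage: 2-D Gaussian nominal data with a mean shift at t = 100
rng = np.random.default_rng(2)
X1, X2 = rng.normal(size=(500, 2)), rng.normal(size=(500, 2))
stream = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
print("alarm at t =", odit(stream, X1, X2, h=20.0))
```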

16. Theoretical Justification - Asymptotic
Asymptotic optimality (scalarized problem): as the training set grows ($N_2 \to \infty$), ODIT is asymptotically optimum for
$$H_0: r(x_t) \sim f_0^k, \;\forall t \qquad \text{vs.} \qquad H_1: r(x_t) \sim f_0^k, \; t < \tau, \;\; \text{and} \;\; r(x_t) \sim f_{\mathrm{uni}}^k, \; t \ge \tau$$
Assumptions:
- $\{x_t\}$ independent; $r(x_t)$ is the kNN distance.
- $f_0(x_t) > 0$, Lebesgue continuous.
- $f_0^k$ and $f_{\mathrm{uni}}^k$: distributions of the kNN distance under $f_0$ and under the uniform distribution.
- $f_{\mathrm{uni}}^k$ is defined on a $d$-dimensional grid with spacing $r_\alpha$, where $\int_{r_\alpha}^{\infty} f_0^k(r)\,\mathrm{d}r = \alpha$.

17. Sketch of the Proof
- For independent $\{x_t\}$, a continuous $f_0 > 0$ defines a non-homogeneous Poisson point process with continuous rate $\lambda(x) > 0$.
- Obtain a homogeneous Poisson point process with rate $k$ by defining a $d$-dimensional non-homogeneous grid with volume $k/\lambda(x)$ [6].
- For this homogeneous Poisson point process, the nearest-neighbor function is given by
$$D_x(r^d) = k \, \frac{\mathrm{d}\, v_d(x,r)}{\mathrm{d}\, r^d} \, e^{-k\, v_d(x,r)}$$
- Under $H_0$, $r(x_t) = r_t$ comes from $f_0^k$, which can be computed using the training set as $L_t$.
- Under $H_1$, $r(x_t) = r_\alpha$ comes from $f_{\mathrm{uni}}^k$, which has a single atom at $r_\alpha$, computed as $L^{(K)}$.
- As the training set grows, $L_t \to r_t$ and $L^{(K)} \to r_\alpha$.
- The optimum CUSUM test computes $\log \frac{D_x(r_\alpha)}{D_x(r_t)} = k c \left( r_t^d - r_\alpha^d \right)$.
[6] Robert Gallager, 6.262 Discrete Stochastic Processes, Chapter 2, Spring 2011, Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
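As a small sanity check of the nearest-neighbor distance law used in the proof (an illustration of my own, not part of the slides): for a homogeneous Poisson process with rate $k$ in $d = 2$, the distance $R$ from a fixed reference point to the nearest process point satisfies $P(R > r) = e^{-k \pi r^2}$, the survival function corresponding to the density $D_x$ above. A Monte Carlo comparison, with all parameter values chosen arbitrarily:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
k, side, trials = 50.0, 4.0, 2000        # rate, window side length, repetitions

r_samples = []
for _ in range(trials):
    n = rng.poisson(k * side ** 2)        # homogeneous Poisson process in a box
    pts = rng.uniform(0, side, size=(n, 2))
    # distance from the window center (far from the edges) to the nearest point
    d, _ = cKDTree(pts).query([side / 2, side / 2], k=1)
    r_samples.append(d)

r = 0.05
empirical = np.mean(np.array(r_samples) > r)
theoretical = np.exp(-k * np.pi * r ** 2)  # P(R > r) = exp(-k * v_2(r))
print(empirical, theoretical)              # the two values should be close
```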
