Nonparametric Sequential Change Detection for High-Dimensional Problems
Yasin Yılmaz
Electrical Engineering, University of South Florida
Allerton 2017
Outline
1. Introduction
2. Background
3. ODIT: Online Discrepancy Test
4. Numerical Results
5. Conclusion
Introduction
Anomaly Detection
- Objective: identify patterns that deviate from nominal behavior
- Applications: cybersecurity, quality control, fraud detection, fault detection, health care, ...
[Figure: nominal density f0(x) and anomalous density f1(x) versus x]
- In the literature, statistical outlier detection is typically equated with anomaly detection
- However, an outlier could simply be a nominal tail event
[Figure: f0(x) and f1(x) versus x; annotation: real anomalous event (e.g., mean shift)]
Problem Formulation
- Instead of equating anomaly with outlier, also consider the temporal dimension
- Proposed model: anomaly = persistent outliers
- Objective: timely and accurate detection of anomalies in high-dimensional datasets
- Approach: sequential and nonparametric anomaly detection
[Figure: two panels of x(t) versus t for t = 1, ..., 20: "Nominal" and "Anomaly after t=10 with prob. 0.2"; annotations mark an outlier and persistent outliers]
Motivating Facts: IoT Security, Smart Grid, . . .
- IoT devices: 8.4B in 2017, expected to reach 20B by 2020 [1]
- IoT systems are highly vulnerable and need scalable security solutions [2]
- Mirai IoT botnet: largest recorded DDoS attack, with at least 1.1 Tbps of bandwidth (Oct. 2016) [2]
- Persirai IoT botnet targets at least 120,000 IP cameras (May 2017) [3]
- A plausible cyberattack against the US grid could leave 100M people without power, with up to $1 trillion in monetary loss [4]
[1] R. Minerva, A. Biru, and D. Rotondi, “Towards a definition of the Internet of Things (IoT),” IEEE Internet Initiative, no. 1, 2015.
[2] E. Bertino and N. Islam, “Botnets and Internet of Things Security,” Computer, vol. 50, no. 2, pp. 76-79, Feb. 2017.
[3] Trend Micro, “Persirai: New Internet of Things (IoT) Botnet Targets IP Cameras,” May 9, 2017, available online.
[4] T. Maynard and N. Beecroft, “Business Blackout,” Lloyd’s Emerging Risk Report, p. 60, May 2015.
Challenges:
- Unknown anomalous distribution: parametric methods, as well as signature-based methods (e.g., antivirus), are not feasible
- High-dimensional problems: even the nominal distribution is difficult to know
- Nonparametric methods are needed
- Timely and accurate detection is critical
Background
Sequential Change Detection - CUSUM
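Minimax formulation (worst-case average detection delay subject to a false alarm constraint):
$$\inf_{T}\ \sup_{\tau}\ \sup_{\{x_1,\dots,x_T\}}\ \mathbb{E}_\tau[T-\tau \mid T \ge \tau] \quad \text{s.t.} \quad \mathbb{E}_\infty[T] \ge \beta$$
CUSUM statistic and stopping time:
$$W_t = \max\left\{W_{t-1} + \log\frac{f_1(x_t)}{f_0(x_t)},\ 0\right\}, \qquad T = \min\{t : W_t \ge h\}$$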
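A minimal sketch of the CUSUM recursion above, assuming for illustration that both densities are known Gaussians, f0 = N(0, σ²) and f1 = N(μ1, σ²), so that log f1(x)/f0(x) = (μ1 x − μ1²/2)/σ². The function names and the example data are hypothetical.

```python
def gaussian_llr(x, mu1, sigma=1.0):
    """log f1(x)/f0(x) for f0 = N(0, sigma^2) and f1 = N(mu1, sigma^2)."""
    return (mu1 * x - 0.5 * mu1 ** 2) / sigma ** 2


def cusum_stopping_time(xs, llr, h):
    """Return the first t (1-indexed) with W_t >= h, or None if no alarm.

    W_0 = 0 and W_t = max(W_{t-1} + llr(x_t), 0).
    """
    w = 0.0
    for t, x in enumerate(xs, start=1):
        w = max(w + llr(x), 0.0)
        if w >= h:
            return t
    return None


# Example: the mean moves from about 0 to about 1 partway through the sequence.
xs = [0.1, -0.3, 0.2, 1.2, 0.9, 1.5, 1.1]
print(cusum_stopping_time(xs, lambda x: gaussian_llr(x, mu1=1.0), h=2.0))  # prints 6
```

The reset to 0 is what makes CUSUM a change detector rather than a plain likelihood ratio test: negative evidence accumulated before the change cannot delay the alarm after it.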
Statistical Outlier Detection
- Needs a statistical description f0 of the nominal (e.g., no-attack) behavior (baseline)
- Determines instances that deviate significantly from the baseline
- With f0 completely known, x is an outlier if $\int_x^\infty f_0(y)\,dy < \alpha$ (p-value)
- Equivalently, x is an outlier if $x \notin \Omega_\alpha$, the most compact set of data points under f0 (minimum volume set):
$$\Omega_\alpha = \arg\min_{A} \int_A dy \quad \text{subject to} \quad \int_A f_0(y)\,dy \ge 1-\alpha$$
[Figure: nominal density f0(x) versus x]
- Uniformly most powerful test when the anomalous distribution is a linear mixture of f0 and the uniform distribution
- Coincides with the minimum entropy set, which minimizes the Rényi entropy while satisfying the same false alarm constraint
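For a one-dimensional Gaussian f0 the minimum volume set has a closed form: since the density is symmetric and unimodal, the smallest set with mass 1 − α is the central interval [μ − z, μ + z] with z = σΦ⁻¹(1 − α/2). The sketch below assumes f0 = N(μ, σ²) is fully known; the function names are hypothetical. The one-sided p-value on the slide covers right-tail outliers only, which is why it is computed separately.

```python
from statistics import NormalDist


def min_volume_interval(mu, sigma, alpha):
    """Minimum volume set of N(mu, sigma^2) with probability mass 1 - alpha.

    For a symmetric unimodal density this is the central interval.
    """
    z = sigma * NormalDist().inv_cdf(1 - alpha / 2)
    return mu - z, mu + z


def is_outlier_mv(x, mu, sigma, alpha):
    """x is an outlier if it falls outside the minimum volume set."""
    lo, hi = min_volume_interval(mu, sigma, alpha)
    return not (lo <= x <= hi)


def right_tail_p_value(x, mu, sigma):
    """One-sided p-value: integral of f0 from x to infinity."""
    return 1 - NormalDist(mu, sigma).cdf(x)


alpha = 0.05
print(min_volume_interval(0.0, 1.0, alpha))        # approximately (-1.96, 1.96)
print(is_outlier_mv(2.5, 0.0, 1.0, alpha))         # True
print(right_tail_p_value(2.5, 0.0, 1.0) < alpha)   # True
```

In high dimensions no such closed form is available in general, which is the motivation for learning Ω_α from data.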
Geometric Entropy Minimization (GEM)
- High-dimensional datasets: even if f0 is known, determining Ω_α is computationally very expensive (if not impossible)
- Various methods exist for learning Ω_α
- GEM is very effective with high-dimensional datasets while asymptotically achieving Ω_α as $K, N \to \infty$ with $K/N \to 1-\alpha$
[Figure: training set 1, training set 2, and test set in the ($x^{ij1}_t$, $x^{ij2}_t$) plane, with the kNN statistics L1, L2 and the threshold L(K) marked]
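Training: randomly partition the training set into two parts, $X^{N_1}$ and $X^{N_2}$, and form the K-kNN graph [5]
$$\bar{X}^{N_1}_K = \arg\min_{X^{N_1}_K} L_k\left(X^{N_1}_K, X^{N_2}\right), \qquad L_k\left(X^{N_1}_K, X^{N_2}\right) = \sum_{i=1}^{K} \sum_{l=k^*}^{k} |e_i(l)|^\gamma$$
Test: a new point $x_t \in \mathbb{R}^d$ is an outlier if $x_t \notin \bar{X}^{N_1+1}_K$, equivalently if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)}$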
[5] A. O. Hero III, “Geometric entropy minimization (GEM) for anomaly detection and localization,” NIPS, pp. 585-592, 2006.
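A minimal sketch of bipartite GEM training and testing, assuming Euclidean kNN distances, k* = 1, γ = 1, and K = (1 − α)N1 (consistent with K/N → 1 − α). Because edges only run from $X^{N_1}$ to $X^{N_2}$, the graph length is a sum of per-point terms, so the minimal K-point graph keeps the K points of $X^{N_1}$ with the smallest $L_i$, and $L_{(K)}$ is the K-th smallest $L_i$. The helper names `knn_stat`, `gem_train`, and `gem_is_outlier` are hypothetical.

```python
import math
import random


def knn_stat(x, reference, k, k_star=1, gamma=1.0):
    """L = sum_{l=k*}^{k} |e(l)|^gamma, with e(l) the distance from x to its
    l-th nearest neighbor in `reference`."""
    dists = sorted(math.dist(x, y) for y in reference)
    return sum(d ** gamma for d in dists[k_star - 1:k])


def gem_train(training, n1, K, k, k_star=1, gamma=1.0, seed=0):
    """Randomly split into X^{N1} and X^{N2}; return (L_(K), X^{N2}).

    L_(K) is the K-th smallest per-point statistic L_i over X^{N1}.
    """
    data = list(training)
    random.Random(seed).shuffle(data)
    x1, x2 = data[:n1], data[n1:]
    scores = sorted(knn_stat(p, x2, k, k_star, gamma) for p in x1)
    return scores[K - 1], x2


def gem_is_outlier(x, L_K, x2, k, k_star=1, gamma=1.0):
    """Flag x as an outlier when L_t exceeds the training threshold L_(K)."""
    return knn_stat(x, x2, k, k_star, gamma) > L_K


# Example: 2D nominal data with standard deviation 0.1 per coordinate.
rng = random.Random(1)
train = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(1000)]
alpha, n1 = 0.05, 100
L_K, x2 = gem_train(train, n1=n1, K=round((1 - alpha) * n1), k=1)
print(gem_is_outlier((0.8, 0.8), L_K, x2, k=1))  # far from the nominal cloud (expected: True)
print(gem_is_outlier((0.0, 0.0), L_K, x2, k=1))  # at the mode (expected: False)
```

The brute-force neighbor search is only for readability; any exact kNN structure (e.g., a k-d tree) gives the same statistics faster.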
ODIT: Online Discrepancy Test
Online Discrepancy Test (ODIT)
- GEM lacks the temporal aspect: in GEM, $x_t$ is an outlier if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)}$
- In ODIT, $D_t = L_t - L_{(K)}$ is treated as positive/negative evidence for an anomaly
- $D_t$ approximates the log-likelihood ratio $\ell_t = \log \frac{p(r(x_t) \mid H_1)}{p(r(x_t) \mid H_0)}$ between $H_1$, claiming $x_t$ is anomalous, and $H_0$, claiming $x_t$ is nominal
[Figure: ODIT statistic $s^{ij}_t$ versus t, with detection threshold h]
- Assuming independence, $\sum_{t=1}^{T} D_t$ gives the aggregate anomaly evidence up to time T (just as $\sum_{t=1}^{T} \ell_t$ is a sufficient statistic for optimum detection)
- Similar to CUSUM (the optimum minimax sequential change detector), ODIT decides using $T_d = \min\{t : s_t \ge h\}$, with $s_t = \max\{s_{t-1} + D_t, 0\}$
Theoretical Justification - Asymptotic
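Asymptotic optimality, scalarized problem: as the training set grows ($N_2 \to \infty$), ODIT is asymptotically optimum for
$$H_0: r(x_t) \sim f^k_0,\ \forall t$$
$$H_1: r(x_t) \sim f^k_0,\ t < \tau, \quad \text{and} \quad r(x_t) \sim f^k_{\mathrm{uni}},\ t \ge \tau$$
where
- $\{x_t\}$ are independent
- $r(x_t)$ is the kNN distance
- $f_0(x_t) > 0$ is Lebesgue continuous
- $f^k_0$ and $f^k_{\mathrm{uni}}$ are the distributions of the kNN distance under $f_0$ and under the uniform distribution on a d-dimensional grid with spacing $r_\alpha$, where $\int_{r_\alpha}^{\infty} f^k_0(r)\,dr = \alpha$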
Sketch of the Proof
For independent $\{x_t\}$, a continuous $f_0 > 0$ defines a non-homogeneous Poisson point process with continuous rate $\lambda(x) > 0$. A homogeneous Poisson point process with rate k is obtained by defining a d-dimensional non-homogeneous grid with volume $k/\lambda(x)$ [6]. For this homogeneous Poisson point process, the nearest neighbor function is given by
$$D_x(r^d) = k\,\frac{dv_d(x, r)}{dr^d}\,e^{-k v_d(x, r)}$$
Under $H_0$, $r(x_t) = r_t$ comes from $f^k_0$, which can be computed using the training set as $L_t$. Under $H_1$, $r(x_t) = r_\alpha$ comes from $f^k_{\mathrm{uni}}$, which has a single atom at $r_\alpha$, computed as $L_{(K)}$. As the training set grows, $L_t \to r_t$ and $L_{(K)} \to r_\alpha$. The optimum CUSUM test computes
$$\log\frac{D_x(r_\alpha)}{D_x(r_t)} = kc\left(r_t^d - r_\alpha^d\right)$$
[6] R. Gallager, 6.262 Discrete Stochastic Processes, Chapter 2, Spring 2011. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
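The last step of the sketch can be filled in as follows, assuming $v_d(x, r) = c\,r^d$ is the volume of a d-dimensional ball of radius r, with the constant $c = \pi^{d/2}/\Gamma(d/2 + 1)$, so that $dv_d/dr^d = c$:

```latex
\begin{aligned}
D_x(r^d) &= k\,\frac{dv_d(x,r)}{dr^d}\,e^{-k v_d(x,r)} = kc\,e^{-kc\,r^d},\\
\log\frac{D_x(r_\alpha)}{D_x(r_t)}
  &= \bigl(\log(kc) - kc\,r_\alpha^d\bigr) - \bigl(\log(kc) - kc\,r_t^d\bigr)
   = kc\,\bigl(r_t^d - r_\alpha^d\bigr).
\end{aligned}
```

Since kc > 0 is a constant, the CUSUM increment is a positive multiple of $r_t^d - r_\alpha^d$. For instance, under the illustrative choice $k^* = k$ and $\gamma = d$ (an assumption, not a setting stated on the slides), $D_t = L_t - L_{(K)} \to r_t^d - r_\alpha^d$, which matches this increment up to the factor kc.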
Theoretical Justification - Nonasymptotic
- The CUSUM procedure can be expressed in terms of a general discrepancy metric, applicable to any number sequence
- Stop when the discrepancy $g(\ell_t)$ [7] of the observations with respect to $f_0$ is large enough

Discrepancy and CUSUM:
$$T_c = \min\{t : g(\ell_t) \ge h_c\}, \qquad \ell_t = \left(\log\frac{f_1(x_1)}{f_0(x_1)}, \dots, \log\frac{f_1(x_t)}{f_0(x_t)}\right),$$
$$g(\ell_t) = \max_{1 \le n_1 \le n_2 \le t} \sum_{i=n_1}^{n_2} \ell_i$$
[Figure: $Q_t$, $g(\ell_t)$, and $\max_{1 \le j \le t} Q_j$ versus t, where $Q_t = \sum_{i=1}^{t} \ell_i$]
[7] B. A. Moser et al., “On stability of distance measures for event sequences induced by level-crossing sampling,” IEEE Trans. Signal Process., vol. 62, no. 8, pp. 1987–1999, 2014.
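A small check of the claim that CUSUM can be written as a discrepancy rule: with $h_c = h > 0$, stopping when the maximum window sum $g(\ell_t)$ first reaches h happens at the same time as the CUSUM alarm, because $W_{n} = \max\{0, Q_{n} - \min_{j < n} Q_j\}$ and $g(\ell_t)$ is the running maximum of the unclipped term over $n \le t$. The function names are hypothetical; integer-valued increments are used only to keep the comparison free of floating-point rounding.

```python
import random


def cusum_time(llrs, h):
    """CUSUM: first t with W_t = max(W_{t-1} + l_t, 0) >= h, else None."""
    w = 0
    for t, llr in enumerate(llrs, start=1):
        w = max(w + llr, 0)
        if w >= h:
            return t
    return None


def discrepancy(llrs, t):
    """g(l_t) = max_{1 <= n1 <= n2 <= t} sum_{i=n1}^{n2} l_i, using Q_n = l_1 + ... + l_n."""
    q = [0]
    for llr in llrs[:t]:
        q.append(q[-1] + llr)
    return max(q[n2] - q[n1 - 1] for n1 in range(1, t + 1) for n2 in range(n1, t + 1))


def discrepancy_time(llrs, h):
    """T_c = min{t : g(l_t) >= h}, else None."""
    for t in range(1, len(llrs) + 1):
        if discrepancy(llrs, t) >= h:
            return t
    return None


# Compare both stopping rules on random sequences of increments.
rng = random.Random(0)
for _ in range(100):
    llrs = [rng.randint(-3, 2) for _ in range(25)]
    for h in (1, 3, 5):
        assert cusum_time(llrs, h) == discrepancy_time(llrs, h)
print("stopping times agree")
```

The brute-force `discrepancy` is cubic overall and only meant to mirror the definition; the CUSUM recursion computes the same decision in constant time per sample.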
ODIT Algorithm
Initialize: s ← 0, t ← 1
Partition the training set into $X^{N_1}$ and $X^{N_2}$
Determine $L_{(K)}$ from the K-kNN graph $\bar{X}^{N_1}_K$
While s < h:
    Get new data $x_t$ and compute $D_t = L_t - L_{(K)}$
    s = max{s + $D_t$, 0}
    t ← t + 1
Declare anomaly
[Figures: (a) training set 1, training set 2, and test set in the ($x^{ij1}_t$, $x^{ij2}_t$) plane, with L1, L2, and L(K) marked; (b) ODIT statistic $s^{ij}_t$ versus t, with detection threshold h]
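The ODIT loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the author's implementation: it assumes Euclidean kNN distances, k = k* = 1, γ = 1, K = (1 − α)N1 with α = 0.05, and training and stream sizes far smaller than in the simulations. All names (`knn_stat`, `odit_train`, `odit_detect`) and the values of h and τ are hypothetical.

```python
import math
import random


def knn_stat(x, reference, k=1, k_star=1, gamma=1.0):
    """L = sum_{l=k*}^{k} |e(l)|^gamma over the kNN distances of x in `reference`."""
    dists = sorted(math.dist(x, y) for y in reference)
    return sum(d ** gamma for d in dists[k_star - 1:k])


def odit_train(training, n1, K, k=1, k_star=1, gamma=1.0, seed=0):
    """Partition into X^{N1}, X^{N2}; return (L_(K), X^{N2}), L_(K) = K-th smallest L_i."""
    data = list(training)
    random.Random(seed).shuffle(data)
    x1, x2 = data[:n1], data[n1:]
    scores = sorted(knn_stat(p, x2, k, k_star, gamma) for p in x1)
    return scores[K - 1], x2


def odit_detect(stream, L_K, x2, h, k=1, k_star=1, gamma=1.0):
    """Return the first t (1-indexed) with s_t >= h, or None if no alarm."""
    s = 0.0
    for t, x in enumerate(stream, start=1):
        d_t = knn_stat(x, x2, k, k_star, gamma) - L_K  # D_t = L_t - L_(K)
        s = max(s + d_t, 0.0)                          # s_t = max{s_{t-1} + D_t, 0}
        if s >= h:
            return t
    return None


# Example loosely following the simulation setup: nominal N(0, 0.1^2 I) in 2D,
# then f1 = 0.8 f0 + 0.2 U[0, 1]^2 from time tau on.
rng = random.Random(7)


def nominal():
    return (rng.gauss(0, 0.1), rng.gauss(0, 0.1))


train = [nominal() for _ in range(1000)]
L_K, x2 = odit_train(train, n1=100, K=95)
tau = 51
stream = [nominal() for _ in range(tau - 1)]
stream += [(rng.random(), rng.random()) if rng.random() < 0.2 else nominal()
           for _ in range(200)]
t_d = odit_detect(stream, L_K, x2, h=0.5)
print("alarm at t =", t_d)
```

Nominal points give small negative $D_t$ on average, so $s_t$ stays near 0; each uniform point far from the nominal cloud adds a large positive $D_t$, and a few of them push $s_t$ over h.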
Numerical Results
Simulations
- $f_0$ is a 2D independent Gaussian with zero mean and $\sigma = 0.1$
- $f_1 = 0.8 f_0 + 0.2\,U[0, 1]$
- Training set: 10,000 points ($N_1 = 1000$, $N_2 = 9000$)
- $\alpha = 0.05$, $k = 1$, $K = \alpha N_1$
- Parametric clairvoyant CUSUM knows both $f_0$ and $f_1$ exactly
- Generalized CUSUM (G-CUSUM) knows $f_0$ exactly, but estimates the upper bound of the uniform distribution as 0.9
[Figure: average detection delay versus −log10 of the false alarm probability, for ODIT, CUSUM, and G-CUSUM]
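To make the difference between clairvoyant CUSUM and G-CUSUM concrete, here is a sketch of their per-sample log-likelihood ratio $\log f_1(x)/f_0(x)$ for $f_1 = 0.8 f_0 + 0.2\,U$. It assumes U[0, 1] means the uniform distribution on $[0, 1]^2$ and reads "estimates the upper bound as 0.9" as G-CUSUM using $U[0, 0.9]^2$; both readings and the function names are assumptions. Feeding these values into the CUSUM recursion gives the two detectors.

```python
import math

SIGMA = 0.1  # nominal standard deviation per coordinate (simulation setup)


def f0(x):
    """2D independent zero-mean Gaussian density with std SIGMA."""
    r2 = x[0] ** 2 + x[1] ** 2
    return math.exp(-r2 / (2 * SIGMA ** 2)) / (2 * math.pi * SIGMA ** 2)


def uniform_density(x, upper):
    """Density of the uniform distribution on [0, upper]^2."""
    inside = 0.0 <= x[0] <= upper and 0.0 <= x[1] <= upper
    return 1.0 / upper ** 2 if inside else 0.0


def mixture_llr(x, upper=1.0, eps=0.2):
    """log f1(x)/f0(x) with f1 = (1 - eps) f0 + eps U[0, upper]^2."""
    u = uniform_density(x, upper)
    if u == 0.0:
        return math.log(1 - eps)  # outside the assumed uniform support
    return math.log((1 - eps) + eps * u / f0(x))


x = (0.95, 0.95)
print(mixture_llr(x, upper=1.0))  # clairvoyant: large positive evidence
print(mixture_llr(x, upper=0.9))  # G-CUSUM: log(0.8), i.e. negative evidence
```

A point in $(0.9, 1]^2$ is strong evidence of an anomaly for the clairvoyant detector but counts against one for the mismatched model, which illustrates how a modest parameter error can slow detection.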
Cybersecurity in Smart Grid
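[Figure: hierarchical network with a control center (CC), data aggregators (DA 1, ..., DA N), smart meters (SM 1, ..., SM J), and smart appliances (SA 1, ..., SA K); signals such as $x^{ijk}_t$, $y^{ij}_t$, $s^{ij}_t$, $u^i_t$, $s^i_t$, $z^i_t$, and $v_t$, $u_t$, $s_t$ are passed up the hierarchy]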
- Control center, 10 data aggregators, 1,000 smart meters, 10,000 smart appliances
- 3% of the HANs are attacked; in each attacked HAN, each smart appliance is attacked with prob. 0.5
- Baseline: i.i.d. $\sim \mathcal{N}(0.5, 0.1^2)$
- Attack data: either $\sim \mathcal{N}(0.5, (0.1\eta)^2)$, $\eta > 1$ (jamming), or $\sim \mathcal{N}(0.5 + \Delta, 0.1^2)$, $\Delta \in \mathbb{R}$ (false data injection)
- Even a small mismatch between the actual and assumed parameter values degrades the performance of CUSUM
[Figures: (a) average detection delay versus jamming noise level (in units of the nominal standard deviation); (b) average detection delay versus −log10 P(false alarm) under false data injection; ODIT, CUSUM, and G-CUSUM compared in both panels]
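One way to see why parameter mismatch hurts CUSUM is to look at its drift, the expected increment $\mathbb{E}[\log f_1(x)/f_0(x)]$ when the data actually follow $\mathcal{N}(\mu_a, \sigma_a^2)$ but CUSUM assumes $f_1 = \mathcal{N}(\mu_1, \sigma_1^2)$. Using $\mathbb{E}[(x-\mu)^2] = \sigma_a^2 + (\mu_a - \mu)^2$, the sketch below evaluates this drift for the jamming model. The η values are illustrative choices, not results from the slides, and the function names are hypothetical.

```python
import math

MU0, SIGMA0 = 0.5, 0.1  # nominal smart-appliance data model


def gaussian_llr(x, mu1, sigma1, mu0=MU0, sigma0=SIGMA0):
    """log N(x; mu1, sigma1^2) - log N(x; mu0, sigma0^2)."""
    return (math.log(sigma0 / sigma1)
            - (x - mu1) ** 2 / (2 * sigma1 ** 2)
            + (x - mu0) ** 2 / (2 * sigma0 ** 2))


def expected_llr(mu1, sigma1, mu_a, sigma_a, mu0=MU0, sigma0=SIGMA0):
    """Mean CUSUM increment when data ~ N(mu_a, sigma_a^2) but CUSUM assumes N(mu1, sigma1^2)."""
    m1 = sigma_a ** 2 + (mu_a - mu1) ** 2  # E[(x - mu1)^2]
    m0 = sigma_a ** 2 + (mu_a - mu0) ** 2  # E[(x - mu0)^2]
    return math.log(sigma0 / sigma1) - m1 / (2 * sigma1 ** 2) + m0 / (2 * sigma0 ** 2)


# Jamming with actual eta = 1.5, for matched and mismatched assumed eta.
actual_sigma = 0.1 * 1.5
for eta_assumed in (1.5, 2.0, 3.0):
    drift = expected_llr(MU0, 0.1 * eta_assumed, MU0, actual_sigma)
    print(f"assumed eta = {eta_assumed}: drift per sample = {drift:.4f}")
# drifts are about 0.2195, 0.1506, and -0.0986
```

With the matched model the drift equals the KL divergence; overestimating η shrinks it, and a large enough error makes it negative, so the CUSUM statistic keeps resetting to 0 and the detection delay grows sharply.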
Human Activity Recognition
- Online monitoring of a dynamic system using the “Heterogeneity Human Activity Recognition Dataset” [8], obtained from the UCI Machine Learning Repository
- Smartwatch accelerometer data: 3.5M data points with 5 numeric features
- 6 activities: biking, sitting, standing, walking, stair up, and stair down
- Focusing on activity transitions, we tested online detection performance
- G-CUSUM fits multivariate Gaussian models to the baseline and anomalous distributions
- Re-train after detecting a change in activity ($N_1 = 10$, $N_2 = 20$)
[Figure: average detection delay versus false alarm period, for ODIT and G-CUSUM]
[8] A. Stisen et al., “Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition,” SenSys, 2015.
Conclusion
Conclusions
With the proliferation of IoT devices and the ease of triggering DoS attacks even by unsophisticated malicious parties, there is an increasing need for scalable and effective solutions.

A novel anomaly detection framework:
- Scalable: applicable to high-dimensional datasets (big data problems)
- Nonparametric: agnostic to data type and protocol
- Online system monitoring
- Asymptotically optimum for testing against uniformly distributed anomalies
- Outperforms the sequential change detector CUSUM when CUSUM estimates its parameters from data
- Outperforms even the clairvoyant CUSUM for a small to moderate variance increase (e.g., jamming attack)