Nonparametric Sequential Change Detection for High-Dimensional Problems



SLIDE 1

Nonparametric Sequential Change Detection for High-Dimensional Problems

Yasin Yılmaz
Electrical Engineering, University of South Florida
Allerton 2017

SLIDE 2

Outline

1 Introduction
2 Background
3 ODIT: Online Discrepancy Test
4 Numerical Results
5 Conclusion

SLIDE 3

Introduction


SLIDE 5

Anomaly Detection

Objective: identify patterns that deviate from a nominal behavior
Applications: cybersecurity, quality control, fraud detection, fault detection, health care, ...
In the literature, statistical outlier detection is typically equated with anomaly detection
However, an outlier could be a nominal tail event

[Figure: densities f0(x) and f1(x) plotted against x, with an outlier and a real anomalous event (e.g., mean shift) marked]

SLIDE 6

Problem Formulation

Instead of anomaly = outlier, consider also the temporal dimension
Proposed Model: anomaly = persistent outliers
Objective: timely and accurate detection of anomalies in high-dimensional datasets
Approach: sequential & nonparametric anomaly detection

[Figure: two panels of x(t) versus t for t = 1, ..., 20. Panel "Nominal" shows an isolated outlier; panel "Anomaly after t=10 with prob. 0.2" shows persistent outliers]

SLIDE 7

Motivating Facts: IoT Security, Smart Grid, ...

IoT devices: 8.4B in 2017, expected to hit 20B by 2020 [1]
IoT systems: highly vulnerable, need scalable security solutions [2]
Mirai IoT botnet: largest recorded DDoS attack, with at least 1.1 Tbps bandwidth (Oct. 2016) [2]
Persirai IoT botnet: targets at least 120,000 IP cameras (May 2017) [3]
A plausible cyberattack against the US grid: 100M people may be left without power, with up to $1 trillion of monetary loss [4]

[1] R. Minerva, A. Biru, and D. Rotondi, “Towards a definition of the Internet of Things (IoT),” IEEE Internet Initiative, no. 1, 2015.
[2] E. Bertino and N. Islam, “Botnets and Internet of Things Security,” Computer, vol. 50, no. 2, pp. 76-79, Feb. 2017.
[3] Trend Micro, “Persirai: New Internet of Things (IoT) Botnet Targets IP Cameras,” May 9, 2017, available online.
[4] Trevor Maynard and Nick Beecroft, “Business Blackout,” Lloyd’s Emerging Risk Report, p. 60, May 2015.

SLIDE 8

Motivating Facts: IoT Security, Smart Grid, ...

Challenges:
  Unknown anomalous distribution: parametric methods, as well as signature-based methods (e.g., antivirus), are not feasible
  High-dimensional problems: even the nominal distribution is difficult to know
Nonparametric methods are needed
Timely and accurate detection is critical

SLIDE 9

Background

SLIDE 10

Sequential Change Detection - CUSUM

$$\inf_{T}\ \sup_{\tau}\ \sup_{\{x_1,\ldots,x_T\}} \mathsf{E}_\tau[T-\tau \mid T \ge \tau] \quad \text{s.t.} \quad \mathsf{E}_\infty[T] \ge \beta$$

$$W_t = \max\left\{ W_{t-1} + \log\frac{f_1(x_t)}{f_0(x_t)},\ 0 \right\}, \qquad T = \min\{t : W_t \ge h\}$$

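The CUSUM recursion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`cusum`, `gaussian_logpdf`) and the Gaussian pre- and post-change densities in the demo are hypothetical choices, not taken from the talk.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def cusum(samples, log_f0, log_f1, h):
    """Return the first time t (1-indexed) at which W_t >= h, or None."""
    w = 0.0
    for t, x in enumerate(samples, start=1):
        # W_t = max{W_{t-1} + log f1(x_t)/f0(x_t), 0}
        w = max(w + log_f1(x) - log_f0(x), 0.0)
        if w >= h:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: N(0, 1) before the change, N(1, 1) from t = 51 on.
    data = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(1, 1) for _ in range(200)]
    alarm = cusum(
        data,
        log_f0=lambda x: gaussian_logpdf(x, 0.0, 1.0),
        log_f1=lambda x: gaussian_logpdf(x, 1.0, 1.0),
        h=5.0,
    )
    print("alarm raised at t =", alarm)
```

Raising the threshold h lengthens the expected time between false alarms at the cost of a longer detection delay, which is the trade-off expressed by the constraint on E∞[T].
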
SLIDE 11

Statistical Outlier Detection

Needs to know a statistical description f0 of the nominal (e.g., no attack) behavior (baseline)
Determines instances that significantly deviate from the baseline
With f0 completely known, x is an outlier if $\int_x^\infty f_0(y)\,dy < \alpha$ (p-value)
Equivalently, if x lies outside the most compact set of data points under f0 (minimum volume set), $x \notin \Omega_\alpha$, where

$$\Omega_\alpha = \arg\min_{A} \int_A dy \quad \text{subject to} \quad \int_A f_0(y)\,dy \ge 1-\alpha$$

[Figure: nominal density f0(x) plotted against x]

Uniformly most powerful test when the anomalous distribution is a linear mixture of f0 and the uniform distribution
Coincides with the minimum entropy set, which minimizes the Rényi entropy while satisfying the same false alarm constraint
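As a minimal sketch of the p-value rule, assume f0 is a univariate Gaussian (the slide does not fix f0; the Gaussian choice and the function names below are illustrative assumptions). The upper-tail probability then has a closed form through the complementary error function.

```python
import math


def upper_tail_p_value(x, mean=0.0, std=1.0):
    """P(Y > x) for Y ~ N(mean, std^2), via the complementary error function."""
    z = (x - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_outlier(x, alpha, mean=0.0, std=1.0):
    """Declare x an outlier when its upper-tail p-value falls below alpha."""
    return upper_tail_p_value(x, mean, std) < alpha
```

With α = 0.05 and a standard normal f0, `is_outlier(2.0, 0.05)` is true while `is_outlier(1.0, 0.05)` is false. In high dimensions no such closed form is available, which is what motivates learning the minimum volume set from data.
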

SLIDE 12

Geometric Entropy Minimization (GEM)

High-dimensional datasets: even if f0 is known, it is computationally very expensive (if not impossible) to determine Ωα
Various methods exist for learning Ωα
GEM is very effective with high-dimensional datasets while asymptotically achieving Ωα for $\lim_{K,N\to\infty} K/N \to 1-\alpha$

[Figure: scatter plot of training set 1, training set 2, and the test set in the $(x_t^{ij1}, x_t^{ij2})$ plane, with the lengths $L_{(K)}$, $L_1$, $L_2$ marked]

Training: randomly partitions the training set into two and forms the K-kNN graph [5]

$$\bar{\mathcal{X}}^{N_1}_K = \arg\min_{\mathcal{X}^{N_1}_K} L_k(\mathcal{X}^{N_1}_K, \mathcal{X}^{N_2}), \qquad L_k(\mathcal{X}^{N_1}_K, \mathcal{X}^{N_2}) = \sum_{i=1}^{K} \sum_{l=k^*}^{k} |e_i(l)|^\gamma$$

Test: a new point $x_t \in \mathbb{R}^d$ is an outlier if $x_t \notin \bar{\mathcal{X}}^{N_1+1}_K$, equivalently if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)}$

[5] A. O. Hero III, “Geometric entropy minimization (GEM) for anomaly detection and localization,” NIPS, pp. 585-592, 2006.
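The training and test steps can be sketched as follows. This is a simplified illustration, not the reference GEM implementation: it takes k* = 1, uses brute-force neighbor search, and sets K = round((1 - α) N1) following the limit K/N → 1 - α stated on this slide. All function names are hypothetical.

```python
import math
import random


def knn_length(x, reference, k, k_star=1, gamma=1.0):
    """Sum of |e(l)|^gamma over the k_star-th to k-th nearest neighbors of x in reference."""
    dists = sorted(math.dist(x, y) for y in reference)
    return sum(d ** gamma for d in dists[k_star - 1:k])


def gem_train(train, n1, k, alpha, seed=0):
    """Split the training set and return (X_N2, L_(K)) with K = round((1 - alpha) * N1)."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    x_n1, x_n2 = shuffled[:n1], shuffled[n1:]
    # Total edge length from each point of X_N1 to its nearest neighbors in X_N2.
    lengths = sorted(knn_length(x, x_n2, k) for x in x_n1)
    big_k = max(1, round((1 - alpha) * n1))
    return x_n2, lengths[big_k - 1]  # K-th smallest total edge length


def gem_is_outlier(x, x_n2, l_k, k):
    """Test rule: x is an outlier when L_t exceeds L_(K)."""
    return knn_length(x, x_n2, k) > l_k
```

Keeping the K points of the first partition with the smallest edge lengths is what selects the minimizing subset; only the boundary value L_(K) is needed at test time.
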

SLIDE 13

ODIT: Online Discrepancy Test


SLIDE 15

Online Discrepancy Test (ODIT)

GEM lacks the temporal aspect
In GEM, $x_t$ is an outlier if $L_t = \sum_{l=k^*}^{k} |e_t(l)|^\gamma > L_{(K)}$
In ODIT, $D_t = L_t - L_{(K)}$ is treated as some positive/negative evidence for anomaly
$D_t$ approximates the log-likelihood ratio $\ell_t = \log \frac{p(r(x_t) \mid H_1)}{p(r(x_t) \mid H_0)}$ between H1 claiming $x_t$ is anomalous and H0 claiming $x_t$ is nominal

[Figure: ODIT statistic $s_t^{ij}$ versus t, with the detection threshold h]

Assuming independence, $\sum_{t=1}^{T} D_t$ gives the aggregate anomaly evidence until time T (as $\sum_{t=1}^{T} \ell_t$, sufficient statistic for optimum detection)
Similar to CUSUM (optimum minimax sequential change detector), ODIT decides using

$$T_d = \min\{t : s_t \ge h\}, \qquad s_t = \max\{s_{t-1} + D_t,\ 0\}$$

SLIDE 16

Theoretical Justification - Asymptotic

Asymptotic Optimality - Scalarized problem
As the training set grows ($N_2 \to \infty$), ODIT is asymptotically optimum for

$$H_0: r(x_t) \sim f_0^k,\ \forall t$$
$$H_1: r(x_t) \sim f_0^k,\ t < \tau, \quad \text{and} \quad r(x_t) \sim f_{\text{uni}}^k,\ t \ge \tau$$

$\{x_t\}$ independent
$r(x_t)$: kNN distance
$f_0(x_t) > 0$, Lebesgue continuous
$f_0^k$ and $f_{\text{uni}}^k$: distributions of the kNN distance under f0 and under the uniform distr. on a d-dimensional grid with spacing $r_\alpha$, where $\int_{r_\alpha}^{\infty} f_0^k(r)\,dr = \alpha$

SLIDE 17

Sketch of the Proof

For independent $\{x_t\}$, continuous $f_0 > 0$ defines a non-homogeneous Poisson point process with continuous rate $\lambda(x) > 0$.
Obtain a homogeneous Poisson point process with rate k by defining a d-dimensional non-homogeneous grid with volume $k/\lambda(x)$ [6]
For this homogeneous Poisson point process, the nearest neighbor function is given by

$$D_x(r^d) = k\,\frac{d v_d(x,r)}{d r^d}\, e^{-k v_d(x,r)}$$

Under H0, $r(x_t) = r_t$ comes from $f_0^k$, which can be computed using the training set as $L_t$.
Under H1, $r(x_t) = r_\alpha$ comes from $f_{\text{uni}}^k$, which has a single atom at $r_\alpha$, computed as $L_{(K)}$.
As the training set grows, $L_t \to r_t$ and $L_{(K)} \to r_\alpha$
The optimum CUSUM test computes

$$\log \frac{D_x(r_\alpha)}{D_x(r_t)} = k c\,(r_t^d - r_\alpha^d)$$

[6] Robert Gallager, 6.262 Discrete Stochastic Processes, Chapter 2, Spring 2011. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
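The final log-ratio can be checked in two lines. The worked step below assumes that the volume term has the form $v_d(x,r) = c\,r^d$ for a constant c (for example, the volume of a d-dimensional ball of radius r); this form is an assumption made here to match the constant c in the slide's result.

```latex
\begin{align*}
D_x(r^d) &= k\,\frac{d v_d(x,r)}{d r^d}\, e^{-k v_d(x,r)}
          = k c\, e^{-k c r^d}
          \qquad \text{(assuming } v_d(x,r) = c\, r^d \text{)},\\
\log \frac{D_x(r_\alpha)}{D_x(r_t)}
         &= \log \frac{k c\, e^{-k c r_\alpha^d}}{k c\, e^{-k c r_t^d}}
          = -k c\, r_\alpha^d + k c\, r_t^d
          = k c\,\bigl(r_t^d - r_\alpha^d\bigr).
\end{align*}
```

The ratio is positive exactly when $r_t > r_\alpha$, which mirrors the sign of $D_t = L_t - L_{(K)}$ in ODIT.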

SLIDE 18

Theoretical Justification - Nonasymptotic

CUSUM procedure can be expressed in terms of a general discrepancy metric, applicable to any number sequence
Stop when the discrepancy $g(\ell^t)$ [7] of the observations with respect to f0 is large enough

Discrepancy and CUSUM

$$T_c = \min\{t : g(\ell^t) \ge h_c\}, \qquad \ell^t = \left[\log\frac{f_1(x_1)}{f_0(x_1)}\ \ldots\ \log\frac{f_1(x_t)}{f_0(x_t)}\right],$$
$$g(\ell^t) = \max_{1 \le n_1 \le n_2 \le t}\ \sum_{i=n_1}^{n_2} \ell_i$$

[Figure: $Q_t$, $g(\ell^t)$, and $\max_{1 \le j \le t} Q_j$ plotted against t, where $Q_t = \sum_{i=1}^{t} \ell_i$]

[7] B. A. Moser et al., “On stability of distance measures for event sequences induced by level-crossing sampling,” IEEE Trans. Signal Process., vol. 62, no. 8, pp. 1987–1999, 2014.
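The link between the discrepancy and CUSUM can be checked numerically. The sketch below reads g as the maximum sum over contiguous intervals, as written on this slide, and compares its first threshold crossing with that of the usual CUSUM recursion. Function names are hypothetical, and the equality of the two stopping times is claimed only for thresholds h > 0.

```python
def discrepancy(llr):
    """g(l^t): maximum of sum(llr[n1..n2]) over all 1 <= n1 <= n2 <= t, in O(t)."""
    best = float("-inf")
    running = 0.0  # best sum of an interval ending at the current index
    for value in llr:
        running = max(running, 0.0) + value
        best = max(best, running)
    return best


def cusum_path(llr):
    """CUSUM statistics W_1, ..., W_t with W_0 = 0."""
    w, path = 0.0, []
    for value in llr:
        w = max(w + value, 0.0)
        path.append(w)
    return path


def first_crossing(values, h):
    """First 1-indexed position where values[i] >= h, or None."""
    for t, v in enumerate(values, start=1):
        if v >= h:
            return t
    return None
```

For a sequence `llr`, building `[discrepancy(llr[:t]) for t in range(1, len(llr) + 1)]` and passing it to `first_crossing` gives the same stopping time as `first_crossing(cusum_path(llr), h)`, because the best interval ending at time t reaches a positive level h exactly when W_t does.
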

SLIDE 19

ODIT Algorithm

Initialize: $s \leftarrow 0$, $t \leftarrow 1$
Partition training set into $\mathcal{X}^{N_1}$ and $\mathcal{X}^{N_2}$
Determine $L_{(K)}$ from K-kNN graph $\bar{\mathcal{X}}^{N_1}_K$
While $s < h$
    Get new data $x_t$ and compute $D_t = L_t - L_{(K)}$
    $s = \max\{s + D_t,\ 0\}$
    $t \leftarrow t + 1$
Declare anomaly

[Figures: scatter plot of training set 1, training set 2, and the test set with $L_{(K)}$, $L_1$, $L_2$ marked; ODIT statistic $s_t^{ij}$ versus t with the detection threshold h]
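A compact Python sketch of the algorithm follows. It is an illustration under assumptions rather than the author's implementation: k* = 1, brute-force nearest-neighbor search, K = round((1 - α) N1), and hypothetical function names and demo parameters.

```python
import math
import random


def knn_length(x, reference, k, gamma=1.0):
    """L: sum of |e(l)|^gamma over the k nearest neighbors of x in reference (k* = 1 assumed)."""
    dists = sorted(math.dist(x, y) for y in reference)
    return sum(d ** gamma for d in dists[:k])


def odit_train(train, n1, k, alpha, seed=0):
    """Partition the training set and return (X_N2, L_(K))."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    x_n1, x_n2 = shuffled[:n1], shuffled[n1:]
    lengths = sorted(knn_length(x, x_n2, k) for x in x_n1)
    big_k = max(1, round((1 - alpha) * n1))  # assumption: K/N1 = 1 - alpha
    return x_n2, lengths[big_k - 1]


def odit_detect(stream, x_n2, l_k, k, h):
    """Run s = max{s + D_t, 0} and return the first t with s >= h, or None."""
    s = 0.0
    for t, x in enumerate(stream, start=1):
        d_t = knn_length(x, x_n2, k) - l_k  # D_t = L_t - L_(K)
        s = max(s + d_t, 0.0)
        if s >= h:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    nominal = lambda: (rng.gauss(0, 0.1), rng.gauss(0, 0.1))
    x_n2, l_k = odit_train([nominal() for _ in range(600)], n1=100, k=1, alpha=0.05)
    # Hypothetical stream: nominal for 20 steps, then every point drawn from U[0, 1]^2.
    stream = [nominal() for _ in range(20)] + [(rng.random(), rng.random()) for _ in range(30)]
    print("anomaly declared at t =", odit_detect(stream, x_n2, l_k, k=1, h=0.5))
```

Only the second partition and the scalar L_(K) are kept after training, so the per-sample cost at run time is one nearest-neighbor query.
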

SLIDE 20

Numerical Results

SLIDE 21

Simulations

f0 is a 2D independent Gaussian with zero mean and σ = 0.1
f1 = 0.8 f0 + 0.2 U[0, 1]
Training set: 10,000 points (N1 = 1000, N2 = 9000)
α = 0.05, k = 1, K = αN1
Parametric clairvoyant CUSUM knows both f0 and f1 exactly
Generalized CUSUM (G-CUSUM) knows f0 exactly, but estimates the uniform distribution upper bound as 0.9

[Figure: average detection delay versus -log10 of the probability of deciding H1 under H0, for ODIT, CUSUM, and G-CUSUM]
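The simulated data described on this slide can be generated with a short sampler. The sketch assumes that U[0, 1] denotes the uniform distribution on the unit square in 2D; the function names and the `change_point` parameter are hypothetical.

```python
import random


def sample_f0(rng, sigma=0.1):
    """One draw from the nominal 2D independent Gaussian with zero mean."""
    return (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def sample_f1(rng, sigma=0.1, p_uniform=0.2):
    """One draw from the mixture 0.8 f0 + 0.2 U[0, 1]^2 (uniform on the unit square assumed)."""
    if rng.random() < p_uniform:
        return (rng.random(), rng.random())
    return sample_f0(rng, sigma)


def make_stream(rng, change_point, length):
    """Samples x_1..x_length: f0 before change_point, f1 from change_point on."""
    return [sample_f0(rng) if t < change_point else sample_f1(rng)
            for t in range(1, length + 1)]
```

Because only about one in five post-change samples comes from the uniform component, a single-sample outlier test sees mostly nominal-looking data, which is the regime where accumulating evidence over time matters.
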

SLIDE 22

Cybersecurity in Smart Grid

[Figure: hierarchical smart grid network with control center (CC), data aggregators DA 1, ..., DA i, ..., DA N, smart meters SM 1, ..., SM j, ..., SM J, and smart appliances SA 1, ..., SA k, ..., SA K; links are labeled with $x_t^{ijk}$, $y_t^{ij}$, $s_t^{ij}$, $u_t^i$, $s_t^i$, $z_t^i$, and $v_t, u_t, s_t$]

Control center, 10 data aggregators, 1,000 smart meters, 10,000 smart appliances
3% of the HANs are attacked. In each attacked HAN, each smart appliance is attacked with prob. 0.5
Baseline: iid ∼ N(0.5, 0.1²)
Attack data: either ∼ N(0.5, (0.1η)²), η > 1 (Jamming) or ∼ N(0.5 + ∆, 0.1²), ∆ ∈ R (False Data Injection)
Even a small mismatch between the actual and assumed parameter values degrades the performance of CUSUM

[Figure: average detection delay versus jamming noise level (in units of the nominal standard deviation) for ODIT, CUSUM, and G-CUSUM]
[Figure: average detection delay versus -log10 P(false alarm) under False Data Injection for ODIT, CUSUM, and G-CUSUM]
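The baseline and the two attack models are simple enough to write down directly. The sketch below samples a single appliance reading under each model; the function names are hypothetical, and the network hierarchy and the per-appliance attack probability are not modeled.

```python
import random

BASE_MEAN, BASE_STD = 0.5, 0.1  # baseline N(0.5, 0.1^2) from the slide


def baseline(rng):
    """Nominal reading."""
    return rng.gauss(BASE_MEAN, BASE_STD)


def jamming(rng, eta):
    """Jamming: same mean, standard deviation inflated by eta > 1."""
    return rng.gauss(BASE_MEAN, BASE_STD * eta)


def false_data_injection(rng, delta):
    """False data injection: mean shifted by delta, same standard deviation."""
    return rng.gauss(BASE_MEAN + delta, BASE_STD)
```

A parametric CUSUM needs η or ∆ in advance to form its likelihood ratio, which is where the parameter mismatch mentioned above enters.
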

SLIDE 23

Human Activity Recognition

Online monitoring of a dynamic system using the “Heterogeneity Human Activity Recognition Dataset” [8] obtained from the UCI Machine Learning Repository
Smartwatch accelerometer data: 3.5M data points with 5 numeric features
6 activities: biking, sitting, standing, walking, stair up, and stair down
Focusing on activity transitions, we tested online detection performance
G-CUSUM fits multivariate Gaussian models to the baseline and anomalous dist.
Re-train after detecting a change in the activity (N1 = 10, N2 = 20)

[Figure: average detection delay versus false alarm period for ODIT and G-CUSUM]

[8] A. Stisen et al., “Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition,” SenSys, 2015.

SLIDE 24

Conclusion

SLIDE 25

Conclusions

With the proliferation of IoT devices, and the ease of triggering DoS attacks even from unsophisticated malicious parties, there is an increasing need for developing scalable and effective solutions.
A novel anomaly detection framework:
  Scalable: applicable to high-dimensional datasets (big data problems)
  Nonparametric: agnostic to data-type and protocol
  Online system monitoring
  Asymptotically optimum for testing against uniformly distributed anomalies
Outperforms the sequential change detector CUSUM that estimates parameters from data
Outperforms even the clairvoyant CUSUM in case of a small to moderate variance increase (e.g., Jamming attack)

SLIDE 26

Questions?

Thank you!