Sensitivity of PCA for Traffic Anomaly Detection: Evaluating the robustness of current best practices




SLIDE 1

Sensitivity of PCA for Traffic Anomaly Detection

Evaluating the robustness of current best practices

Haakon Ringberg (1), Augustin Soule (2), Jennifer Rexford (1), Christophe Diot (2)

(1) Princeton University, (2) Thomson Research

SLIDE 2

Outline

  • Context
      • Background and motivation
      • Bigger picture
      • PCA (subspace method) in one slide
  • Challenges with current PCA methodology
  • Conclusion & future directions

SLIDE 3

Background

  • Promising applications of PCA to anomaly detection (AD)
      • [Lakhina et al., SIGCOMM 04 & 05]
  • But we weren’t nearly as successful applying the technique to a new data set
      • Same source code
  • What were we doing wrong?
      • Unable to tune the technique

SLIDE 4

Bigger Picture

  • Many statistical techniques evaluated for AD
      • e.g., wavelets, PCA, Kalman filters
      • Promising early results
  • But questions about performance remain
      • What did the researchers have to do in order to achieve the presented results?

SLIDE 5

Questions about techniques

  • “Tunability” of technique
      • Number of parameters
      • Sensitivity to parameters
      • Interpretability of parameters
  • Other aspects of robustness
      • Sensitivity to drift in underlying data
      • Sensitivity to sampling
  • Assumptions about the underlying data

SLIDE 6

Principal Components Analysis (PCA)

  • PCA transforms data into a new coordinate system
  • Principal components (new bases) are ordered by captured variance
  • The first k (topk) tend to capture periodic trends
      • normal subspace
      • vs. anomalous subspace
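The split described on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the traffic matrix `X` (time bins by links), the synthetic daily pattern, and the helper name `split_subspaces` are all hypothetical and are not taken from the authors' code.

```python
import numpy as np

def split_subspaces(X, topk):
    """Split a (time bins x links) matrix into normal and anomalous parts.

    The first `topk` principal components span the normal subspace;
    whatever is left over is the anomalous (residual) part.
    """
    Xc = X - X.mean(axis=0)                      # center each link
    # Rows of Vt are the principal components, ordered by captured variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:topk].T                              # (links x topk) basis
    normal = Xc @ P @ P.T                        # projection on normal subspace
    anomalous = Xc - normal                      # projection on anomalous subspace
    variances = s**2 / (len(X) - 1)              # variance captured per component
    return normal, anomalous, variances

# Synthetic example (assumption): one daily pattern on 10 links plus noise.
rng = np.random.default_rng(0)
t = np.arange(288)                               # e.g. 5-minute bins over a day
daily = np.sin(2 * np.pi * t / 288)
X = np.outer(daily, np.linspace(1, 3, 10)) + 0.1 * rng.standard_normal((288, 10))

normal, anomalous, variances = split_subspaces(X, topk=1)
print("share of variance in first component:", variances[0] / variances.sum())
```

With a single periodic trend, almost all variance lands in the first component, which is the situation the subspace method relies on.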

SLIDE 7

Data used

  • Géant and Abilene networks
  • IP flow traces
  • 21/11 through 28/11 2005
  • Detected anomalies were manually inspected

SLIDE 8

Outline

  • Context
  • Challenges with current PCA methodology
      • Sensitivity to its parameters
      • Contamination of normalcy
      • Identifying the location of detected anomalies
  • Conclusion & future directions

SLIDE 9

Sensitivity to topk

  • Where is the line drawn between normal and anomalous?
  • What is too anomalous?

[Figure labels: signal, PCA, topk, normal, anomalous]
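Both questions on this slide correspond to a parameter: topk decides where the normal subspace ends, and a threshold decides how much residual energy counts as too anomalous. The sketch below makes both explicit. The mean-plus-n-sigma threshold is a stand-in chosen for brevity and is an assumption, not the threshold used in the cited work; the data, the injected spike, and the function name `detect` are hypothetical.

```python
import numpy as np

def detect(X, topk, n_sigma=3.0):
    """Return the time bins whose residual energy exceeds the threshold."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:topk].T                          # basis of the normal subspace
    residual = Xc - Xc @ P @ P.T             # part left in the anomalous subspace
    spe = np.sum(residual**2, axis=1)        # squared residual energy per time bin
    # Assumed threshold: mean + n_sigma * std of the residual energy.
    threshold = spe.mean() + n_sigma * spe.std()
    return np.flatnonzero(spe > threshold)

# Synthetic traffic (assumption): two periodic trends on 10 links plus noise.
rng = np.random.default_rng(0)
t = np.arange(288)
s1 = np.sin(2 * np.pi * t / 288)
s2 = np.sin(4 * np.pi * t / 288)
X = (np.outer(s1, np.linspace(1, 3, 10))
     + np.outer(s2, np.linspace(3, 1, 10))
     + 0.1 * rng.standard_normal((288, 10)))
X[100, 3] += 5.0                             # inject one spike on link 3

# The set of flagged bins changes as topk moves.
for topk in range(1, 6):
    print(topk, detect(X, topk).tolist())
```

Running the loop shows how the flagged bins depend on where the line between normal and anomalous is drawn, even though the data never change.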

SLIDE 10

Sensitivity to topk

  • Very sensitive to topk
      • Total detections and false positives (FP)
  • Not an issue if topk were tunable
  • Tried many methods
      • 3σ deviation heuristic
      • Cattell’s Scree Test
      • Humphrey-Ilgen
      • Kaiser’s Criterion
  • None are reliable
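Two of the listed selection rules are simple enough to sketch. Kaiser's criterion keeps components whose eigenvalue on the correlation matrix exceeds 1. The 3σ version below is one reading of the heuristic (the normal subspace ends at the first component whose temporal projection strays more than three standard deviations from its mean) and should be treated as an assumption. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def kaiser_topk(X):
    """Kaiser's criterion: count correlation-matrix eigenvalues above 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int(np.sum(eigvals > 1.0))

def three_sigma_topk(X, n_sigma=3.0):
    """Assumed 3-sigma rule: index of the first component whose projection
    contains a point more than n_sigma standard deviations from its mean."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                        # column i = projection on PC i
    z = np.abs(scores - scores.mean(axis=0)) / scores.std(axis=0)
    exceeds = (z > n_sigma).any(axis=0)
    # If no component exceeds the bound, every component counts as normal.
    return int(np.argmax(exceeds)) if exceeds.any() else scores.shape[1]

# Synthetic traffic (assumption): two periodic trends on 10 links plus noise.
rng = np.random.default_rng(0)
t = np.arange(288)
s1 = np.sin(2 * np.pi * t / 288)
s2 = np.sin(4 * np.pi * t / 288)
X = (10
     + np.outer(s1, np.linspace(1, 3, 10))
     + np.outer(s2, np.linspace(3, 1, 10))
     + 0.1 * rng.standard_normal((288, 10)))

k_kaiser = kaiser_topk(X)
k_sigma = three_sigma_topk(X)
print("Kaiser:", k_kaiser, "3-sigma:", k_sigma)
```

Nothing forces the two rules to agree on the same data, which is one way to see why picking topk is hard to automate.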

SLIDE 11

Contamination of normalcy

  • Large anomalies may be included among topk
      • Invalidates assumption that top PCs are periodic
      • Pollutes definition of normal
  • In our study, the outage to the left affected 75/77 links
      • Only detected on a handful!
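A small synthetic experiment illustrates the contamination effect. All numbers here are assumptions made for the demonstration (10 links, an outage that zeroes 9 of them for 12 time bins); they are not the Géant or Abilene data.

```python
import numpy as np

# Synthetic traffic (assumption): baseline 10, two periodic trends, noise.
rng = np.random.default_rng(0)
t = np.arange(288)
s1 = np.sin(2 * np.pi * t / 288)
s2 = np.sin(4 * np.pi * t / 288)
X = (10
     + np.outer(s1, np.linspace(1, 3, 10))
     + np.outer(s2, np.linspace(3, 1, 10))
     + 0.1 * rng.standard_normal((288, 10)))

outage = slice(100, 112)
X[outage, :9] = 0.0                           # outage hits 9 of the 10 links

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                            # temporal projection on each PC

# If the top PC were purely periodic, its extreme value would follow the
# daily pattern; here we check where the extreme actually falls.
peak_bin = int(np.argmax(np.abs(scores[:, 0])))

topk = 2
P = Vt[:topk].T
residual = Xc - Xc @ P @ P.T
# Fraction of the outage's energy that the "normal" subspace swallows.
absorbed = 1.0 - np.sum(residual[outage]**2) / np.sum(Xc[outage]**2)

print("first PC peaks at bin", peak_bin)
print("outage energy absorbed by normal subspace:", round(float(absorbed), 3))
```

In this setup the first component peaks inside the outage window, and most of the outage energy is projected onto the normal subspace, leaving little for the residual-based detector to see.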

SLIDE 12

Conclusion & future directions

  • PCA (subspace method) methodology issues
      • Sensitivity to topk parameter
      • Contamination of normal subspace
      • Identifying the location of detected anomalies
  • Generally: room for rigorous evaluation of statistical techniques applied to AD
      • Tunability, robustness
      • Assumptions about underlying data
      • Under what conditions does method excel?

SLIDE 13

Thanks! Questions?

Haakon Ringberg
Princeton University
Computer Science
http://www.cs.princeton.edu/~hlarsen/

SLIDE 14

Identifying anomaly locations

  • Spikes appear when the state vector is projected onto the anomaly subspace
  • But network operators don’t care about this
      • They want to know where it happened!
  • How do we find the original location of the anomaly?

[Figure labels: state vector, anomaly subspace]

SLIDE 15

Identifying anomaly locations

  • Previous work used a simple heuristic
      • Associate detected spike with the k flows with the largest contribution to the state vector v
  • No clear a priori reason for this association

[Figure labels: state vector, anomaly subspace]
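The heuristic on this slide is short enough to write down. The sketch assumes "contribution" means the absolute size of each flow's entry in the state vector v at the time of the spike; that reading, the example values, and the name `top_contributors` are assumptions made for illustration.

```python
import numpy as np

def top_contributors(v, k):
    """Indices of the k flows with the largest absolute entries in v,
    ordered from largest to smallest."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(-np.abs(v))            # sort by magnitude, descending
    return order[:k].tolist()

# Hypothetical state vector at the time bin where a spike was detected.
v = [0.1, -5.0, 0.3, 2.0, -0.2]
suspects = top_contributors(v, k=2)
print(suspects)  # [1, 3]
```

The ranking depends only on magnitudes in v, so nothing in it ties the chosen flows to the spike seen in the anomaly subspace, which is the slide's objection.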