Sensitivity of PCA for Traffic Anomaly Detection



  1. Sensitivity of PCA for Traffic Anomaly Detection: Evaluating the robustness of current best practices
     Haakon Ringberg (1), Augustin Soule (2), Jennifer Rexford (1), Christophe Diot (2)
     (1) Princeton University, (2) Thomson Research

  2. Outline
     - Context
       - Background and motivation
       - Bigger picture
       - PCA (subspace method) in one slide
     - Challenges with current PCA methodology
     - Conclusion & future directions

  3. Background
     - Promising applications of PCA to anomaly detection (AD) [Lakhina et al., SIGCOMM 04 & 05]
     - But we weren’t nearly as successful applying the technique to a new data set
       - Same source code
       - What were we doing wrong?
       - Unable to tune the technique

  4. Bigger Picture
     - Many statistical techniques evaluated for AD, e.g., wavelets, PCA, Kalman filters
     - Promising early results
     - But questions about performance remain
       - What did the researchers have to do in order to achieve the presented results?

  5. Questions about techniques
     - “Tunability” of technique
       - Number of parameters
       - Sensitivity to parameters
       - Interpretability of parameters
     - Other aspects of robustness
       - Sensitivity to drift in underlying data
       - Sensitivity to sampling
       - Assumptions about the underlying data

  6. Principal Components Analysis (PCA)
     - PCA transforms data into a new coordinate system
     - Principal components (new bases) are ordered by captured variance
     - The first k (top k) tend to capture periodic trends
       - Normal subspace vs. anomalous subspace
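
The subspace split described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the traffic matrix is synthetic (one shared daily cycle plus noise and one injected spike), and the function name `split_subspaces` and all data parameters are assumptions made for the example.

```python
import numpy as np

def split_subspaces(X, k):
    """Split a (time bins x links) matrix into normal and anomalous parts."""
    Xc = X - X.mean(axis=0)                 # center each link's time series
    # Rows of Vt are principal components, ordered by captured variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # (links x k) basis of the normal subspace
    normal = Xc @ P @ P.T                   # projection onto the top-k components
    anomalous = Xc - normal                 # residual in the anomalous subspace
    variance = s**2 / (len(X) - 1)          # variance captured by each component
    return normal, anomalous, variance

# Hypothetical traffic: one shared daily cycle with per-link gain, plus noise.
rng = np.random.default_rng(0)
t = np.arange(288)
gains = rng.uniform(1.0, 3.0, size=10)
X = np.outer(np.sin(2 * np.pi * t / 288), gains) + 0.1 * rng.standard_normal((288, 10))
X[200, 3] += 5.0                            # injected spike on one link

normal, anomalous, variance = split_subspaces(X, k=1)
spe = (anomalous**2).sum(axis=1)            # residual energy per time bin
print("time bin with the largest residual energy:", int(spe.argmax()))
```

With a single periodic trend, k = 1 is the natural choice here; on real traces the right k is exactly what the later slides call into question.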

  7. Data used
     - Géant and Abilene networks
     - IP flow traces
     - 21 to 28 November 2005
     - Detected anomalies were manually inspected

  8. Outline
     - Context
     - Challenges with current PCA methodology
       - Sensitivity to its parameters
       - Contamination of normalcy
       - Identifying the location of detected anomalies
     - Conclusion & future directions

  9. Sensitivity to top k
     - Where is the line drawn between normal and anomalous?
     - What is too anomalous?
     - [Figure labels: signal, PCA, top k, normal, anomalous]

  10. Sensitivity to top k
     - Very sensitive to top k
       - Total detections and false positives (FP)
     - Not an issue if top k were tunable
     - Tried many methods
       - 3σ deviation heuristic
       - Cattell’s Scree Test
       - Humphrey-Ilgen
       - Kaiser’s Criterion
     - None are reliable
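
To make the sensitivity concrete, the sketch below counts detections for several values of k and, for comparison, picks k with Kaiser's criterion (keep components whose correlation-matrix eigenvalue exceeds 1). The data are synthetic, the function names are hypothetical, and the threshold (mean plus three standard deviations of residual energy) is a simplifying assumption for illustration, not the threshold used in the original study.

```python
import numpy as np

def kaiser_k(X):
    """Kaiser's criterion: number of correlation-matrix eigenvalues above 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int((eigvals > 1.0).sum())

def detections(X, k, n_sigma=3.0):
    """Count time bins whose residual energy exceeds mean + n_sigma * std.

    The threshold rule is an illustrative assumption.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # normal subspace spanned by top-k PCs
    resid = Xc - Xc @ P @ P.T               # part left in the anomalous subspace
    spe = (resid**2).sum(axis=1)
    return int((spe > spe.mean() + n_sigma * spe.std()).sum())

# Hypothetical traffic: two periodic trends with per-link gains, plus noise.
rng = np.random.default_rng(1)
t = np.arange(576)
daily = np.sin(2 * np.pi * t / 288)
half_daily = np.sin(4 * np.pi * t / 288)
X = (np.outer(daily, rng.uniform(1.0, 3.0, 12))
     + np.outer(half_daily, rng.uniform(0.5, 1.5, 12))
     + 0.2 * rng.standard_normal((576, 12)))

counts = {k: detections(X, k) for k in range(1, 8)}
print("k chosen by Kaiser's criterion:", kaiser_k(X))
print("detections per k:", counts)
```

Comparing the printed counts across k gives a quick feel for how much the detection total depends on a parameter that none of the listed heuristics sets reliably.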

  11. Contamination of normalcy
     - Large anomalies may be included among top k
       - Invalidates assumption that top PCs are periodic
       - Pollutes definition of normal
     - In our study, the outage shown in the accompanying figure affected 75/77 links
       - Only detected on a handful!
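
The contamination effect can be reproduced on synthetic data. In the sketch below, a large drop on 10 of 12 links (a stand-in for the outage, with made-up magnitudes) carries enough variance to become a principal component itself. With k = 1 the outage leaves a large residual; with k = 2 it is absorbed into the "normal" subspace and the residual falls to roughly noise level. All names and numbers are assumptions for illustration.

```python
import numpy as np

def spe(X, k):
    """Residual energy per time bin after removing the top-k components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    resid = Xc - Xc @ P @ P.T
    return (resid**2).sum(axis=1)

# Hypothetical traffic: one daily cycle with per-link gain, plus noise.
rng = np.random.default_rng(2)
t = np.arange(288)
gains = rng.uniform(1.0, 3.0, size=12)
X = np.outer(np.sin(2 * np.pi * t / 288), gains) + 0.1 * rng.standard_normal((288, 12))

outage = slice(100, 111)                    # 11 consecutive time bins
X[outage, :10] -= 4.0                       # large drop on 10 of 12 links

spe_k1 = spe(X, 1)[outage].mean()           # outage stays in the anomalous subspace
spe_k2 = spe(X, 2)[outage].mean()           # outage absorbed by the second component
print("mean residual energy during outage, k=1:", spe_k1)
print("mean residual energy during outage, k=2:", spe_k2)
```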

  12. Conclusion & future directions
     - PCA (subspace method) methodology issues
       - Sensitivity to top k parameter
       - Contamination of normal subspace
       - Identifying the location of detected anomalies
     - Generally: room for rigorous evaluation of statistical techniques applied to AD
       - Tunability, robustness
       - Assumptions about underlying data
       - Under what conditions does the method excel?

  13. Thanks! Questions? Haakon Ringberg Princeton University Computer Science http://www.cs.princeton.edu/~hlarsen/

  14. Identifying anomaly locations
     - Spikes when state vector projected on anomaly subspace
     - But network operators don’t care about this
       - They want to know where it happened!
     - How do we find the original location of the anomaly?
     - [Figure labels: state vector, anomaly subspace]

  15. Identifying anomaly locations
     - Previous work used a simple heuristic
       - Associate the detected spike with the k flows with the largest contribution to the state vector v
     - No clear a priori reason for this association
     - [Figure labels: state vector, anomaly subspace]
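
One plausible reading of this heuristic is sketched below: find the time bin with the largest residual energy, then rank flows by the magnitude of their entry in that bin's residual vector. This is an assumption about how "largest contribution" is measured, not a confirmed description of the prior work. The names `residual`, `top_contributors`, and `n_flows` are hypothetical, and the data are synthetic.

```python
import numpy as np

def residual(X, k):
    """Projection of the centered data onto the anomalous subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    return Xc - Xc @ P @ P.T

def top_contributors(resid_row, n_flows):
    """Indices of the flows with the largest absolute residual, largest first."""
    order = np.argsort(np.abs(resid_row))[::-1]
    return order[:n_flows].tolist()

# Hypothetical traffic with a spike injected on flow 3 at time bin 200.
rng = np.random.default_rng(0)
t = np.arange(288)
gains = rng.uniform(1.0, 3.0, size=10)
X = np.outer(np.sin(2 * np.pi * t / 288), gains) + 0.1 * rng.standard_normal((288, 10))
X[200, 3] += 5.0

R = residual(X, k=1)
spike_bin = int((R**2).sum(axis=1).argmax())    # time bin of the detected spike
suspects = top_contributors(R[spike_bin], n_flows=3)
print("spike at bin", spike_bin, "suspect flows:", suspects)
```

In this clean single-flow case the ranking points at the injected flow, but the projection spreads part of the spike over other flows, which is one reason the association is not guaranteed in general.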
