
Anomaly Detection, Jia-Bin Huang, Virginia Tech, Spring 2019 (PowerPoint presentation)



  1. Anomaly Detection Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative

  3. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  4. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  5. Anomaly detection example • Dataset x^(1), x^(2), ⋯, x^(m) • New engine: x_test • Aircraft engine features: • x_1 = heat generated • x_2 = vibration intensity • [Figure: engines plotted with x_1 (heat) on the horizontal axis and x_2 (vibration) on the vertical axis]

  6. Density estimation • Dataset x^(1), x^(2), ⋯, x^(m) • Is x_test anomalous? • Model p(x) from the data • p(x_test) < ε → flag anomaly • p(x_test) ≥ ε → OK • [Figure: x_1 (heat) vs. x_2 (vibration)]

  7. Anomaly detection example • Fraud detection • x^(i) = features of user i's activities • Model p(x) from data • Identify unusual users by checking which have p(x) < ε • Manufacturing • Monitoring computers in a data center • x^(i) = features of machine i • x_1 = memory use, x_2 = number of disk accesses/sec • x_3 = CPU load, x_4 = CPU load/network traffic

  8. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  9. Gaussian (normal) distribution • Say x ∈ ℝ. If x is distributed Gaussian with mean μ, variance σ²: • x ∼ N(μ, σ²), where σ is the standard deviation • p(x; μ, σ²) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
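
The density above is easy to check numerically. A minimal sketch (the function name `gaussian_pdf` is mine, not from the slides):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# At its mean, a standard normal has density 1/sqrt(2*pi)
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # → 0.3989
```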

  10. Gaussian distribution examples

  11. Parameter estimation • Dataset x^(1), x^(2), ⋯, x^(m) • x ∼ N(μ, σ²) • Maximum likelihood estimation • μ̂ = (1/m) ∑_{i=1}^m x^(i) • σ̂² = (1/m) ∑_{i=1}^m (x^(i) − μ̂)²
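
The two estimators can be sketched directly; note the maximum likelihood variance divides by m, not m − 1 (function name `fit_gaussian` is mine):

```python
def fit_gaussian(xs):
    """Maximum likelihood estimates (mu, sigma^2) from a 1-D sample."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m  # divide by m, not m - 1
    return mu, sigma2
```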

  12. Density estimation • Dataset x^(1), x^(2), ⋯, x^(m) • Each example x ∈ ℝⁿ • p(x) = p(x_1; μ_1, σ_1²) p(x_2; μ_2, σ_2²) ⋯ p(x_n; μ_n, σ_n²) = Π_{j=1}^n p(x_j; μ_j, σ_j²)

  13. Anomaly detection algorithm 1. Choose features x_j that you think might be indicative of anomalous examples 2. Fit parameters μ_1, ⋯, μ_n, σ_1², ⋯, σ_n² • μ_j = (1/m) ∑_{i=1}^m x_j^(i) • σ_j² = (1/m) ∑_{i=1}^m (x_j^(i) − μ_j)² 3. Given new example x, compute p(x) = Π_{j=1}^n p(x_j; μ_j, σ_j²) • Anomaly if p(x) < ε
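
The three steps above can be sketched end to end. This is a plain-Python illustration, not the course's reference code; the function names are mine:

```python
import math

def fit_params(X):
    """X: list of m examples, each a list of n features.
    Returns per-feature (mu_j, sigma2_j) maximum likelihood estimates."""
    m, n = len(X), len(X[0])
    params = []
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        mu = sum(col) / m
        sigma2 = sum((v - mu) ** 2 for v in col) / m
        params.append((mu, sigma2))
    return params

def p(x, params):
    """Product of per-feature Gaussian densities."""
    prob = 1.0
    for xj, (mu, sigma2) in zip(x, params):
        prob *= math.exp(-(xj - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return prob

def is_anomaly(x, params, eps):
    """Flag x as anomalous when its density falls below the threshold eps."""
    return p(x, params) < eps
```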

  14. Evaluation • Assume we have some labeled data, of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous) • Training set x^(1), x^(2), ⋯, x^(m) (assume normal examples) • Cross-validation set: (x_cv^(1), y_cv^(1)), (x_cv^(2), y_cv^(2)), ⋯, (x_cv^(m_cv), y_cv^(m_cv)) • Test set: (x_test^(1), y_test^(1)), (x_test^(2), y_test^(2)), ⋯, (x_test^(m_test), y_test^(m_test))

  15. Aircraft engines motivating example • 10000 good (normal) engines • 20 flawed engines (anomalous) • Training set: 6000 good engines • CV: 2000 good engines (y = 0), 10 anomalous (y = 1) • Test: 2000 good engines (y = 0), 10 anomalous (y = 1)

  16. Algorithm evaluation • Fit model p(x) on training set {x^(1), ⋯, x^(m)} • On a cross-validation/test example x, predict • y = 1 if p(x) < ε (anomaly), y = 0 if p(x) ≥ ε (normal) • Possible evaluation metrics: • True positive, false positive, false negative, true negative • Precision/Recall • F1-score • Can use cross-validation set to choose parameter ε
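
Choosing ε on the cross-validation set can be sketched as a simple search over candidate thresholds, scoring each by F1 (function name `select_epsilon` is mine; labels follow the slides, 1 = anomaly):

```python
def select_epsilon(pvals, labels, candidates):
    """Pick the threshold eps maximizing F1 on the cross-validation set.
    pvals: p(x) for each CV example; labels: 1 = anomaly, 0 = normal."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        preds = [1 if pv < eps else 0 for pv in pvals]
        tp = sum(1 for pr, y in zip(preds, labels) if pr == 1 and y == 1)
        fp = sum(1 for pr, y in zip(preds, labels) if pr == 1 and y == 0)
        fn = sum(1 for pr, y in zip(preds, labels) if pr == 0 and y == 1)
        # F1 = 2PR/(P+R), written in the equivalent 2TP/(2TP+FP+FN) form
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```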

  17. Evaluation metric • How about accuracy? • Assume only 0.1% of the engines are anomalous (skewed classes) • Declare every example as normal → 99.9% accuracy!

  18. Precision/Recall • F1 score: 2PR / (P + R)
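
A quick numeric check ties the two slides together: with skewed classes, predicting "all normal" never produces a true positive, so F1 collapses to 0 even though accuracy is high (the helper name is mine):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; 0.0 when undefined."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# "Everything is normal" on a CV set with 2 anomalies: no TPs, F1 = 0
print(precision_recall_f1(0, 0, 2))  # → (0.0, 0.0, 0.0)
```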

  19. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  20. Anomaly detection vs. supervised learning • Anomaly detection: • Very small number of positive examples (y = 1) (0-20 is common) • Large number of negative (y = 0) examples • Many different types of anomalies; hard for any algorithm to learn from positive examples what the anomalies look like • Future anomalies may look nothing like any of the anomalous examples we have seen so far • Supervised learning: • Large number of positive and negative examples • Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples likely to be similar to ones in the training set

  21. Anomaly detection vs. supervised learning • Anomaly detection: fraud detection; manufacturing; monitoring machines in a data center • Supervised learning: email spam classification; weather prediction; cancer classification

  22. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  23. Non-Gaussian features • Transform a skewed feature so it looks more Gaussian, e.g., replace x with log x
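
A minimal sketch of that transform (the helper name and the offset parameter `c` are my additions; `c > 0` is a common trick for features that can be zero):

```python
import math

def log_transform(xs, c=0.0):
    """Map a right-skewed positive feature x to log(x + c), which often
    looks more Gaussian; tune c by inspecting the histogram."""
    return [math.log(x + c) for x in xs]

# A long-tailed sample becomes far more symmetric after the transform
raw = [1.0, 2.0, 4.0, 8.0, 1000.0]
print(log_transform(raw))
```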

  24. Error analysis for anomaly detection • Want p(x) large for normal examples x, and p(x) small for anomalous examples x • Most common problem: p(x) is comparable (say, both large) for normal and anomalous examples

  25. Monitoring computers in a data center • Choose features that might take on unusually large or small values in the event of an anomaly • x_1 = memory use of computer • x_2 = number of disk accesses/sec • x_3 = CPU load • x_4 = network traffic • x_5 = CPU load / network traffic • x_6 = (CPU load)² / network traffic

  26. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  27. Motivating example: Monitoring machines in a data center • [Figures: scatter plots of x_1 (CPU load) vs. x_2 (memory use)]

  28. Multivariate Gaussian (normal) distribution • x ∈ ℝⁿ. Don't model p(x_1), p(x_2), ⋯ separately • Model p(x) all in one go • Parameters: μ ∈ ℝⁿ, Σ ∈ ℝⁿˣⁿ (covariance matrix) • p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

  29. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [0.6 0; 0 0.6] • Σ = [2 0; 0 2] • [Figures: contour plots of p(x) over x_1, x_2 for each Σ]

  30. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [0.6 0; 0 1] • Σ = [2 0; 0 1] • [Figures: contour plots of p(x) over x_1, x_2 for each Σ]

  31. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [1 0.8; 0.8 1] • Σ = [1 0.5; 0.5 1] • [Figures: contour plots of p(x) over x_1, x_2 for each Σ]

  32. Anomaly detection using the multivariate Gaussian distribution 1. Fit model p(x) by setting • μ = (1/m) ∑_{i=1}^m x^(i) • Σ = (1/m) ∑_{i=1}^m (x^(i) − μ)(x^(i) − μ)ᵀ 2. Given a new example x, compute • p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)) 3. Flag an anomaly if p(x) < ε
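
The multivariate fit and density can be sketched as below, assuming NumPy is available; the function names are mine, and using `linalg.solve` instead of an explicit inverse is a numerical-stability choice, not something from the slides:

```python
import math
import numpy as np

def fit_multivariate(X):
    """Maximum likelihood mu and covariance Sigma (divide by m) for rows of X."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    D = X - mu
    Sigma = D.T @ D / len(X)
    return mu, Sigma

def p_multivariate(x, mu, Sigma):
    """Multivariate Gaussian density p(x; mu, Sigma)."""
    n = len(mu)
    diff = np.asarray(x, dtype=float) - mu
    norm = (2 * math.pi) ** (n / 2) * math.sqrt(np.linalg.det(Sigma))
    # Solve Sigma z = diff rather than forming Sigma^{-1} explicitly
    return math.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

# Flag an anomaly when the density falls below eps, as in step 3
def is_anomaly_mv(x, mu, Sigma, eps):
    return p_multivariate(x, mu, Sigma) < eps
```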

  33. Original model vs. multivariate Gaussian • Original model: p(x) = p(x_1; μ_1, σ_1²) p(x_2; μ_2, σ_2²) ⋯ p(x_n; μ_n, σ_n²) • Manually create features to capture anomalies where x_1, x_2 take unusual combinations of values • Computationally cheaper (alternatively, scales better) • OK even if training set size is small • Multivariate Gaussian: p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

  34. Things to remember • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution
