Anomaly Detection. Jia-Bin Huang, Virginia Tech, Spring 2019.



SLIDE 1

Anomaly Detection

Jia-Bin Huang, Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

Administrative

SLIDE 3

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 4

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 5

Anomaly detection example

  • Dataset: x^(1), x^(2), ⋯, x^(m)
  • New engine: x_test
  • Aircraft engine features:
    • x1 = heat generated
    • x2 = vibration intensity

(Figure: scatter plot of the training engines in the x1 (heat) vs. x2 (vibration) plane.)

SLIDE 6

Density estimation

  • Dataset: x^(1), x^(2), ⋯, x^(m)
  • Is x_test anomalous?

Model p(x)

  • p(x_test) < ε → flag anomaly
  • p(x_test) ≥ ε → OK

(Figure: the engines in the x1 (heat) vs. x2 (vibration) plane; examples in the dense region get high p(x), examples far away get low p(x).)

SLIDE 7

Anomaly detection example

  • Fraud detection:
    • x^(i) = features of user i's activities
    • Model p(x) from data
    • Identify unusual users by checking which have p(x) < ε
  • Manufacturing
  • Monitoring computers in a data center:
    • x^(i) = features of machine i
    • x1 = memory use, x2 = number of disk accesses/sec
    • x3 = CPU load, x4 = CPU load / network traffic
SLIDE 8

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 9

Gaussian (normal) distribution

  • Say x ∈ ℝ. Suppose x follows a Gaussian distribution with mean μ and variance σ^2:
  • x ∼ 𝒩(μ, σ^2)
  • σ is the standard deviation
  • p(x; μ, σ^2) = (1 / (√(2π) σ)) exp(−(x − μ)^2 / (2σ^2))
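As a quick sketch (my own, not from the slides), this density can be evaluated in a few lines; `gaussian_pdf` is a hypothetical helper name:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    # p(x; mu, sigma^2) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
```

At x = μ with σ^2 = 1 this peaks at 1/√(2π) ≈ 0.399, and it decays as x moves away from μ.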

SLIDE 10

Gaussian distribution examples

SLIDE 11

Parameter estimation

  • Dataset: x^(1), x^(2), ⋯, x^(m)
  • x ∼ 𝒩(μ, σ^2)
  • Maximum likelihood estimation:
  • μ̂ = (1/m) Σ_{i=1}^m x^(i)
  • σ̂^2 = (1/m) Σ_{i=1}^m (x^(i) − μ̂)^2
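These estimators translate directly to code; a minimal sketch with a hypothetical `fit_gaussian` helper:

```python
def fit_gaussian(xs):
    # Maximum-likelihood estimates for a 1-D Gaussian:
    #   mu-hat     = (1/m) * sum of x^(i)
    #   sigma2-hat = (1/m) * sum of (x^(i) - mu-hat)^2
    # Note the 1/m normalizer (not the unbiased 1/(m-1)).
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2
```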

SLIDE 12

Density estimation

  • Dataset: x^(1), x^(2), ⋯, x^(m)
  • Each example x ∈ ℝ^n
  • p(x) = p(x1; μ1, σ1^2) p(x2; μ2, σ2^2) ⋯ p(xn; μn, σn^2) = Π_j p(xj; μj, σj^2)
  • (This factorization treats the n features as independent.)

SLIDE 13

Anomaly detection algorithm

  • 1. Choose features xj that you think might be indicative of anomalous examples
  • 2. Fit parameters μ1, ⋯, μn, σ1^2, ⋯, σn^2:
    • μj = (1/m) Σ_{i=1}^m xj^(i)
    • σj^2 = (1/m) Σ_{i=1}^m (xj^(i) − μj)^2
  • 3. Given a new example x, compute p(x) = Π_j p(xj; μj, σj^2)
  • Flag an anomaly if p(x) < ε

SLIDE 14

Evaluation

  • Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous)
  • Training set: x^(1), x^(2), ⋯, x^(m) (assumed to be normal examples)
  • Cross-validation set: (x_cv^(1), y_cv^(1)), (x_cv^(2), y_cv^(2)), ⋯, (x_cv^(m_cv), y_cv^(m_cv))
  • Test set: (x_test^(1), y_test^(1)), (x_test^(2), y_test^(2)), ⋯, (x_test^(m_test), y_test^(m_test))

SLIDE 15

Aircraft engines motivating example

  • 10000 good (normal) engines
  • 20 flawed engines (anomalous)
  • Training set: 6000 good engines
  • CV: 2000 good engines (y = 0), 10 anomalous (y = 1)
  • Test: 2000 good engines (y = 0), 10 anomalous (y = 1)
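Spelled out as code (the counts come straight from the slide; the variable names and tuple encoding are illustrative only):

```python
# 10000 good (normal) engines, 20 flawed (anomalous) engines
good = [("good", i) for i in range(10000)]
flawed = [("flawed", i) for i in range(20)]

train = good[:6000]                 # training set: 6000 good engines only
cv = good[6000:8000] + flawed[:10]  # CV: 2000 good (y = 0) + 10 anomalous (y = 1)
test = good[8000:] + flawed[10:]    # test: 2000 good (y = 0) + 10 anomalous (y = 1)
```

Note that the anomalous examples go only into the CV and test sets; the model is fit on normal data alone.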
SLIDE 16

Algorithm evaluation

  • Fit model π‘ž(𝑦) on training set {𝑦 1 , β‹― , 𝑦 𝑛 }
  • On a cross-validation/test example 𝑦, predict
  • 𝑧 = α‰Š1

if π‘ž 𝑦 < πœ— (anomaly) if π‘ž 𝑦 β‰₯ πœ— (normal)

  • Possible evaluation metrics:
  • True positive, false positive, false negative, true negative
  • Precision/Recall
  • F1-score
  • Can use cross-validation set to choose parameter πœ—
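Choosing ε by maximizing F1 on the cross-validation set can be sketched as follows (a toy grid scan of my own; `f1_at` and `choose_epsilon` are hypothetical names):

```python
def f1_at(eps, p_cv, y_cv):
    # Predict anomaly (y = 1) when p(x) < eps; score F1 against the CV labels.
    tp = sum(1 for p, y in zip(p_cv, y_cv) if p < eps and y == 1)
    fp = sum(1 for p, y in zip(p_cv, y_cv) if p < eps and y == 0)
    fn = sum(1 for p, y in zip(p_cv, y_cv) if p >= eps and y == 1)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2.0 * prec * rec / (prec + rec)

def choose_epsilon(p_cv, y_cv, steps=1000):
    # Scan candidate thresholds between the min and max CV density values.
    lo, hi = min(p_cv), max(p_cv)
    best = max(
        (lo + (hi - lo) * s / steps for s in range(1, steps + 1)),
        key=lambda eps: f1_at(eps, p_cv, y_cv),
    )
    return best, f1_at(best, p_cv, y_cv)
```

With anomalies sitting at much lower p(x) than the normal examples, the scan finds a threshold between the two groups and reaches F1 = 1 on this toy input.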
SLIDE 17

Evaluation metric

  • How about accuracy?
  • Assume only 0.1% of the engines are anomalous (skewed classes)
  • Declaring every example normal already gives 99.9% accuracy!
SLIDE 18

Precision/Recall

  • Precision P = true positives / (true positives + false positives)
  • Recall R = true positives / (true positives + false negatives)
  • F1 score: 2PR / (P + R)

SLIDE 19

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 20

Anomaly detection vs. supervised learning

Anomaly detection:

  • Very small number of positive examples (y = 1); 0-20 is common
  • Large number of negative (y = 0) examples
  • Many different types of anomalies; hard for any algorithm to learn from the positive examples what the anomalies look like
  • Future anomalies may look nothing like any of the anomalous examples we have seen so far

Supervised learning:

  • Large number of positive and negative examples
  • Enough positive examples for the algorithm to get a sense of what positives are like; future positive examples are likely to be similar to ones in the training set

SLIDE 21

Anomaly detection vs. supervised learning

Anomaly detection:

  • Fraud detection
  • Manufacturing
  • Monitoring machines in a data center

Supervised learning:

  • Email spam classification
  • Weather prediction
  • Cancer classification
SLIDE 22

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 23

Non-Gaussian features

  • If a feature x is non-Gaussian, transform it (e.g., replace x with log x) so that its distribution looks more Gaussian.
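A toy illustration of the log transform (the numbers are my own, not from the slides):

```python
import math

# A heavily skewed feature: the raw values are a geometric progression,
# so a few large values dominate the scale.
raw = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
transformed = [math.log(v) for v in raw]

# After the transform the values are evenly spaced (the log of a geometric
# progression is an arithmetic progression), i.e. far less skewed.
```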

SLIDE 24

Error analysis for anomaly detection

Want π‘ž(𝑦) large for normal examples 𝑦 π‘ž(𝑦) small for anomalous examples 𝑦 Most common problem: π‘ž(𝑦) is comparable (say both large) for normal and anomalous examples

SLIDE 25

Monitoring computers in a data center

  • Choose features that might take on unusually large or small values in the event of an anomaly
  • x1 = memory use of computer
  • x2 = number of disk accesses/sec
  • x3 = CPU load
  • x4 = network traffic
  • x5 = (CPU load) / (network traffic)
  • or alternatively x5 = (CPU load)^2 / (network traffic)
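The ratio features can be computed directly; a sketch with a hypothetical helper:

```python
def extra_features(cpu_load, network_traffic):
    # x5 is large when the CPU is pegged but the machine serves little traffic
    # (e.g. a process stuck in an infinite loop). The squared variant (the
    # slide's alternative) exaggerates that effect.
    x5 = cpu_load / network_traffic
    x5_alt = cpu_load ** 2 / network_traffic
    return x5, x5_alt
```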

SLIDE 26

Anomaly Detection

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution
SLIDE 27

Motivating example: Monitoring machines in a data center

(Figure: scatter plots of x1 (CPU load) vs. x2 (memory use) for the machines in the data center.)

SLIDE 28

Multivariate Gaussian (normal) distribution

  • 𝑦 ∈ π‘†π‘œ. Don’t model π‘ž 𝑦1 , π‘ž 𝑦2 , β‹― separately
  • Model π‘ž 𝑦 all in one go.
  • Parameters: 𝜈 ∈ π‘†π‘œ, Ξ£ ∈ π‘†π‘œΓ—π‘œ (covariance matrix)
  • π‘ž 𝑦; 𝜈, Ξ£ =

1 2𝜌 π‘œ/2 Ξ£ 1/2 exp βˆ’ 𝑦 βˆ’ 𝜈 βŠ€Ξ£βˆ’1(𝑦 βˆ’ 𝜈)

SLIDE 29

Multivariate Gaussian (normal) examples

(Figure: contour plots over (x1, x2) for Σ = [1 0; 0 1], Σ = [0.6 0; 0 0.6], and Σ = [2 0; 0 2]: shrinking Σ concentrates the density, growing Σ spreads it out.)

SLIDE 30

Multivariate Gaussian (normal) examples

(Figure: contour plots for Σ = [1 0; 0 1], Σ = [0.6 0; 0 1], and Σ = [2 0; 0 1]: changing the variance of x1 alone stretches or squeezes the contours along the x1 axis.)

SLIDE 31

Multivariate Gaussian (normal) examples

(Figure: contour plots for Σ = [1 0; 0 1], Σ = [1 0.5; 0.5 1], and Σ = [1 0.8; 0.8 1]: positive off-diagonal entries tilt the contours, modeling correlation between x1 and x2.)

SLIDE 32

Anomaly detection using the multivariate Gaussian distribution

  • 1. Fit model p(x) by setting:
    • μ = (1/m) Σ_{i=1}^m x^(i)
    • Σ = (1/m) Σ_{i=1}^m (x^(i) − μ)(x^(i) − μ)^⊤
  • 2. Given a new example x, compute:
    • p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^⊤ Σ^{−1} (x − μ))
  • Flag an anomaly if p(x) < ε

SLIDE 33

Original model

  • p(x) = p(x1; μ1, σ1^2) p(x2; μ2, σ2^2) ⋯ p(xn; μn, σn^2)
  • Manually create features to capture anomalies where x1, x2 take unusual combinations of values
  • Computationally cheaper (alternatively, scales better to large n)
  • OK even if the training set size m is small

Multivariate Gaussian model

  • p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^⊤ Σ^{−1} (x − μ))
  • Automatically captures correlations between features
  • Computationally more expensive
  • Must have m > n, or else Σ is non-invertible

SLIDE 34

Things to remember

  • Motivation
  • Developing an anomaly detection system
  • Anomaly detection vs. supervised learning
  • Choosing what features to use
  • Multivariate Gaussian distribution