SLIDE 1
Automatic Outlier Detection: A Bayesian Approach
Jo-Anne Ting, University of Southern California
Aaron D'Souza, Google, Inc.
Stefan Schaal, University of Southern California
ICRA 2007, April 12, 2007
SLIDE 2
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 3
Motivation
- Real-world sensor data is susceptible to outliers
– E.g., motion capture (MOCAP) data of a robotic dog
SLIDE 4
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 5
Past & Related Work
- Current methods for outlier detection may:
– Require parameter tuning (e.g., an optimal threshold)
– Require sampling (e.g., active sampling; Abe et al., 2006) or the setting of certain parameters (e.g., k in k-means clustering; MacQueen, 1967)
– Assume an underlying data structure (e.g., mixture models; Fox et al., 1999)
– Adopt a weighted linear regression model, but model the weights with a heuristic function (e.g., robust least squares; Hoaglin, 1983)
SLIDE 6
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 7
Bayesian Regression for Automatic Outlier Detection
- Consider linear regression: $y_i = \mathbf{b}^T \mathbf{x}_i + \epsilon_i$, with noise $\epsilon_i \sim \text{Normal}(0, \sigma^2)$
- We can modify the above to get a weighted linear regression model (Gelman et al., 1995):

$$y_i \sim \text{Normal}\!\left( \mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i \right)$$

$$\mathbf{b} \sim \text{Normal}\!\left( \mathbf{b}_0,\; \Sigma_{b_0} \right)$$

- Except now the weights are random variables: $w_i \sim \text{Gamma}\!\left( a_{w_i}, b_{w_i} \right)$
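To make the generative model concrete, here is a minimal numpy sketch of sampling from it (the hyperparameter values a_w, b_w, sigma2 and all variable names are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5           # samples and input dimensions (matching the later experiments)
sigma2 = 0.1             # noise variance (illustrative)
a_w, b_w = 1.0, 1.0      # Gamma hyperparameters on the weights (illustrative)

b_true = rng.normal(size=d)              # regression vector b
X = rng.normal(size=(N, d))              # inputs x_i
w = rng.gamma(a_w, 1.0 / b_w, size=N)    # w_i ~ Gamma(a_w, b_w), with b_w as a rate
# y_i ~ Normal(b^T x_i, sigma^2 / w_i): a small w_i inflates the noise,
# so points drawn with small weights behave like outliers
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2 / w))
```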
SLIDE 8
Bayesian Regression for Automatic Outlier Detection
- This Bayesian treatment of weighted linear regression:
- Is suitable for real-time outlier detection
- Makes no assumptions about the underlying data structure
- Requires no manual parameter tuning
SLIDE 9
Bayesian Regression for Automatic Outlier Detection
- Our goal is to infer the posterior distributions of b and w
- We can treat this as an EM problem (Dempster et al., 1977): maximize the incomplete log likelihood $\log p(\mathbf{y} \mid \mathbf{X})$ by maximizing the expected complete log likelihood $E\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]$
SLIDE 10
Bayesian Regression for Automatic Outlier Detection
- In the E-step, we need to calculate $E_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]$, but since the true posterior over all hidden variables is analytically intractable, we make a factorial variational approximation (Hinton & van Camp, 1993; Ghahramani & Beal, 2000): $Q(\mathbf{b}, \mathbf{w}) = Q(\mathbf{b})\, Q(\mathbf{w})$
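For context, this is the standard variational lower bound that such a factorial approximation maximizes (a sketch we supply for readability; the slides leave this step implicit):

```latex
\log p(\mathbf{y} \mid \mathbf{X}) \;\ge\;
  \mathbb{E}_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]
  - \mathbb{E}_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log Q(\mathbf{b}, \mathbf{w}) \right],
\qquad
Q(\mathbf{b}, \mathbf{w}) = Q(\mathbf{b}) \prod_{i=1}^{N} Q(w_i)
```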
SLIDE 11
Bayesian Regression for Automatic Outlier Detection
- EM update equations (batch version):

E-step:

$$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$$

$$\boldsymbol{\mu}_b = \Sigma_b \left( \Sigma_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, y_i \mathbf{x}_i \right)$$

$$\langle w_i \rangle = \frac{a_{w_i,0} + \frac{1}{2}}{b_{w_i,0} + \frac{1}{2\sigma^2} \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \frac{1}{2\sigma^2}\, \mathbf{x}_i^T \Sigma_b \mathbf{x}_i}$$

M-step:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \langle w_i \rangle \left[ \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \mathbf{x}_i^T \Sigma_b \mathbf{x}_i \right]$$

- If the prediction error for a point is very large, $\langle w_i \rangle$ goes to 0, and the point is downweighted
- Reminder: $y_i \sim \text{Normal}\!\left( \mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i \right)$
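A compact numpy sketch of these batch updates, following our reconstruction of the equations above (the prior hyperparameters, initialization, and fixed iteration count are our assumptions; this is not the authors' reference implementation):

```python
import numpy as np

def bayesian_weighted_regression(X, y, n_iter=50, a_w0=1.0, b_w0=1.0):
    """Batch EM for Bayesian weighted regression with automatic outlier detection.

    Returns the posterior mean/covariance of b, the noise variance sigma^2,
    and the expected weights E[w_i]; a small E[w_i] flags point i as an outlier.
    """
    N, d = X.shape
    b0, Sigma_b0_inv = np.zeros(d), np.eye(d)  # assumed prior on b
    w = np.ones(N)                             # E[w_i], start with no downweighting
    sigma2 = np.var(y)                         # initial noise variance

    for _ in range(n_iter):
        # E-step: posterior over b under the current E[w_i]
        Sigma_b = np.linalg.inv(Sigma_b0_inv + (X.T * w) @ X / sigma2)
        mu_b = Sigma_b @ (Sigma_b0_inv @ b0 + X.T @ (w * y) / sigma2)

        # E-step: E[w_i]; a large residual drives E[w_i] toward 0
        resid2 = (y - X @ mu_b) ** 2
        quad = np.einsum('ij,jk,ik->i', X, Sigma_b, X)  # x_i^T Sigma_b x_i
        w = (a_w0 + 0.5) / (b_w0 + (resid2 + quad) / (2.0 * sigma2))

        # M-step: update the noise variance
        sigma2 = np.mean(w * (resid2 + quad))

    return mu_b, Sigma_b, sigma2, w
```

Note that no threshold ever appears: points are continuously downweighted by their expected weights rather than discarded.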
SLIDE 12
Bayesian Regression for Automatic Outlier Detection
- EM update equations (incremental version), at time step $k$:

E-step:

$$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \frac{1}{\sigma^2} \langle w\mathbf{x}\mathbf{x}^T \rangle_k \right)^{-1}$$

$$\boldsymbol{\mu}_b = \Sigma_b \left( \Sigma_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \langle w y \mathbf{x} \rangle_k \right)$$

$$\langle w_i \rangle = \frac{a_{w_i,0} + \frac{1}{2}}{b_{w_i,0} + \frac{1}{2\sigma^2} \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \frac{1}{2\sigma^2}\, \mathbf{x}_i^T \Sigma_b \mathbf{x}_i}$$

M-step:

$$\sigma^2 = \frac{1}{N_k} \left( \langle w y^2 \rangle_k - 2 \boldsymbol{\mu}_b^T \langle w y \mathbf{x} \rangle_k + \boldsymbol{\mu}_b^T \langle w\mathbf{x}\mathbf{x}^T \rangle_k\, \boldsymbol{\mu}_b + \mathbf{1}^T \operatorname{diag}\left\{ \langle w\mathbf{x}\mathbf{x}^T \rangle_k\, \Sigma_b \right\} \right)$$

- Sufficient statistics are exponentially discounted by $\lambda$, $0 \le \lambda \le 1$ (e.g., Ljung & Söderström, 1983):

$$N_k = 1 + \lambda N_{k-1}$$

$$\langle w\mathbf{x}\mathbf{x}^T \rangle_k = w_k \mathbf{x}_k \mathbf{x}_k^T + \lambda \langle w\mathbf{x}\mathbf{x}^T \rangle_{k-1}$$

$$\langle w y \mathbf{x} \rangle_k = w_k y_k \mathbf{x}_k + \lambda \langle w y \mathbf{x} \rangle_{k-1}$$

$$\langle w y^2 \rangle_k = w_k y_k^2 + \lambda \langle w y^2 \rangle_{k-1}$$
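The same updates in incremental form, as a numpy sketch (the prior choices, a zero prior mean on b, and performing a single E-step per incoming sample are our assumptions):

```python
import numpy as np

class IncrementalOutlierDetector:
    """Incremental EM sketch: sufficient statistics exponentially discounted by lambda."""

    def __init__(self, d, lam=0.999, a_w0=1.0, b_w0=1.0):
        self.lam, self.a_w0, self.b_w0 = lam, a_w0, b_w0
        self.N = 0.0                    # N_k
        self.wxx = np.zeros((d, d))     # <w x x^T>_k
        self.wyx = np.zeros(d)          # <w y x>_k
        self.wy2 = 0.0                  # <w y^2>_k
        self.Sigma_b0_inv = np.eye(d)   # assumed prior precision on b (prior mean b_0 = 0)
        self.mu_b, self.Sigma_b = np.zeros(d), np.eye(d)
        self.sigma2 = 1.0

    def update(self, x, y):
        # E-step for the incoming point: expected weight under the current posterior
        resid2 = (y - self.mu_b @ x) ** 2
        quad = x @ self.Sigma_b @ x
        w = (self.a_w0 + 0.5) / (self.b_w0 + (resid2 + quad) / (2.0 * self.sigma2))

        # Discount the old sufficient statistics and add the new point
        self.N = 1.0 + self.lam * self.N
        self.wxx = w * np.outer(x, x) + self.lam * self.wxx
        self.wyx = w * y * x + self.lam * self.wyx
        self.wy2 = w * y ** 2 + self.lam * self.wy2

        # Posterior over b from the discounted statistics
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.wxx / self.sigma2)
        self.mu_b = self.Sigma_b @ (self.wyx / self.sigma2)

        # M-step: noise variance (1^T diag{A Sigma_b} = trace(A Sigma_b))
        self.sigma2 = (self.wy2 - 2.0 * self.mu_b @ self.wyx
                       + self.mu_b @ self.wxx @ self.mu_b
                       + np.trace(self.wxx @ self.Sigma_b)) / self.N
        return w  # a small w flags the point as an outlier
```

Usage is one `update(x_k, y_k)` call per incoming sample; λ close to 1 forgets old data slowly.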
SLIDE 13
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 14
Results: Synthetic Data
- Given noisy data (plus outliers) from a linear regression problem:
– 5 input dimensions
– 1000 samples
– SNR = 10
– 20% outliers
– Outliers are 3σ from the output mean
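A sketch of how such a benchmark could be generated (the exact generation procedure is not given on the slide, so the details below, e.g. placing outliers by shifting the true conditional mean, are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 1000, 5
b_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y_clean = X @ b_true                 # true conditional output mean

# Additive noise at SNR = 10: noise variance = signal variance / 10
noise_var = np.var(y_clean) / 10.0
y = y_clean + rng.normal(scale=np.sqrt(noise_var), size=N)

# Corrupt 20% of the outputs with outliers 3*sigma from the output mean,
# where sigma is the std of the true conditional output mean
idx = rng.choice(N, size=int(0.2 * N), replace=False)
sigma = np.std(y_clean)
y[idx] += 3.0 * sigma * rng.choice([-1.0, 1.0], size=idx.size)
```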
SLIDE 15
Results: Synthetic Data Available in Batch Form
Average normalized mean squared prediction error as a function of how far outliers are from inliers (columns: distance of outliers from the mean is at least +σ, +2σ, +3σ):

Algorithm                                   +σ       +2σ      +3σ
Thresholding (optimally tuned)              0.0232   0.0503   0.0903
Mixture model                               0.0286   0.0688   0.1327
Robust Least Squares                        0.0880   0.1518   0.1890
Robust Regression (Faul & Tipping, 2001)    0.0282   0.0683   0.1320
Bayesian weighted regression                0.0210   0.0270   0.0273   ← lowest prediction error

Data: globally linear data with 5 input dimensions, evaluated in batch form, averaged over 10 trials (SNR = 10; σ is the standard deviation of the true conditional output mean).
SLIDE 16
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 2σ away (λ = 0.999); callout marks the curve with the lowest prediction error]
SLIDE 17
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 3σ away (λ = 0.999); callout marks the curve with the lowest prediction error]
SLIDE 18
Results: Robotic Orientation Data
- Offset between MOCAP data & IMU data for LittleDog:
SLIDE 19
Results: Predicted Output on LittleDog MOCAP Data
SLIDE 20
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 21
Conclusions
- We have an algorithm that:
– Automatically detects outliers in real time
– Requires no user intervention, parameter tuning, or sampling
– Performs on par with, and in some cases exceeds, standard outlier detection methods
- Extensions to the Kalman filter and other filters are possible