SLIDE 1
Automatic Outlier Detection: A Bayesian Approach
Jo-Anne Ting, University of Southern California
Aaron D'Souza, Google, Inc.
Stefan Schaal, University of Southern California
ICRA 2007, April 12, 2007
SLIDE 2
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 3
Motivation
- Real-world sensor data is susceptible to outliers
– E.g., motion capture (MOCAP) data of a robotic dog
SLIDE 4
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 5
Past & Related Work
- Current methods for outlier detection may:
– Require parameter tuning (e.g., an optimal threshold)
– Require sampling (e.g., active sampling; Abe et al., 2006) or the setting of certain parameters (e.g., k in k-means clustering; MacQueen, 1967)
– Assume an underlying data structure (e.g., mixture models; Fox et al., 1999)
– Adopt a weighted linear regression model, but model the weights with a heuristic function (e.g., robust least squares; Hoaglin, 1983)
SLIDE 6
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 7
Bayesian Regression for Automatic Outlier Detection
- Consider linear regression: $y_i = \mathbf{b}^T \mathbf{x}_i + \epsilon_i$, with noise $\epsilon_i \sim \text{Normal}(0, \sigma^2)$
- We can modify the above to get a weighted linear regression model (Gelman et al., 1995):

$$y_i \sim \text{Normal}\!\left( \mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i \right)$$

$$\mathbf{b} \sim \text{Normal}\!\left( \mathbf{b}_0,\; \Sigma_{b_0} \right)$$

- Except now the weights are random variables: $w_i \sim \text{Gamma}\!\left( a_{w_i}, b_{w_i} \right)$
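To make the generative model concrete, here is a minimal numpy sketch of sampling from it (the hyperparameter values a_w, b_w, sigma2 and all variable names are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5           # samples and input dimensions (matching the later experiments)
sigma2 = 0.1             # noise variance (illustrative)
a_w, b_w = 1.0, 1.0      # Gamma hyperparameters on the weights (illustrative)

b_true = rng.normal(size=d)              # regression vector b
X = rng.normal(size=(N, d))              # inputs x_i
w = rng.gamma(a_w, 1.0 / b_w, size=N)    # w_i ~ Gamma(a_w, b_w), with b_w as a rate
# y_i ~ Normal(b^T x_i, sigma^2 / w_i): a small w_i inflates the noise,
# so points drawn with small weights behave like outliers
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2 / w))
```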
SLIDE 8
Bayesian Regression for Automatic Outlier Detection
- This Bayesian treatment of weighted linear regression:
- Is suitable for real-time outlier detection
- Makes no assumptions about the underlying data structure
- Requires no manual parameter tuning
SLIDE 9
Bayesian Regression for Automatic Outlier Detection
- Our goal is to infer the posterior distributions of b and w
- We can treat this as an EM problem (Dempster et al., 1977): maximize the incomplete log likelihood $\log p(\mathbf{y} \mid \mathbf{X})$ by maximizing the expected complete log likelihood $E\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]$
SLIDE 10
Bayesian Regression for Automatic Outlier Detection
- In the E-step, we need to calculate $E_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]$, but since the true posterior over all hidden variables is analytically intractable, we make a factorial variational approximation (Hinton & van Camp, 1993; Ghahramani & Beal, 2000): $Q(\mathbf{b}, \mathbf{w}) = Q(\mathbf{b})\, Q(\mathbf{w})$
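For context, this is the standard variational lower bound that such a factorial approximation maximizes (a sketch we supply for readability; the slides leave this step implicit):

```latex
\log p(\mathbf{y} \mid \mathbf{X}) \;\ge\;
  \mathbb{E}_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X}) \right]
  - \mathbb{E}_{Q(\mathbf{b},\mathbf{w})}\!\left[ \log Q(\mathbf{b}, \mathbf{w}) \right],
\qquad
Q(\mathbf{b}, \mathbf{w}) = Q(\mathbf{b}) \prod_{i=1}^{N} Q(w_i)
```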
SLIDE 11
Bayesian Regression for Automatic Outlier Detection
- EM update equations (batch version):

E-step:

$$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$$

$$\boldsymbol{\mu}_b = \Sigma_b \left( \Sigma_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, y_i \mathbf{x}_i \right)$$

$$\langle w_i \rangle = \frac{a_{w_i,0} + \frac{1}{2}}{b_{w_i,0} + \frac{1}{2\sigma^2} \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \frac{1}{2\sigma^2}\, \mathbf{x}_i^T \Sigma_b \mathbf{x}_i}$$

M-step:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \langle w_i \rangle \left[ \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \mathbf{x}_i^T \Sigma_b \mathbf{x}_i \right]$$

- If the prediction error for a point is very large, $\langle w_i \rangle$ goes to 0, and the point is downweighted
- Reminder: $y_i \sim \text{Normal}\!\left( \mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i \right)$
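A compact numpy sketch of these batch updates, following our reconstruction of the equations above (the prior hyperparameters, initialization, and fixed iteration count are our assumptions; this is not the authors' reference implementation):

```python
import numpy as np

def bayesian_weighted_regression(X, y, n_iter=50, a_w0=1.0, b_w0=1.0):
    """Batch EM for Bayesian weighted regression with automatic outlier detection.

    Returns the posterior mean/covariance of b, the noise variance sigma^2,
    and the expected weights E[w_i]; a small E[w_i] flags point i as an outlier.
    """
    N, d = X.shape
    b0, Sigma_b0_inv = np.zeros(d), np.eye(d)  # assumed prior on b
    w = np.ones(N)                             # E[w_i], start with no downweighting
    sigma2 = np.var(y)                         # initial noise variance

    for _ in range(n_iter):
        # E-step: posterior over b under the current E[w_i]
        Sigma_b = np.linalg.inv(Sigma_b0_inv + (X.T * w) @ X / sigma2)
        mu_b = Sigma_b @ (Sigma_b0_inv @ b0 + X.T @ (w * y) / sigma2)

        # E-step: E[w_i]; a large residual drives E[w_i] toward 0
        resid2 = (y - X @ mu_b) ** 2
        quad = np.einsum('ij,jk,ik->i', X, Sigma_b, X)  # x_i^T Sigma_b x_i
        w = (a_w0 + 0.5) / (b_w0 + (resid2 + quad) / (2.0 * sigma2))

        # M-step: update the noise variance
        sigma2 = np.mean(w * (resid2 + quad))

    return mu_b, Sigma_b, sigma2, w
```

Note that no threshold ever appears: points are continuously downweighted by their expected weights rather than discarded.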
SLIDE 12
Bayesian Regression for Automatic Outlier Detection
- EM update equations (incremental version), at time step $k$:

E-step:

$$\Sigma_b = \left( \Sigma_{b_0}^{-1} + \frac{1}{\sigma^2} \langle w\mathbf{x}\mathbf{x}^T \rangle_k \right)^{-1}$$

$$\boldsymbol{\mu}_b = \Sigma_b \left( \Sigma_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \langle w y \mathbf{x} \rangle_k \right)$$

$$\langle w_i \rangle = \frac{a_{w_i,0} + \frac{1}{2}}{b_{w_i,0} + \frac{1}{2\sigma^2} \left( y_i - \boldsymbol{\mu}_b^T \mathbf{x}_i \right)^2 + \frac{1}{2\sigma^2}\, \mathbf{x}_i^T \Sigma_b \mathbf{x}_i}$$

M-step:

$$\sigma^2 = \frac{1}{N_k} \left( \langle w y^2 \rangle_k - 2 \boldsymbol{\mu}_b^T \langle w y \mathbf{x} \rangle_k + \boldsymbol{\mu}_b^T \langle w\mathbf{x}\mathbf{x}^T \rangle_k\, \boldsymbol{\mu}_b + \mathbf{1}^T \operatorname{diag}\left\{ \langle w\mathbf{x}\mathbf{x}^T \rangle_k\, \Sigma_b \right\} \right)$$

- Sufficient statistics are exponentially discounted by $\lambda$, $0 \le \lambda \le 1$ (e.g., Ljung & Söderström, 1983):

$$N_k = 1 + \lambda N_{k-1}$$

$$\langle w\mathbf{x}\mathbf{x}^T \rangle_k = w_k \mathbf{x}_k \mathbf{x}_k^T + \lambda \langle w\mathbf{x}\mathbf{x}^T \rangle_{k-1}$$

$$\langle w y \mathbf{x} \rangle_k = w_k y_k \mathbf{x}_k + \lambda \langle w y \mathbf{x} \rangle_{k-1}$$

$$\langle w y^2 \rangle_k = w_k y_k^2 + \lambda \langle w y^2 \rangle_{k-1}$$
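The same updates in incremental form, as a numpy sketch (the prior choices, a zero prior mean on b, and performing a single E-step per incoming sample are our assumptions):

```python
import numpy as np

class IncrementalOutlierDetector:
    """Incremental EM sketch: sufficient statistics exponentially discounted by lambda."""

    def __init__(self, d, lam=0.999, a_w0=1.0, b_w0=1.0):
        self.lam, self.a_w0, self.b_w0 = lam, a_w0, b_w0
        self.N = 0.0                    # N_k
        self.wxx = np.zeros((d, d))     # <w x x^T>_k
        self.wyx = np.zeros(d)          # <w y x>_k
        self.wy2 = 0.0                  # <w y^2>_k
        self.Sigma_b0_inv = np.eye(d)   # assumed prior precision on b (prior mean b_0 = 0)
        self.mu_b, self.Sigma_b = np.zeros(d), np.eye(d)
        self.sigma2 = 1.0

    def update(self, x, y):
        # E-step for the incoming point: expected weight under the current posterior
        resid2 = (y - self.mu_b @ x) ** 2
        quad = x @ self.Sigma_b @ x
        w = (self.a_w0 + 0.5) / (self.b_w0 + (resid2 + quad) / (2.0 * self.sigma2))

        # Discount the old sufficient statistics and add the new point
        self.N = 1.0 + self.lam * self.N
        self.wxx = w * np.outer(x, x) + self.lam * self.wxx
        self.wyx = w * y * x + self.lam * self.wyx
        self.wy2 = w * y ** 2 + self.lam * self.wy2

        # Posterior over b from the discounted statistics
        self.Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.wxx / self.sigma2)
        self.mu_b = self.Sigma_b @ (self.wyx / self.sigma2)

        # M-step: noise variance (1^T diag{A Sigma_b} = trace(A Sigma_b))
        self.sigma2 = (self.wy2 - 2.0 * self.mu_b @ self.wyx
                       + self.mu_b @ self.wxx @ self.mu_b
                       + np.trace(self.wxx @ self.Sigma_b)) / self.N
        return w  # a small w flags the point as an outlier
```

Usage is one `update(x_k, y_k)` call per incoming sample; λ close to 1 forgets old data slowly.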
SLIDE 13
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 14
Results: Synthetic Data
- Given noisy data (plus outliers) from a linear regression problem:
– 5 input dimensions
– 1000 samples
– SNR = 10
– 20% outliers
– Outliers are 3σ from the output mean
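A sketch of how such a benchmark could be generated (the exact generation procedure is not given on the slide, so the details below, e.g. placing outliers by shifting the true conditional mean, are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 1000, 5
b_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y_clean = X @ b_true                 # true conditional output mean

# Additive noise at SNR = 10: noise variance = signal variance / 10
noise_var = np.var(y_clean) / 10.0
y = y_clean + rng.normal(scale=np.sqrt(noise_var), size=N)

# Corrupt 20% of the outputs with outliers 3*sigma from the output mean,
# where sigma is the std of the true conditional output mean
idx = rng.choice(N, size=int(0.2 * N), replace=False)
sigma = np.std(y_clean)
y[idx] += 3.0 * sigma * rng.choice([-1.0, 1.0], size=idx.size)
```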
SLIDE 15
Results: Synthetic Data Available in Batch Form
Average normalized mean squared prediction error as a function of how far outliers are from inliers (columns: distance of outliers from the mean is at least +σ, +2σ, +3σ):

Algorithm                                   +σ       +2σ      +3σ
Thresholding (optimally tuned)              0.0232   0.0503   0.0903
Mixture model                               0.0286   0.0688   0.1327
Robust Least Squares                        0.0880   0.1518   0.1890
Robust Regression (Faul & Tipping, 2001)    0.0282   0.0683   0.1320
Bayesian weighted regression                0.0210   0.0270   0.0273   ← lowest prediction error

Data: globally linear data with 5 input dimensions, evaluated in batch form, averaged over 10 trials (SNR = 10; σ is the standard deviation of the true conditional output mean).
SLIDE 16
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 2σ away (λ = 0.999); callout marks the curve with the lowest prediction error]
SLIDE 17
Results: Synthetic Data Available Incrementally
[Figure: prediction error over time with outliers at least 3σ away (λ = 0.999); callout marks the curve with the lowest prediction error]
SLIDE 18
Results: Robotic Orientation Data
- Offset between MOCAP data & IMU data for LittleDog:
SLIDE 19
Results: Predicted Output on LittleDog MOCAP Data
SLIDE 20
Outline
- Motivation
- Past & related work
- Bayesian regression for automatic outlier detection
– Batch version
– Incremental version
- Results
– Synthetic data
– Robotic data
- Conclusions
SLIDE 21
Conclusions
- We have an algorithm that:
– Automatically detects outliers in real time
– Requires no user intervention, parameter tuning, or sampling
– Performs on par with, and in some cases exceeds, standard outlier detection methods
- Extensions to the Kalman filter and other filters are possible