Statistical Geometry Processing Winter Semester 2011/2012
Least-Squares Fitting
3
Approximation
Common Situation:
- We have many data points, they might be noisy
- Example: Scanned data
- Want to approximate the data with a smooth curve /
surface
What we need:
- Criterion – what is a good approximation?
- Methods to compute this approximation
4
Approximation Techniques
Agenda:
- Least-squares approximation
(and why/when this makes sense)
- Total least-squares linear approximation
(get rid of the coordinate system)
- Iteratively reweighted least-squares
(for nasty noise distributions)
5
Least-Squares
We assume the following scenario:
- Given: Function values $y_i$ at positions $x_i$.
(1D → 1D for now)
- Independent variables xi known exactly.
- Dependent variables yi with some error.
- Error Gaussian, i.i.d.
- normal distributed
- independent
- same distribution at every point
- We know the class of functions from which $\tilde f$ is chosen
6
Situation
Situation:
- Original sample points taken at xi from original f.
- Unknown Gaussian iid noise added to each yi.
- Want to estimate the reconstruction $\tilde f$.
[Figure: noisy samples $(x_1, y_1), \ldots, (x_n, y_n)$ scattered around the original curve $f$ and the reconstruction $\tilde f$]
7
Summary
Statistical model yields the least-squares criterion:
$$\tilde f = \arg\min_{\tilde f} \sum_{i=1}^{n} \big(\tilde f(x_i) - y_i\big)^2$$
Linear function space $\tilde f(x) := \sum_{j=1}^{k} \lambda_j b_j(x)$ leads to a quadratic objective:
$$\arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \Big(\sum_{j=1}^{k} \lambda_j b_j(x_i) - y_i\Big)^2$$
Critical point: linear system
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, \mathbf{y} \rangle \\ \vdots \\ \langle b_k, \mathbf{y} \rangle \end{pmatrix}$$
with: $\langle b_i, b_j \rangle := \sum_{t=1}^{n} b_i(x_t)\, b_j(x_t)$ and $\langle b_i, \mathbf{y} \rangle := \sum_{t=1}^{n} b_i(x_t)\, y_t$
8
Maximum Likelihood Estimation
Goal:
- Maximize the probability that the data originated from
the reconstructed curve f.
- “Maximum likelihood estimation”
Gaussian normal distribution:
$$p_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
9
Maximum Likelihood Estimation
$$\tilde f = \arg\max_{\tilde f} \prod_{i=1}^{n} N_{\tilde f(x_i),\sigma}(y_i)
= \arg\max_{\tilde f} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right)$$
$$= \arg\max_{\tilde f} \ln \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right)
= \arg\max_{\tilde f} \sum_{i=1}^{n} \left(\ln\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right)$$
$$= \arg\min_{\tilde f} \sum_{i=1}^{n} \frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}
= \arg\min_{\tilde f} \sum_{i=1}^{n} \big(\tilde f(x_i)-y_i\big)^2$$
11
Least-Squares Approximation
This shows:
- Maximum likelihood estimate minimizes
sum of squared errors
Next: Compute optimal coefficients
- Linear ansatz: $\tilde f(x) := \sum_{j=1}^{k} \lambda_j b_j(x)$
- Determine the optimal coefficients $\lambda_1, \ldots, \lambda_k$
12
Maximum Likelihood Estimation
$$\arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \big(\tilde f(x_i) - y_i\big)^2
= \arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \Big(\sum_{j=1}^{k} \lambda_j b_j(x_i) - y_i\Big)^2
= \arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \big(\boldsymbol\lambda^{\mathsf T}\mathbf{b}(x_i) - y_i\big)^2$$
$$= \arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \Big(\boldsymbol\lambda^{\mathsf T}\mathbf{b}(x_i)\,\mathbf{b}(x_i)^{\mathsf T}\boldsymbol\lambda - 2\,y_i\,\boldsymbol\lambda^{\mathsf T}\mathbf{b}(x_i) + y_i^2\Big)$$
with: $\boldsymbol\lambda := (\lambda_1,\ldots,\lambda_k)^{\mathsf T}$ (k entries), $\mathbf{b}(x) := (b_1(x),\ldots,b_k(x))^{\mathsf T}$ (k entries), $\mathbf{y} := (y_1,\ldots,y_n)^{\mathsf T}$ (n entries)
This is a quadratic optimization problem of the form $\mathbf{x}^{\mathsf T}\mathbf{A}\mathbf{x} + \mathbf{b}^{\mathsf T}\mathbf{x} + c$.
13
Critical Point
Setting the gradient with respect to $\boldsymbol\lambda$ to zero:
$$\nabla_{\boldsymbol\lambda} \sum_{i=1}^{n} \big(\boldsymbol\lambda^{\mathsf T}\mathbf{b}(x_i) - y_i\big)^2
= \sum_{i=1}^{n} \Big(2\,\mathbf{b}(x_i)\mathbf{b}(x_i)^{\mathsf T}\boldsymbol\lambda - 2\,y_i\,\mathbf{b}(x_i)\Big) = 0$$
We obtain a linear system of equations:
$$\Big(\sum_{i=1}^{n} \mathbf{b}(x_i)\mathbf{b}(x_i)^{\mathsf T}\Big)\,\boldsymbol\lambda = \sum_{i=1}^{n} y_i\,\mathbf{b}(x_i)$$
with: $\boldsymbol\lambda := (\lambda_1,\ldots,\lambda_k)^{\mathsf T}$, $\mathbf{b}(x) := (b_1(x),\ldots,b_k(x))^{\mathsf T}$, $\mathbf{y} := (y_1,\ldots,y_n)^{\mathsf T}$
14
Critical Point
This can also be written as:
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, \mathbf{y} \rangle \\ \vdots \\ \langle b_k, \mathbf{y} \rangle \end{pmatrix}$$
with: $\langle b_i, b_j \rangle := \sum_{t=1}^{n} b_i(x_t)\, b_j(x_t)$ and $\langle b_i, \mathbf{y} \rangle := \sum_{t=1}^{n} b_i(x_t)\, y_t$
15
Summary (again)
Statistical model yields the least-squares criterion:
$$\arg\min_{\tilde f} \sum_{i=1}^{n} \big(\tilde f(x_i) - y_i\big)^2$$
Linear function space $\tilde f(x) := \sum_{j=1}^{k} \lambda_j b_j(x)$ leads to a quadratic objective:
$$\arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \Big(\sum_{j=1}^{k} \lambda_j b_j(x_i) - y_i\Big)^2$$
Critical point: linear system (see the numerical sketch below)
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, \mathbf{y} \rangle \\ \vdots \\ \langle b_k, \mathbf{y} \rangle \end{pmatrix}$$
with: $\langle b_i, b_j \rangle := \sum_{t=1}^{n} b_i(x_t)\, b_j(x_t)$ and $\langle b_i, \mathbf{y} \rangle := \sum_{t=1}^{n} b_i(x_t)\, y_t$
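The linear system above translates directly into a few lines of NumPy. The following sketch is an illustration added here, not part of the slides; the cubic polynomial basis, the test data, and all variable names are assumptions.

```python
import numpy as np

def least_squares_fit(x, y, basis):
    """Fit f~(x) = sum_j lambda_j * b_j(x) by solving the normal equations."""
    B = np.column_stack([b(x) for b in basis])   # B[i, j] = b_j(x_i)
    M = B.T @ B                                  # M[i, j] = <b_i, b_j>
    rhs = B.T @ y                                # rhs[i]  = <b_i, y>
    return np.linalg.solve(M, rhs)               # coefficients lambda

# Illustrative example: cubic polynomial basis, noisy samples of a smooth curve.
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2, lambda x: x**3]
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)  # Gaussian noise
lam = least_squares_fit(x, y, basis)
f_tilde = sum(l * b(x) for l, b in zip(lam, basis))
print("coefficients:", lam)
```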
16
Variants
Weighted least squares:
- In case the noise has a different standard deviation
at each data point
- This gives a weighted least squares problem
- Noisier points have smaller influence
17
Same procedure as prev. slides...
$$\tilde f = \arg\max_{\tilde f} \prod_{i=1}^{n} N_{\tilde f(x_i),\sigma_i}(y_i)
= \arg\max_{\tilde f} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}\right)$$
$$= \arg\max_{\tilde f} \sum_{i=1}^{n} \left(\ln\frac{1}{\sqrt{2\pi}\,\sigma_i} - \frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}\right)
= \arg\min_{\tilde f} \sum_{i=1}^{n} \frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}$$
$$= \arg\min_{\tilde f} \sum_{i=1}^{n} \underbrace{\frac{1}{\sigma_i^2}}_{\text{weights}}\,\big(\tilde f(x_i)-y_i\big)^2$$
18
Result
Linear system for the general case (larger $\omega_i$ → larger influence of data point $i$; a weighted NumPy sketch follows below):
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, \mathbf{y} \rangle \\ \vdots \\ \langle b_k, \mathbf{y} \rangle \end{pmatrix}$$
with: $\langle b_i, b_j \rangle := \sum_{l=1}^{n} \omega_l\, b_i(x_l)\, b_j(x_l)$, $\quad \langle b_i, \mathbf{y} \rangle := \sum_{l=1}^{n} \omega_l\, b_i(x_l)\, y_l$, $\quad \omega_l := \frac{1}{\sigma_l^2}$, i.e., $\sigma_l = \sigma(x_l)$
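As a concrete illustration (again an assumed sketch, not code from the slides), the weighted normal equations only change how the matrix and right-hand side are assembled; the weights $\omega_i = 1/\sigma_i^2$ come from per-point noise levels.

```python
import numpy as np

def weighted_least_squares_fit(x, y, sigma, basis):
    """Weighted least squares: noisier points (large sigma_i) get smaller weights."""
    w = 1.0 / sigma**2                            # weights omega_i = 1 / sigma_i^2
    B = np.column_stack([b(x) for b in basis])    # B[i, j] = b_j(x_i)
    M = B.T @ (w[:, None] * B)                    # <b_i, b_j> = sum_l w_l b_i(x_l) b_j(x_l)
    rhs = B.T @ (w * y)                           # <b_i, y>  = sum_l w_l b_i(x_l) y_l
    return np.linalg.solve(M, rhs)
```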
19
Least-Squares Linear Systems
Remark:
- We get the same result if we solve an overdetermined
system for the interpolation problem in a least-squares sense
- Least-squares solution to an overdetermined linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$:
$$\arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2
= \arg\min_{\mathbf{x}} \big(\mathbf{x}^{\mathsf T}\mathbf{A}^{\mathsf T}\mathbf{A}\mathbf{x} - 2\,\mathbf{b}^{\mathsf T}\mathbf{A}\mathbf{x} + \mathbf{b}^{\mathsf T}\mathbf{b}\big)$$
Compute the gradient and set it to zero: $2\,\mathbf{A}^{\mathsf T}\mathbf{A}\mathbf{x} - 2\,\mathbf{A}^{\mathsf T}\mathbf{b} = 0$, i.e., $\mathbf{A}^{\mathsf T}\mathbf{A}\mathbf{x} = \mathbf{A}^{\mathsf T}\mathbf{b}$
- “System of normal equations”
20
SVD
Problem with normal equations:
- The condition number of the normal-equation matrix $\mathbf{A}^{\mathsf T}\mathbf{A}$ is the square of that
of $\mathbf{A}$ itself
- Proof (SVD): $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^{\mathsf T} \;\Rightarrow\; \mathbf{A}^{\mathsf T}\mathbf{A} = \mathbf{V}\mathbf{D}\,\mathbf{U}^{\mathsf T}\mathbf{U}\,\mathbf{D}\mathbf{V}^{\mathsf T} = \mathbf{V}\mathbf{D}^2\mathbf{V}^{\mathsf T}$
- For “evil” (i.e., ill-conditioned) problems, the normal equations
are not the best way to solve the problem
- In that case, we can use the SVD to solve the problem (sketch below)...
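A minimal NumPy sketch of the SVD route (an assumed illustration, not from the slides): solve the least-squares problem via a pseudo-inverse built from the SVD, truncating tiny singular values, instead of forming the badly conditioned $\mathbf{A}^{\mathsf T}\mathbf{A}$.

```python
import numpy as np

def lstsq_via_svd(A, b, rcond=1e-12):
    """Least-squares solution of A x = b using the SVD (avoids forming A^T A)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)      # A = U diag(s) Vt
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)   # truncate tiny singular values
    return Vt.T @ (s_inv * (U.T @ b))                     # x = V diag(1/s) U^T b

# The condition number of A^T A is the square of that of A:
A = np.vander(np.linspace(0.0, 1.0, 20), 8)
print(np.linalg.cond(A), np.linalg.cond(A.T @ A))         # second value ~ first squared
x = lstsq_via_svd(A, np.random.default_rng(1).normal(size=20))
```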
21
One more Variant...
Function Approximation
- Given the following problem:
- We know a function $f: \mathbb{R}^n \to \mathbb{R}$
- We want to approximate $f$ in
a linear subspace:
$$\tilde f(x) := \sum_{j=1}^{k} \lambda_j b_j(x), \qquad \tilde f \approx f$$
- How to choose the coefficients $\lambda_j$?
- Difference: Continuous function as “data” to be matched.
- Solution: Almost the same as before...
22
Function Approximation
Objective function:
$$\big\|\tilde f - f\big\|^2 \;\to\; \min$$
- We obtain:
$$\Big\|\sum_{j=1}^{k}\lambda_j b_j - f\Big\|^2
= \boldsymbol\lambda^{\mathsf T}
\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix}
\boldsymbol\lambda
\;-\; 2\sum_{j=1}^{k}\lambda_j\,\langle b_j, f\rangle \;+\; \langle f, f\rangle$$
23
Function Approximation
Critical point (i.e., the solution):
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, f \rangle \\ \vdots \\ \langle b_k, f \rangle \end{pmatrix}$$
with (a quadrature sketch follows below):
$$\langle f, g \rangle = \int f(x)\,g(x)\,dx \quad \text{(unweighted version)}, \qquad \langle f, g \rangle = \int f(x)\,g(x)\,\omega(x)\,dx \quad \text{(weighted version)}$$
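The only change from the discrete case is that the inner products are integrals. The sketch below is an assumed illustration (the target function, interval, and quadrature rule are mine): it approximates the integrals numerically and solves the same k × k system.

```python
import numpy as np

def project_onto_basis(f, basis, a=0.0, b=1.0, n_quad=2000):
    """L2-project f onto span{b_1, ..., b_k} on [a, b] using simple quadrature."""
    x = np.linspace(a, b, n_quad)
    dx = x[1] - x[0]
    B = np.column_stack([bj(x) for bj in basis])   # basis samples B[t, j] = b_j(x_t)
    fx = f(x)
    M = (B.T @ B) * dx                             # <b_i, b_j> ~ int b_i(x) b_j(x) dx
    rhs = (B.T @ fx) * dx                          # <b_i, f>   ~ int b_i(x) f(x) dx
    return np.linalg.solve(M, rhs)

# Example: best quadratic approximation of exp(x) on [0, 1].
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
lam = project_onto_basis(np.exp, basis)
print("coefficients:", lam)
```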
24
Galerkin Approximation
The least-squares criterion is equivalent to:
$$\forall i \in \{1, \ldots, k\}: \quad \Big\langle \sum_{j=1}^{k} \lambda_j b_j - f,\ b_i \Big\rangle = 0
\qquad\Longleftrightarrow\qquad
\forall i \in \{1, \ldots, k\}: \quad \Big\langle \sum_{j=1}^{k} \lambda_j b_j,\ b_i \Big\rangle = \langle f, b_i \rangle$$
which is again the linear system
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \langle b_1, f \rangle \\ \vdots \\ \langle b_k, f \rangle \end{pmatrix}$$
[Figure: the residual vector $\tilde f - f$ of the best approximation is orthogonal to the linear subspace spanned by the basis functions $b_i$]
25
Summary
What we can do so far:
- Least-squares approximation:
- Given more data points than basis functions,
we can fit an approximate function from a basis function set to the data
- Variants:
- We can solve linear systems in
a least-squares sense
- Given a function, we can fit the
most similar approximation from a subspace
- Extensions:
- Any known uncertainty in the data can be modeled by weights
- The multi-dimensional case is similar
26
Remaining problems
What is missing:
- Any error in x-direction is ignored so far (only y-direction)
- We will look at that problem next (total least-squares)...
- Noise must be Gaussian
- Can be generalized using iteratively reweighted least-squares
(M-estimators)
Total Least Squares
28
Statistical Model
Generative Model:
- Original curve / surface → noisy sample points
29
Statistical Model
Generative Model:
- 1. Determine sample point (uniform)
- 2. Add noise (Gaussian)
[Figure: sampling → Gaussian noise → many samples → distribution (in space)]
30
Squared Distance Function
Result:
- Gaussian distribution convolved with object
- No analytical density
Approximation:
- 1D Gaussian → minimize squared residual
- This case → minimize the squared distance function
31
General Total Least Squares
General Total Least Squares:
- Given a class of objects $\mathrm{obj}_{\boldsymbol\lambda}$ with parameters $\boldsymbol\lambda \in \mathbb{R}^k$.
- A set of $n$ sample points (Gaussian, i.i.d., isotropic
covariance) $\mathbf{d}_i \in \mathbb{R}^m$.
- The total least squares solution minimizes:
$$\arg\min_{\boldsymbol\lambda} \sum_{i=1}^{n} \operatorname{dist}\big(\mathrm{obj}_{\boldsymbol\lambda}, \mathbf{d}_i\big)^2$$
- In general: a non-linear, possibly constrained (restrictions
on admissible $\boldsymbol\lambda$s) optimization problem; a numerical sketch follows below
- Special cases can be solved exactly
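For intuition, here is a small assumed example (not from the slides) of such a non-linear total least-squares problem: fitting a circle with parameters $\boldsymbol\lambda$ = (center, radius) to noisy 2D points by minimizing the sum of squared point-to-circle distances with scipy.optimize.least_squares.

```python
import numpy as np
from scipy.optimize import least_squares

def circle_residuals(lam, points):
    """Signed distances from points to the circle obj_lam with lam = (cx, cy, r)."""
    cx, cy, r = lam
    return np.hypot(points[:, 0] - cx, points[:, 1] - cy) - r

# Noisy samples of a circle (center (1, 2), radius 3).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 100)
pts = np.column_stack([1 + 3 * np.cos(t), 2 + 3 * np.sin(t)])
pts += rng.normal(scale=0.05, size=pts.shape)

# Non-linear total least squares: minimize sum_i dist(obj_lam, d_i)^2.
fit = least_squares(circle_residuals, x0=[0.0, 0.0, 1.0], args=(pts,))
print("center and radius:", fit.x)
```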
32
Fitting Affine Subspaces
The following problem can be solved exactly:
- Best fitting line to a set of 2D or 3D points
- Best fitting plane to a set of 3D points
- In general: an affine subspace of $\mathbb{R}^m$ with dimension $d \le m$
that best approximates a set of data points $\mathbf{d}_i \in \mathbb{R}^m$.
Solution: principal component analysis (PCA)
33
Start: 0-dim Subspaces
Easy Start: The optimal 0-dimensional affine subspace
- Given a set $D$ of $n$ data points $\mathbf{d}_i \in \mathbb{R}^m$, what is the point $\mathbf{x}_0$
with minimum least-squares error to all data points?
- Answer: just the sample mean (average):
$$\mathbf{x}_0 = \operatorname{m}(D) := \frac{1}{n}\sum_{i=1}^{n} \mathbf{d}_i$$
- Proof: minimize
$$E(\mathbf{x}) = \sum_{i=1}^{n} \big\|\mathbf{x} - \mathbf{d}_i\big\|^2$$
(next slide...)
34
Proof
sample mean: $\operatorname{m}(D) := \frac{1}{n}\sum_{i=1}^{n} \mathbf{d}_i$
$$E(\mathbf{x}) = \sum_{i=1}^{n} \big\|\mathbf{x} - \mathbf{d}_i\big\|^2
= \sum_{i=1}^{n} \big\|\big(\mathbf{x} - \operatorname{m}(D)\big) + \big(\operatorname{m}(D) - \mathbf{d}_i\big)\big\|^2$$
$$= \sum_{i=1}^{n} \big\|\mathbf{x} - \operatorname{m}(D)\big\|^2
+ 2\,\Big\langle \mathbf{x} - \operatorname{m}(D),\ \sum_{i=1}^{n}\big(\operatorname{m}(D) - \mathbf{d}_i\big)\Big\rangle
+ \sum_{i=1}^{n} \big\|\operatorname{m}(D) - \mathbf{d}_i\big\|^2$$
The middle term vanishes because $\sum_{i=1}^{n}\big(\operatorname{m}(D) - \mathbf{d}_i\big) = n\operatorname{m}(D) - \sum_{i=1}^{n}\mathbf{d}_i = 0$, and the last term is independent of $\mathbf{x}$. Hence $E$ attains its minimum for $\mathbf{x} = \operatorname{m}(D)$.
36
One Dimensional Subspaces...
Next:
- What is the optimal line (1D subspace) that approximates
a set of data points di?
- Two questions:
- Optimum origin (point on the line)?
– This is still the average (proof omitted)
- Optimum direction?
– We will look at that next...
- Parametric line equation:
$$\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{r}, \qquad \mathbf{x}_0, \mathbf{r} \in \mathbb{R}^m,\ \|\mathbf{r}\| = 1$$
37
Best Fitting Line
Line equation: $\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{r}$, with $\mathbf{x}_0, \mathbf{r} \in \mathbb{R}^m$, $\|\mathbf{r}\| = 1$
Best projection of $\mathbf{d}_i$ on any such line: $t_i = \langle \mathbf{d}_i - \mathbf{x}_0,\ \mathbf{r} \rangle$
[Figure: line through $\mathbf{x}_0$ with unit direction $\mathbf{r}$; a data point $\mathbf{d}_i$ and its projection onto the line]
Objective Function:
$$\operatorname{dist}(\text{line}, D)^2 = \sum_{i=1}^{n} \big\|\mathbf{x}_0 + t_i\,\mathbf{r} - \mathbf{d}_i\big\|^2$$
38
Best Fitting Line
Objective Function:
$$\operatorname{dist}(\text{line}, D)^2 = \sum_{i=1}^{n} \big\|\mathbf{x}_0 + t_i\,\mathbf{r} - \mathbf{d}_i\big\|^2
= \sum_{i=1}^{n} \Big(\big\|\mathbf{x}_0 - \mathbf{d}_i\big\|^2 + 2\,t_i\,\big\langle \mathbf{r},\ \mathbf{x}_0 - \mathbf{d}_i \big\rangle + t_i^2\,\|\mathbf{r}\|^2\Big)$$
Substituting the optimal parameters $t_i = \langle \mathbf{d}_i - \mathbf{x}_0, \mathbf{r} \rangle$ and using $\|\mathbf{r}\| = 1$:
$$= \sum_{i=1}^{n} \big\|\mathbf{x}_0 - \mathbf{d}_i\big\|^2 \;-\; \sum_{i=1}^{n} \big\langle \mathbf{d}_i - \mathbf{x}_0,\ \mathbf{r} \big\rangle^2
= \underbrace{\sum_{i=1}^{n} \big\|\mathbf{x}_0 - \mathbf{d}_i\big\|^2}_{\text{const. w.r.t. } \mathbf{r}}
\;-\; \mathbf{r}^{\mathsf T}\underbrace{\Big(\sum_{i=1}^{n} \big(\mathbf{d}_i - \mathbf{x}_0\big)\big(\mathbf{d}_i - \mathbf{x}_0\big)^{\mathsf T}\Big)}_{=: \mathbf{S}\ \text{(scatter matrix)}}\,\mathbf{r}$$
40
Best Fitting Line
Result: an eigenvalue problem
$$\operatorname{dist}(\text{line}, D)^2 = \text{const.} - \mathbf{r}^{\mathsf T}\mathbf{S}\,\mathbf{r},
\qquad \mathbf{S} := \sum_{i=1}^{n} \big(\mathbf{d}_i - \mathbf{x}_0\big)\big(\mathbf{d}_i - \mathbf{x}_0\big)^{\mathsf T},
\qquad \|\mathbf{r}\| = 1$$
- $\mathbf{r}^{\mathsf T}\mathbf{S}\,\mathbf{r}$ (with $\|\mathbf{r}\| = 1$) is a Rayleigh quotient
- Minimizing the energy ⇒ maximizing the quotient
- Solution: the eigenvector of $\mathbf{S}$ with the largest eigenvalue (see the sketch below)
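A small assumed NumPy sketch of exactly this result: center the points, build the scatter matrix S, and take the eigenvector of the largest eigenvalue as the line direction. The test data and names are illustrative.

```python
import numpy as np

def fit_line_tls(points):
    """Total least-squares line fit: returns (point on line, unit direction)."""
    x0 = points.mean(axis=0)                      # sample mean = optimal origin
    centered = points - x0
    S = centered.T @ centered                     # scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues (S symmetric)
    r = eigvecs[:, -1]                            # eigenvector of largest eigenvalue
    return x0, r

# Example: noisy points along a 2D line.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 200)
pts = np.column_stack([t, 0.5 * t + 0.1]) + rng.normal(scale=0.02, size=(200, 2))
x0, r = fit_line_tls(pts)
print("point on line:", x0, "direction:", r)
```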
41
General Case
Fitting a d-dimensional affine subspace:
- d = 1: line
- d = 2: plane
- d = 3: 3D subspace
- ...
Simple rule:
- Use the d eigenvectors with the largest eigenvalues from
the spectrum of S.
- Gives the (total) least-squares optimal subspace that
approximates the data set D.
42
General Case
Procedure: Principal Component Analysis (PCA)
- Compute the average $\mathbf{x}_0 = \operatorname{m}(D)$
- Compute the “scatter matrix”
$$\mathbf{S} = \sum_{i=1}^{n} \big(\mathbf{d}_i - \mathbf{x}_0\big)\big(\mathbf{d}_i - \mathbf{x}_0\big)^{\mathsf T}$$
- Let $(\lambda_1, \mathbf{v}_1), \ldots, (\lambda_n, \mathbf{v}_n)$ be the sorted eigenvalue/eigenvector pairs
of $\mathbf{S}$, where $\lambda_1$ is the largest, and the $\mathbf{v}_i$ are of unit length.
- The subspace spanned by
$$\mathbf{p}(t_1, \ldots, t_d) = \mathbf{x}_0 + \sum_{i=1}^{d} t_i\,\mathbf{v}_i$$
approximates the data optimally in terms of squared distances to a point in the subspace.
- Stronger: projecting the data into this subspace is the best
d-dimensional (affine subspace) data approximation.
43
Statistical Interpretation
Observation:
- S/(n-1) is the covariance matrix of the data set D = {di}i
- PCA can be interpreted as fitting a Gaussian distribution
and computing the main axes
[Figure: fitted Gaussian with mean $\mathbf{x}_0$ and main axes $\sqrt{\lambda_1}\,\mathbf{v}_1$, $\sqrt{\lambda_2}\,\mathbf{v}_2$]
$$\boldsymbol\mu = \mathbf{x}_0 = \frac{1}{n}\sum_{i=1}^{n} \mathbf{d}_i,
\qquad \tilde{\boldsymbol\Sigma} = \frac{1}{n-1}\,\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n} \big(\mathbf{d}_i - \mathbf{x}_0\big)\big(\mathbf{d}_i - \mathbf{x}_0\big)^{\mathsf T}$$
$$N_{\boldsymbol\mu,\boldsymbol\Sigma}(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\det(\boldsymbol\Sigma)^{1/2}}
\exp\!\left(-\frac{1}{2}\,\big(\mathbf{x} - \boldsymbol\mu\big)^{\mathsf T}\boldsymbol\Sigma^{-1}\big(\mathbf{x} - \boldsymbol\mu\big)\right)$$
44
Applications
Fitting a line to a point cloud in $\mathbb{R}^2$:
- Sample mean and the direction of the maximum eigenvalue
- Line: $\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{v}_1$
Plane fitting in $\mathbb{R}^3$:
- Sample mean and the two
directions of maximum eigenvalues
- Smallest eigenvalue:
- Eigenvector points in normal direction
- Aspect ratio ($\lambda_3 / \lambda_2$) is a measure of “flatness”
(quality of fit)
[Figure: line fits with $\lambda_2 / \lambda_1$ small vs. $\lambda_2 / \lambda_1$ larger]
45
Applications
Application: Normal estimation in point clouds
- Given a set of points $\mathbf{p}_i \in \mathbb{R}^3$ that form a smooth surface.
- We want to estimate:
- Surface normals
- Sample spacing
Algorithm (a sketch follows below):
- For each point, compute the k nearest neighbors (k ≈ 20)
- Compute a PCA (average, main axes) of these points
- Eigenvector with the smallest eigenvalue → normal direction
- The other two eigenvectors → tangent vectors
- The tangent eigenvalues give a sample-spacing estimate
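A compact assumed implementation of this algorithm using a k-d tree from SciPy; the point data, the value of k, and all names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    """Estimate a unit normal per point from a PCA of its k nearest neighbors."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)               # indices of the k nearest neighbors
    normals = np.empty_like(points)
    for i, neighbors in enumerate(points[idx]):
        centered = neighbors - neighbors.mean(axis=0)
        S = centered.T @ centered                  # 3x3 scatter matrix
        _, eigvecs = np.linalg.eigh(S)             # ascending eigenvalues
        normals[i] = eigvecs[:, 0]                 # smallest eigenvalue -> normal
    return normals

# Usage (illustrative): pts is an (n, 3) array of surface samples.
# normals = estimate_normals(pts, k=20)
```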
46
Example
[Figure: points → normals → tangential frames → elliptic splats with shading]
47
Applications
Another Application: Coordinate frame estimation
- More general total least-squares (non-linear) is difficult
- However: We can tweak it
- Compute coordinate frames using PCA
- Smallest eigenvalue = normal direction
- Form height field in normal direction
- Then: use ordinary least-squares in this coordinate system
48
Example
Example:
- k-nearest neighbors
- PCA coordinate frames
at each point
- Quadratic monomials
(bivariate, local coords.)
- Least squares fit
49
Curvature Estimation
Estimating curvature in point clouds:
- Quadratic fitting (as in the example)
- PCA coordinate frame (u, v, n)
- Height field in normal direction: (u, v) → n
- Fit quadratic polynomials with basis
{1, u, v, uv, u², v²}
- Eigenanalysis of the quadratic terms (a sketch follows below)
- Hessian matrix from the fitted polynomial
$\lambda_{uu}\,u^2 + 2\lambda_{uv}\,uv + \lambda_{vv}\,v^2 + \ldots$:
$$\begin{pmatrix} \lambda_{uu} & \lambda_{uv} \\ \lambda_{uv} & \lambda_{vv} \end{pmatrix}$$
- Compute principal directions
- Mean / Gaussian / normal curvature
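As an assumed sketch of the last step: given neighbor coordinates already expressed in the local PCA frame (u, v, heights h), fit the quadratic and read curvature estimates off the Hessian of the fitted height function. This simplification assumes the tangent plane is well aligned, i.e., the fitted gradient is small; the function name and data layout are mine.

```python
import numpy as np

def curvature_from_local_frame(u, v, h):
    """Fit h(u, v) by a quadratic in a local PCA frame and estimate curvatures."""
    # Basis {1, u, v, uv, u^2, v^2}; least-squares fit of the height field.
    B = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    coeff, *_ = np.linalg.lstsq(B, h, rcond=None)
    _, _, _, c_uv, c_uu, c_vv = coeff
    H = np.array([[2 * c_uu, c_uv],                # Hessian of the fitted quadratic
                  [c_uv, 2 * c_vv]])
    k1, k2 = np.linalg.eigvalsh(H)                 # principal curvature estimates
    return {"principal": (k1, k2),
            "mean": 0.5 * (k1 + k2),
            "gaussian": k1 * k2}
```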
50
Bunny Curvature
[Figure: Stanford Bunny (dense point cloud): principal curvature 1, principal curvature 2, mean curvature, Gaussian curvature; courtesy of Martin Bokeloh]
PCA and MDS
52
Notation
Data matrix: $\tilde{X}$, with the $d$-dimensional input vectors $\tilde{\mathbf{x}}_i = (x_{i,1}, \ldots, x_{i,d})$ as its rows.
Centered data matrix:
$$X := \tilde{X} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\tilde{X} = \Big(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\Big)\tilde{X},$$
i.e., the sample mean $\frac{1}{n}\sum_{i=1}^{n}\tilde{\mathbf{x}}_i$ is subtracted from every data vector.
53
MDS
Multi-Dimensional Scaling
- Input: an $n \times n$ pairwise distance matrix $D$
- Compute the pairwise scalar-product matrix of the centered
vectors (double centering):
$$K := -\frac{1}{2}\Big(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\Big)\,\tilde{D}\,\Big(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\Big), \qquad \tilde{D}_{ij} := d_{ij}^2$$
- By this construction:
$$K = X X^{\mathsf T} = \big(\langle \mathbf{x}_i, \mathbf{x}_j \rangle\big)_{ij} \qquad \text{(PCA instead uses } X^{\mathsf T}X\text{)}$$
54
MDS (2)
Multi-Dimensional Scaling
- Next: Compute the eigenstructure of $K = XX^{\mathsf T}$:
$$K = U \Lambda U^{\mathsf T}$$
- Thus, a candidate reconstruction of X
is given by
$$X = U \Lambda^{1/2}$$
- Reminder: X – data vectors in rows
- Hence (see the sketch below):
$$emb(\mathbf{x}_i) = \Big(\sqrt{\lambda_1}\,u^{(1)}_i,\ \ldots,\ \sqrt{\lambda_k}\,u^{(k)}_i\Big), \qquad k \le n,$$
where $\mathbf{u}^{(j)}$ is the $j$-th eigenvector and $\lambda_j$ the corresponding eigenvalue.
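A minimal assumed NumPy implementation of these two slides: double-center the squared distance matrix, take the eigendecomposition, and read off a k-dimensional embedding. The test data are illustrative.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical MDS: embed points given only their pairwise distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix I - (1/n) 11^T
    K = -0.5 * J @ (dist**2) @ J                   # Gram matrix of centered vectors
    eigvals, eigvecs = np.linalg.eigh(K)           # ascending order
    idx = np.argsort(eigvals)[::-1][:k]            # k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)         # guard against small negatives
    return eigvecs[:, idx] * np.sqrt(lam)          # emb(x_i) = (sqrt(l_j) * u_i^(j))_j

# Usage (illustrative): recover 2D coordinates up to translation and orthogonal mapping.
rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
emb = classical_mds(dist, k=2)
```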
55
MDS (3)
Properties: (assuming Euclidean distances)
- Recovers points up to global translation and orthogonal
mapping
- Reduced version (k-dim.) preserves distances in a least
square sense (dimension reduction)
- Result is the same as for the PCA embedding
of the point coordinates (next slide)
56
MDS is PCA
Equivalence of MDS and PCA
SVD of the centered data matrix:
$$X = U \Lambda V^{\mathsf T}, \qquad \Lambda = \operatorname{diag}\big(\lambda_1, \ldots, \lambda_{\min(n,d)}\big)$$
Then:
$$X^{\mathsf T}X = V \Lambda^2 V^{\mathsf T}, \qquad X X^{\mathsf T} = U \Lambda^2 U^{\mathsf T}$$
PCA: $emb_{\text{PCA}}(X) = X V$
MDS: $emb_{\text{MDS}}(X) = U \Lambda$
Equivalence: $emb_{\text{PCA}}(X) = X V = U \Lambda V^{\mathsf T} V = U \Lambda = emb_{\text{MDS}}(X)$
57
So Where is the Difference?
PCA vs. MDS:
- Input: points (PCA) vs. pairwise distances / scalar products (MDS)
- Complexity: $d \times d$ eigenvalue problem, i.e., low dimension, large data sets (PCA) vs. $n \times n$ eigenvalue problem, i.e., high dimension, small data sets (MDS)
- Result: data embedding, principal variances, principal axes (PCA) vs. data embedding, principal variances, but no principal axes (MDS)
- Subspace projection of additional vectors: easy (PCA) vs. not obvious, but possible with the Nyström method (MDS)
58
Nyström Projection
Embedding further Vectors
- Naive option: recompute everything
- Expensive
- Inconsistent for some applications (previously embedded points get new coordinates)
- “Nyström Formula”
- Compute embedding by linear combination of
computed eigenvectors
- Uses projections on input data set (scalar products only)
- Assumes knowledge of point positions
(later: measure distances only)
59
Nyström Projection
- Reminder: $X = U \Lambda V^{\mathsf T}$, $\quad K = X X^{\mathsf T} = U \Lambda^2 U^{\mathsf T}$, $\quad emb_{\text{MDS}}(X) = U \Lambda$
- Project a vector $\mathbf{x}$ on the principal axes: $emb_{\text{PCA}}(\mathbf{x}) = V^{\mathsf T}\mathbf{x}$
- Expressed with scalar products only, using $V^{\mathsf T} = \Lambda^{-1} U^{\mathsf T} X$ (sketch below):
$$emb(\mathbf{x}) = \Lambda^{-1} U^{\mathsf T} X\,\mathbf{x}
= \Big(\frac{1}{\lambda_1}\sum_{i=1}^{n} u^{(1)}_i\,\langle \mathbf{x}_i, \mathbf{x} \rangle,\ \ldots,\ \frac{1}{\lambda_n}\sum_{i=1}^{n} u^{(n)}_i\,\langle \mathbf{x}_i, \mathbf{x} \rangle\Big)$$
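An assumed sketch of the Nyström formula, meant to sit next to the classical_mds function above: embed a new vector using only its scalar products with the original (centered) data. Here eigvecs/eigvals are the top-k eigenpairs of the centered Gram matrix K.

```python
import numpy as np

def nystrom_embed(x_new, data, eigvecs, eigvals):
    """Embed a new vector consistently with an existing MDS/PCA embedding.

    data: (n, d) original points; eigvecs (n, k) and eigvals (k,) are the
    largest eigenpairs of the centered Gram matrix K = X X^T.
    """
    mean = data.mean(axis=0)
    scalar_prods = (data - mean) @ (x_new - mean)      # <x_i, x> for centered vectors
    return (eigvecs.T @ scalar_prods) / np.sqrt(eigvals)

# Usage (illustrative): keep eigvals/eigvecs from the classical_mds computation,
# then nystrom_embed(x, pts, eigvecs, eigvals) embeds a new point x.
```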
Kernel PCA
61
Kernel PCA
“Kernel PCA is classical scaling in feature space” [Williams 2002]
Main Idea:
- MDS can be easily “kernelized” – just replace scalar
product matrix K with kernel matrix
- No need to deal with feature space explicitly (which might
be intractable)
- Will yield PCA anyway (but no eigenvectors)
62
Algorithm
Kernel PCA – Step 1/3
Compute the kernel (Gram) matrix
$$K = \big(\langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle\big)_{ij}
= \begin{pmatrix} k(\mathbf{x}_1, \mathbf{x}_1) & \cdots & k(\mathbf{x}_1, \mathbf{x}_n) \\ \vdots & & \vdots \\ k(\mathbf{x}_n, \mathbf{x}_1) & \cdots & k(\mathbf{x}_n, \mathbf{x}_n) \end{pmatrix}$$
63
Algorithm
Kernel PCA – Step 2/3: Center Data
- Cannot center the data matrix $\tilde{X} \to X$ directly, because
$X$ now lives in feature space
- Center the kernel matrix directly:
$$K_{centered} = K - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}K - \frac{1}{n}K\mathbf{1}\mathbf{1}^{\mathsf T} + \frac{1}{n^2}\mathbf{1}\mathbf{1}^{\mathsf T}K\mathbf{1}\mathbf{1}^{\mathsf T}
= \Big(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\Big)\,K\,\Big(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\Big)$$
- Proof:
$$(K_{centered})_{ij} = \Big\langle \Phi(\mathbf{x}_i) - \frac{1}{n}\sum_{k=1}^{n}\Phi(\mathbf{x}_k),\ \Phi(\mathbf{x}_j) - \frac{1}{n}\sum_{l=1}^{n}\Phi(\mathbf{x}_l) \Big\rangle$$
expands to exactly the expression above.
- The centering is exactly the same as in MDS.
64
Kernel PCA – Step 3/3: Compute embedding
- Eigenspace:
$$K_{centered} = U \Lambda U^{\mathsf T}$$
- Embedding (a code sketch follows below):
$$emb(\mathbf{x}_i) = \Big(\sqrt{\lambda_1}\,u^{(1)}_i,\ \ldots,\ \sqrt{\lambda_k}\,u^{(k)}_i\Big), \qquad k \le n$$
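Putting the three steps together, here is an assumed minimal kernel-PCA sketch; the Gaussian (RBF) kernel and its width are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def kernel_pca(X, k=2, gamma=1.0):
    """Kernel PCA with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    # Step 1: kernel (Gram) matrix.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Step 2: center the kernel matrix (same centering as in MDS).
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K_centered = J @ K @ J
    # Step 3: eigendecomposition and embedding emb(x_i) = (sqrt(l_j) * u_i^(j))_j.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    idx = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[idx], 0.0, None)
    return eigvecs[:, idx] * np.sqrt(lam)

# Usage (illustrative):
# emb = kernel_pca(points, k=2, gamma=10.0)
```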
65
Embedding Further Points
Project Additional Points:
- Use Nyström Formula:
- Uses only kernel computations in feature space
$$emb(\mathbf{x}) = \Big(\frac{1}{\sqrt{\lambda_1}}\sum_{i=1}^{n} u^{(1)}_i\, k(\mathbf{x}_i, \mathbf{x}),\ \ldots,\ \frac{1}{\sqrt{\lambda_n}}\sum_{i=1}^{n} u^{(n)}_i\, k(\mathbf{x}_i, \mathbf{x})\Big)$$
66
Kernel PCA
Summary:
- Kernel PCA performs PCA/MDS in feature space using dot
products (i.e. kernel evaluations) only
- It gives the same result as MDS
67
Kernel PCA
Remarks:
- Unlike “real” PCA, it does not output principal axes
vectors
- They are in feature space, i.e. usually inaccessible
- Preimages in (low-dim.) input space do not need to exist
(there are approximation techniques)
- Even if so, they are difficult to compute...
...and the space is non-linear anyway (so they do not really help)
68
Kernel PCA
Complexity:
- Need to solve an n × n eigenvalue problem
- Memory / time: at least quadratic in n
- Does not scale to large data sets
- Can use approximation techniques
- Idea: Landmark MDS
- Compute embedding on small subset of landmark points (e.g.
random subset)
- Use Nyström formula to embed other points
Examples
70
From the Paper
[Schölkopf et al. 1996]
71
Where is the Swiss Roll?
[Figure panels: the roll; exp. kernel, width 0.30D; exp. kernel, width 0.35D; exp. kernel, width 0.35D (centered); poly. kernel (5th order), width 0.35D]
72
What Else Can You Do?
Image Denoising via PCA:
73
What Else Can You Do?
Does not work without correspondences:
74
What Else Can You Do?
Shift invariant comparison kernel:
75
MDS (Kernel PCA)
Shift invariant comparison kernel:
76
References
- B. Schölkopf, A. J. Smola, K.-R. Müller: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. In: Neural Computation, 10:1299-1319, 1998.
- B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, A. J. Smola: Input Space versus Feature Space in Kernel-Based Methods. In: IEEE Trans. on Neural Networks, 10:1000-1017, 1999.
- C. Williams: On a Connection between Kernel PCA and Metric Multidimensional Scaling. In: Machine Learning, 46:11-19, 2002.
- K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf: An Introduction to Kernel-Based Learning Algorithms. In: IEEE Trans. on Neural Networks, 12:181-201, 2001.
- J. Shawe-Taylor, N. Cristianini: Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
- T. Cox, M. Cox: Multi-Dimensional Scaling. Chapman & Hall, 1994.
Iteratively Reweighted Least Squares
78
General Error Distributions
Problem with least-squares
- Quadratic error measure
- The farther away, the stronger
the attraction force
- Outliers are disastrous
[Figure: Gaussian likelihood, its negative log-likelihood (quadratic), and the derivative (linear); a far-away outlier exerts a strong pull]
79
Outliers
Problem:
- Outliers are rather common
- 3D scanners: Weird outlier points at random locations
(Shiny surfaces, transmission errors, etc...)
- Least squares does not work well
[Figure: two least-squares line fits to samples of y = x - 0.5; left: uniform noise only, right: 2 outliers (out of 41) visibly tilt the fitted line]
80
Robust Estimators
Problem:
- Least squares criterion too strict
- More robust: absolute distance (“l1-norm”)
- “M-estimator”: Truncated squared/absolute distance
[Figure: error potentials as a function of the residual: squared distance (“l2-norm error”), absolute distance (“l1-norm error”), truncated distance]
81
Implementation
Implementation: Iteratively reweighted least-squares
- Reminder: weighted least-squares
$$\arg\min_{\tilde f} \sum_{i=1}^{n} \omega_i\,\big(\tilde f(x_i) - y_i\big)^2$$
- Iteratively reweighted least-squares:
- Compute a least-squares fit
- Compute weights (dependent on the current solution)
- Recompute / iterate until convergence (a sketch follows below)
- L1-norm (absolute distance) weights, at iteration k:
$$\tilde f^{(k+1)} = \arg\min_{\tilde f} \sum_{i=1}^{n} \omega_i^{(k)}\,\big(\tilde f(x_i) - y_i\big)^2,
\qquad \omega_i^{(k)} = \frac{1}{\big|\tilde f^{(k)}(x_i) - y_i\big|}$$
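An assumed sketch of IRLS for an approximate L1 (robust) fit, reusing the weighted normal equations from above; the small epsilon guarding against division by zero, the data, and the names are my additions, and a truncated M-estimator would simply clamp the weights instead.

```python
import numpy as np

def irls_l1_fit(x, y, basis, iters=20, eps=1e-6):
    """Iteratively reweighted least squares approximating an L1-norm fit."""
    B = np.column_stack([b(x) for b in basis])     # design matrix B[i, j] = b_j(x_i)
    w = np.ones_like(y)                            # start from an ordinary LS fit
    for _ in range(iters):
        M = B.T @ (w[:, None] * B)                 # weighted normal equations
        lam = np.linalg.solve(M, B.T @ (w * y))
        residuals = B @ lam - y
        w = 1.0 / np.maximum(np.abs(residuals), eps)   # L1 weights: 1 / |residual|
    return lam

# Example: a line fit that tolerates a few gross outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.2, 41)
y = x - 0.5 + rng.uniform(-0.05, 0.05, size=x.shape)
y[[5, 30]] += 0.6                                  # two outliers (out of 41)
basis = [lambda t: np.ones_like(t), lambda t: t]
print("robust line coefficients:", irls_l1_fit(x, y, basis))
```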
82
L1-Norm Fitting
L1-Norm Fitting:
- Convergence guaranteed (convex objective function)
- Mixture of l1 (far) and l2 (near) norm also works
General M-Estimators
- No convergence guarantees for non-convex potentials
- In particular: Truncated potentials may converge to a
local extremum only (depends on initialization)
$$\tilde f^{(k+1)} = \arg\min_{\tilde f^{(k+1)}} \sum_{i=1}^{n} \frac{1}{\big|\tilde f^{(k)}(x_i) - y_i\big|}\,\big(\tilde f^{(k+1)}(x_i) - y_i\big)^2
\;\approx\; \arg\min_{\tilde f^{(k+1)}} \sum_{i=1}^{n} \big|\tilde f^{(k+1)}(x_i) - y_i\big|$$
(near convergence $\tilde f^{(k+1)} \approx \tilde f^{(k)}$, so the reweighted quadratic term approximates the absolute value)
83
Example (Schematic)
[Figure: line fits to samples of y = x - 0.5 with noise and outliers, shown over two iterations]
first iteration:
least squares, then truncation
second iteration:
improved solution