6.891
Computer Vision and Applications
Prof. Trevor Darrell
Lecture 15: Fitting and Segmentation. Readings: F&P Ch 15.3-15.5, 16
Last time: Segmentation and Clustering (Ch. 14). Supervised -> Unsupervised Category Learning.
From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
[Slide from Bradski & Thrun, Stanford]
From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/
The shape model. The mean location is indicated by the cross, with the ellipse showing the uncertainty in location. The number by each part is the probability of that part being present.
From the Wallflower Paper
Mean Shift Algorithm
The mean shift algorithm seeks the "mode", or point of highest density, of a data distribution.
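A minimal sketch of the idea in Python. The Gaussian kernel and the hand-picked bandwidth are assumptions for illustration; the slides don't fix either choice:

```python
import numpy as np

def mean_shift(points, query, bandwidth=1.0, n_iters=50, tol=1e-6):
    """Move a query point uphill to a mode of the kernel density estimate.

    points: (N, d) data array; query: (d,) starting location.
    """
    x = query.astype(float)
    for _ in range(n_iters):
        # Kernel weight of every data point relative to the current estimate.
        w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / bandwidth ** 2)
        # The mean shift update: move to the weighted mean of the data.
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Example: two clusters; starting near one of them converges to its mode.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
print(mean_shift(data, np.array([4.0, 4.0])))  # converges near [5, 5]
```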
Eigenvectors and clustering:
– We want a vector a giving the association between each element and a cluster.
– We want the elements within a cluster to, on the whole, have strong affinity with one another.
– Maximize $a^T A a$ subject to $a^T a = 1$, where A is the affinity matrix with entries $A_{ij}$.
– This is an eigenvalue problem: choose the eigenvector of A with the largest eigenvalue.
– (Scott & Longuet-Higgins; Ng/Jordan/Weiss; etc.)
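A small sketch of this eigenvector idea. The Gaussian affinity and the scale sigma are assumed choices, not specified on the slides:

```python
import numpy as np

def affinity_cluster(points, sigma=1.0):
    """Association of each point with the dominant cluster, via the
    leading eigenvector of the affinity matrix A."""
    # Affinity: A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    # Maximizing a^T A a subject to a^T a = 1 gives the leading eigenvector.
    vals, vecs = np.linalg.eigh(A)   # eigh: A is symmetric
    a = vecs[:, np.argmax(vals)]
    return np.abs(a)                 # large entries mark the dominant cluster

rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (5, 2))])
print(affinity_cluster(pts).round(2))  # the 20-point cluster gets the weight
```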
[Figure: tokens and the votes they cast.]
Squared error is sensitive to outliers; there are several fixes:
– One fix is EM; we'll do this shortly.
– Another is an M-estimator.
– A third is RANSAC.
Gaussian distribution

[Figure: data points at 2, 4, and 6 on a number line.]

The mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} d_i$

The mean is the optimal solution to

$\min_{\mu} \sum_{i=1}^{N} (d_i - \mu)^2$

where $d_i - \mu$ is the residual.
The mean maximizes this likelihood:

$\max_{\mu}\ p(d\,|\,\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\tfrac{1}{2}(d_i - \mu)^2/\sigma^2\right)$

The negative log gives (with $\sigma = 1$):

$\min_{\mu} \sum_{i=1}^{N} (d_i - \mu)^2$

the "least squares" estimate.
[Figure: data points at 2, 4, and 6.]

What happens if we change just one measurement? Replace 6 with 6+∆. With a single "bad" data point I can move the mean arbitrarily far.
Breakdown point: the percentage of outliers required to make the solution arbitrarily bad.

Least squares:
– influence of an outlier is linear (∆/N)
– breakdown point is 0% -- not robust!
[Figure: data points at 2, 4, and 6+∆.]

What about the median? The median of {2, 4, 6+∆} stays put no matter how large ∆ gets; the median's breakdown point is 50%.
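A quick numeric check of this point, using the {2, 4, 6} example from the slides:

```python
import numpy as np

for delta in [0, 10, 1000]:
    corrupted = np.array([2.0, 4.0, 6.0 + delta])
    print(delta, corrupted.mean(), np.median(corrupted))
# mean:   4.0, 7.33, 337.33 -- dragged arbitrarily far by one bad point
# median: 4.0, 4.0,  4.0    -- unchanged
```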
Least squares: $\min_{\mu} \sum_{i=1}^{N} (d_i - \mu)^2$

Outliers (large residuals) have too much influence.

Quadratic: $\rho(x) = x^2$, with influence $\psi(x) = 2x$.
Influence is proportional to the derivative of the ρ function. Want to give less influence to points beyond some value.
$\min_{\mu} \sum_{i=1}^{N} \rho(d_i - \mu,\ \sigma)$

Replace the quadratic $(d_i - \mu)^2$ with a robust error function $\rho$ (with scale parameter $\sigma$) that gives less influence to outliers.
$\min_{\mu} \sum_{i=1}^{N} \rho(d_i - \mu,\ \sigma)$

With a robust error function $\rho$ and scale parameter $\sigma$, there are no closed-form solutions!
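No closed form, so in practice one iterates. Below is a minimal iteratively-reweighted-least-squares sketch; the Geman-McClure norm (which appears a few slides later) and the hand-picked sigma are assumed choices:

```python
import numpy as np

def robust_location(d, sigma=1.0, n_iters=100):
    """IRLS for a robust 'mean' under rho(x) = x^2 / (sigma^2 + x^2).

    The per-point weight is psi(r)/r = 2*sigma^2 / (sigma^2 + r^2)^2,
    so distant outliers get vanishing influence.
    """
    mu = np.median(d)                 # robust starting point
    for _ in range(n_iters):
        r = d - mu
        w = 2 * sigma ** 2 / (sigma ** 2 + r ** 2) ** 2
        mu_new = np.sum(w * d) / np.sum(w)
        if abs(mu_new - mu) < 1e-9:
            break
        mu = mu_new
    return mu

data = np.array([2.0, 4.0, 6.0, 1000.0])  # one gross outlier
print(np.mean(data), robust_location(data, sigma=3.0))  # 253.0 vs ~4
```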
Absolute value (L1): $\rho(x) = |x|$, $\psi(x) = \mathrm{sign}(x)$
Beyond a point, the influence begins to decrease; beyond where the second derivative is zero, points are treated as outliers. An example is Tukey's biweight.
For the Geman-McClure norm $\rho(x,\sigma) = \dfrac{x^2}{\sigma^2 + x^2}$, the influence function (derivative of the norm) is

$\psi(x,\sigma) = \dfrac{2x\sigma^2}{(\sigma^2 + x^2)^2}$
Choosing the scale parameter σ (figures):
– Too small
– Too large
– Just right
Assumption: Within a finite image region, there is only a single motion present. Violated by: motion discontinuities, shadows, transparency, specular reflections… Violations of brightness constancy result in large residuals:
Robust flow objective over a region R:

$E(\mathbf{u}) = \sum_{\mathbf{x} \in R} \rho\!\left(\nabla I(\mathbf{x})^T \mathbf{u} + I_t(\mathbf{x}),\ \sigma\right)$

Can be very accurate (when the model is good)!
GNC: Graduated Non-Convexity (start with a convex approximation of the objective, then gradually reintroduce the non-convex robust norm).
Find the dominant motion while rejecting outliers.
Black & Anandan; Black & Jepson
$\min_{\mathbf{a}} \sum_{(x,y) \in R} \rho\!\left(\nabla I^T \mathbf{u}(x,y;\mathbf{a}) + I_t,\ \sigma\right)$

with $\rho$ a robust norm. Assumption: constraints that don't fit the dominant motion are treated as "outliers" (noise). Problem? They aren't noise!
– There are two things going on simultaneously.
– We don't know which constraint lines correspond to which motion.
– If we knew this, we could estimate the multiple motions.
– If we knew the segmentation, then estimating the motion would be easy.
Many fitting problems have this same chicken-and-egg structure; if we knew the missing data, estimating the parameters would be easy:
– fitting: if we knew which line each token came from, it would be easy to determine the line parameters
– segmentation: if we knew the segment each pixel came from, it would be easy to determine the segment parameters
– fundamental matrix estimation: if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix
– etc.
EM alternates between two steps:
– obtain an estimate of the missing data, using a guess at the parameters;
– form a maximum-likelihood estimate of the free parameters using the estimate of the missing data.
Iterate until convergence.
The constraints at these pixels all “go together.”
[Figure: motion segmentation example, from Adelson.]
Given images I(x,y,t) and I(x,y,t+1), at times t and t+1, containing two motions.
Assume we know the segmentation of the pixels of I(x,y,t) and I(x,y,t+1) into "layers" 1 and 2, each layer moving with its own parametric flow $\mathbf{u}_i(x,y;\mathbf{a}_i)$.
Then estimating the motion of each "layer" is easy: layer 1 moves with $\mathbf{u}(x,y;\mathbf{a}_1)$ and layer 2 with $\mathbf{u}(x,y;\mathbf{a}_2)$.
For layer 1:

$\min_{\mathbf{a}_1} \sum_{(x,y) \in R_1} \left(\nabla I^T \mathbf{u}(x,y;\mathbf{a}_1) + I_t\right)^2$
And for layer 2:

$\min_{\mathbf{a}_2} \sum_{(x,y) \in R_2} \left(\nabla I^T \mathbf{u}(x,y;\mathbf{a}_2) + I_t\right)^2$
The weights represent the probability that the constraint “belongs” to a particular layer.
Assume we know the motion of the layers but not the ownership probabilities of the pixels (the weights). Also assume we have a likelihood at each pixel:

$p(I(t), I(t+1)\,|\,\mathbf{a}) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\tfrac{1}{2}\left(\nabla I^T \mathbf{u}(\mathbf{a}) + I_t\right)^2/\sigma^2\right)$
$p\!\left(I(W(t), \mathbf{a}_1),\, I(t+1)\right) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\tfrac{1}{2} I_t^2/\sigma^2\right)$

Given the flow for layer 1, warp the first image towards the second and look at the residual error ($I_t$), since the flow is now zero. [Figure: the layer-1 region matches; the other doesn't.]
$p\!\left(I(W(t), \mathbf{a}_2),\, I(t+1)\right) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\tfrac{1}{2} I_t^2/\sigma^2\right)$

Likewise for layer 2: warp by its flow and look at the residual. [Figure: now the other region matches.]
Two "explanations" for each pixel, hence two likelihoods, $p_1$ and $p_2$.
Compute the total likelihood and normalize to get the ownership weights:

$w_k(i) = \frac{p_k(i)}{\sum_j p_j(i)}$
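A small sketch of this normalization step; the residual values and sigma below are made up for illustration:

```python
import numpy as np

def ownership_weights(residuals, sigma=1.0):
    """Per-pixel ownership probabilities for each motion layer.

    residuals: (K, N) array of warped-image residuals (I_t) for K layers.
    Returns (K, N) weights that sum to 1 over layers at each pixel.
    """
    # Gaussian likelihood of each residual under each layer's motion.
    p = np.exp(-0.5 * residuals ** 2 / sigma ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return p / p.sum(axis=0, keepdims=True)   # normalize over layers

# Pixels 0-2 fit layer 1 well; pixels 3-4 fit layer 2 well.
r = np.array([[0.1, 0.2, 0.0, 3.0, 4.0],
              [2.5, 3.0, 2.8, 0.1, 0.2]])
print(ownership_weights(r).round(2))
```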
Represent the image as regions of parametric motion (a small evaluation sketch follows):
– affine motion is popular
– determine which pixels belong to which region
– estimate the parameters of each region

$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$
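A minimal sketch of evaluating the six-parameter affine model above; the parameter names follow the slide's matrix form:

```python
import numpy as np

def affine_flow(params, x, y):
    """Evaluate the affine motion model v = A [x, y]^T + t at pixel (x, y)."""
    a, b, c, d, tx, ty = params
    vx = a * x + b * y + tx
    vy = c * x + d * y + ty
    return vx, vy

# A small rotation about the origin plus a translation.
theta = 0.01
params = (np.cos(theta) - 1, -np.sin(theta),
          np.sin(theta), np.cos(theta) - 1, 2.0, 0.5)
print(affine_flow(params, 100.0, 50.0))
```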
Three frames from the MPEG "flower garden" sequence.

Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
Grey level shows the region number with the highest probability; the segments and the motion fields associated with them.

Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
If we use multiple frames to estimate the appearance of each layer, we can re-render the sequence with some segments removed.

Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
EM for a line plus noise. The model: one line, and n points, some of which come from the line and some from "noise". The missing data is whether each point came from the line or from noise.
– Parameters: the line parameters and p(comes from line).
– P(point | line and noise params) = P(point | line) P(comes from line) + P(point | noise) P(comes from noise).
– E-step: allocate each point to a line with a weight, which is the probability it came from that line.
– M-step: refit the lines to the weighted set of points.
Line fitting by least squares:

$\min_{a,b} \sum_i r_i^2 = \min_{a,b} \sum_i (a x_i + b - y_i)^2$

Setting the derivatives to zero gives the normal equations:

$\begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & \sum_i 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i x_i y_i \\ \sum_i y_i \end{pmatrix}$
Given two lines, define the residuals and ownership weights:

$r_1(i) = a_1 x_i + b_1 - y_i, \qquad r_2(i) = a_2 x_i + b_2 - y_i$

$w_1(i) = \frac{e^{-r_1(i)^2/\sigma^2}}{e^{-r_1(i)^2/\sigma^2} + e^{-r_2(i)^2/\sigma^2}}, \qquad w_2(i) = \frac{e^{-r_2(i)^2/\sigma^2}}{e^{-r_1(i)^2/\sigma^2} + e^{-r_2(i)^2/\sigma^2}}$

(A constant k can be added to the denominator for a uniform noise model.)
Weighted least squares for line 1:

$\begin{pmatrix} \sum_i w_1(i) x_i^2 & \sum_i w_1(i) x_i \\ \sum_i w_1(i) x_i & \sum_i w_1(i) \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_1(i) x_i y_i \\ \sum_i w_1(i) y_i \end{pmatrix}$
The expected values of the deltas at the maximum (notice the one value close to zero).
Closeup of the fit
Issues:
– Local maxima can be a serious nuisance in some problems; there is no guarantee that we have reached the "right" maximum.
– Starting point matters: using k-means to cluster the points is often a good idea.
[Figures: a local maximum, which is an excellent fit to some points, and the deltas for this maximum.]
Choosing the noise model parameters:
– several methods: search over parameter values (possible), or set them by hand (effective, as the precise choice doesn't matter much)
– notice that if $k_n$ is large, this says that points very seldom come from noise, however far from the line they lie
– parameters should be chosen for a reason; if this is hard to do, then the model could be a problem
Example: data generated by two lines,

$y = a_1 x + b_1 + \sigma v, \qquad y = a_2 x + b_2 + \sigma v, \qquad v \sim N(0,1)$

– Given: data points that were generated by the two lines with Gaussian noise.
– Unknown: the parameters of the two lines, and the assignment of each point to its line.

[Figure: a point $(x_i, y_i)$ and its residual $r_i$.]
E-step: compute the residuals and ownership weights,

$r_1(i) = a_1 x_i + b_1 - y_i, \qquad r_2(i) = a_2 x_i + b_2 - y_i$

$w_1(i) = \frac{e^{-r_1(i)^2/\sigma^2}}{e^{-r_1(i)^2/\sigma^2} + e^{-r_2(i)^2/\sigma^2}}, \qquad w_2(i) = \frac{e^{-r_2(i)^2/\sigma^2}}{e^{-r_1(i)^2/\sigma^2} + e^{-r_2(i)^2/\sigma^2}}$
M-step: minimize the weighted error

$\min_{a,b} \sum_i w_1(i)\, r_1(i)^2 + w_2(i)\, r_2(i)^2$

which gives one set of weighted normal equations per line:

$\begin{pmatrix} \sum_i w_1(i) x_i^2 & \sum_i w_1(i) x_i \\ \sum_i w_1(i) x_i & \sum_i w_1(i) \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_1(i) x_i y_i \\ \sum_i w_1(i) y_i \end{pmatrix}$

$\begin{pmatrix} \sum_i w_2(i) x_i^2 & \sum_i w_2(i) x_i \\ \sum_i w_2(i) x_i & \sum_i w_2(i) \end{pmatrix} \begin{pmatrix} a_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} \sum_i w_2(i) x_i y_i \\ \sum_i w_2(i) y_i \end{pmatrix}$
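A compact sketch of the full E/M loop above. The random-pair initialization, sigma, and the synthetic data are assumptions for illustration:

```python
import numpy as np

def fit_weighted_line(x, y, w):
    """Solve the weighted normal equations above for y = a*x + b."""
    A = np.array([[np.sum(w * x * x), np.sum(w * x)],
                  [np.sum(w * x),     np.sum(w)]])
    rhs = np.array([np.sum(w * x * y), np.sum(w * y)])
    return np.linalg.solve(A, rhs)

def em_two_lines(x, y, sigma=0.5, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize each line from a random pair of points (one of many options).
    i = rng.choice(len(x), 4, replace=False)
    a1, b1 = fit_weighted_line(x[i[:2]], y[i[:2]], np.ones(2))
    a2, b2 = fit_weighted_line(x[i[2:]], y[i[2:]], np.ones(2))
    for _ in range(n_iters):
        # E-step: residuals -> ownership weights (numerically stable form).
        r1, r2 = a1 * x + b1 - y, a2 * x + b2 - y
        z = np.clip((r1 ** 2 - r2 ** 2) / sigma ** 2, -50, 50)
        w1 = 1.0 / (1.0 + np.exp(z))
        # M-step: one weighted least-squares solve per line.
        a1, b1 = fit_weighted_line(x, y, w1)
        a2, b2 = fit_weighted_line(x, y, 1.0 - w1)
    return (a1, b1), (a2, b2)

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
on_first = rng.random(200) < 0.5
y = np.where(on_first, 2 * x + 1, -x + 8) + rng.normal(0, 0.3, 200)
print(em_two_lines(x, y))  # often ~(2, 1) and ~(-1, 8); local maxima happen
```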
[Figure: l = log(likelihood) as a function of EM iteration.]
Mixture model: the complete-data log-likelihood.
[Figures: EM on the two-line example: initialize, then alternate E and M steps.]
Segmentation with EM
Choosing a model to fit to the data:
– e.g. is it a line or a circle?
– e.g. is this a perspective or orthographic camera?
– e.g. is there an aeroplane there, or is it noise?
– In general, models with more parameters will fit a dataset better, but are poorer at prediction.
– This means we can't simply look at the negative log-likelihood (or fitting error).
Top is not necessarily a better fit than bottom (actually, almost always worse)
We can discount the fitting error with some term in the number of parameters of the model.
– AIC (an information criterion): choose the model with the smallest value of $-2L(D;\theta^*) + 2p$, where p is the number of parameters.
– BIC (Bayes information criterion): choose the model with the smallest value of $-2L(D;\theta^*) + p \log N$, where N is the number of data points.
– MDL (minimum description length): same criterion as BIC, but derived in a completely different way.

A small numeric sketch of these criteria follows.
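The sketch below assumes Gaussian noise with known sigma and polynomial models of varying degree, purely for illustration; higher-degree fits reduce the error term but pay the parameter penalty:

```python
import numpy as np

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of residuals under N(0, sigma^2)."""
    n = len(residuals)
    return (-0.5 * np.sum(residuals ** 2) / sigma ** 2
            - n * np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(3)
sigma = 0.5
x = np.linspace(0, 5, 40)
y = 2 * x + 1 + rng.normal(0, sigma, x.size)   # true model: a line

for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    L = gaussian_log_likelihood(y - np.polyval(coeffs, x), sigma)
    p = degree + 1                               # number of parameters
    aic = -2 * L + 2 * p
    bic = -2 * L + p * np.log(x.size)
    print(degree, round(aic, 1), round(bic, 1))  # degree 1 should win
```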
Cross-validation (see the sketch after this list):
– split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other
– average over different splits
– choose the model with the smallest value of this average
– the difference of this average for two different models is an estimate of the difference in KL divergence of the models from the source of the data
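A minimal sketch of the split-and-average procedure. Squared error stands in for negative log-likelihood (they agree up to constants under a fixed-variance Gaussian noise model); the data and models are the same illustrative assumptions as above:

```python
import numpy as np

def cv_score(x, y, degree, n_splits=20, seed=0):
    """Average held-out squared error for a polynomial of given degree."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        half = len(x) // 2
        train, test = idx[:half], idx[half:]
        coeffs = np.polyfit(x[train], y[train], degree)
        resid = y[test] - np.polyval(coeffs, x[test])
        scores.append(np.mean(resid ** 2))
    return np.mean(scores)

rng = np.random.default_rng(4)
x = np.linspace(0, 5, 40)
y = 2 * x + 1 + rng.normal(0, 0.5, x.size)
for degree in (1, 2, 5):
    print(degree, round(cv_score(x, y, degree), 3))  # degree 1 wins
```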
RANSAC:
– choose a small subset of points uniformly at random
– fit to that subset
– anything that is close to the result is signal; all others are noise
– refit using the signal points; do this many times and choose the best

Issues (a sketch follows this list):
– How many times? Often enough that we are likely to have a good line.
– How big a subset? The smallest that determines the model.
– What does close mean? Depends on the problem; usually a threshold on distance.
– What is a good line? One where the number of nearby points is so big it is unlikely to be all outliers.
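A minimal RANSAC sketch for line fitting. The trial count, the distance threshold ("close"), and the inlier minimum ("a good line") are the problem-dependent choices the questions above point at:

```python
import numpy as np

def ransac_line(x, y, n_trials=200, threshold=0.5, min_inliers=10, seed=0):
    """Fit y = a*x + b by RANSAC: random pairs propose lines, inliers vote."""
    rng = np.random.default_rng(seed)
    best, best_count = None, min_inliers
    for _ in range(n_trials):
        i, j = rng.choice(len(x), 2, replace=False)
        if x[i] == x[j]:
            continue                       # degenerate pair; skip this sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < threshold
        if inliers.sum() > best_count:     # keep the best-supported line
            best_count = inliers.sum()
            best = np.polyfit(x[inliers], y[inliers], 1)  # refit to inliers
    return best

rng = np.random.default_rng(5)
x = np.concatenate([np.linspace(0, 10, 60), rng.uniform(0, 10, 40)])
y = np.concatenate([2 * np.linspace(0, 10, 60) + 1 + rng.normal(0, 0.2, 60),
                    rng.uniform(-10, 30, 40)])
print(ransac_line(x, y))   # ~ [2, 1] despite 40% outliers
```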
RANSAC generalizes beyond lines:
– Fundamental matrix: estimate F from 7 points, then test agreement with all other points.
– Motion: estimate an affine (or rigid) motion from a small set of matches, then see what other parts of the image are consistent.