

SLIDE 1

6.891 Computer Vision and Applications

  • Prof. Trevor Darrell

Lecture 15: Fitting and Segmentation
Readings: F&P Ch. 15.3-15.5, 16

SLIDE 2

Last time: "Segmentation and Clustering (Ch. 14)"

  • Supervised -> Unsupervised Category Learning (needs segmentation)

  • K-Means
  • Mean Shift
  • Graph cuts
  • Hough transform
SLIDE 3

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

[Slide from Bradski & Thrun, Stanford]

SLIDE 4

Learned Model

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

The shape model. The mean location is indicated by the cross, with the ellipse showing the uncertainty in location. The number by each part is the probability of that part being present.

SLIDE 5

SLIDE 6

Background Techniques Compared

From the Wallflower Paper

SLIDE 7

Mean Shift Algorithm

  • 1. Choose a search window size.
  • 2. Choose the initial location of the search window.
  • 3. Compute the mean location (centroid of the data) in the search window.
  • 4. Center the search window at the mean location computed in Step 3.
  • 5. Repeat Steps 3 and 4 until convergence.

The mean shift algorithm seeks the "mode", or point of highest density, of a data distribution.
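A minimal 1-D NumPy sketch of steps 1-5; the data, window size, and tolerance below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def mean_shift_mode(data, window=2.0, init=0.0, tol=1e-5, max_iter=100):
    """Seek the mode of a 1-D sample by iterating steps 3 and 4."""
    center = init                                          # Step 2: initial window location
    for _ in range(max_iter):
        in_window = data[np.abs(data - center) <= window]  # points inside the search window
        if in_window.size == 0:
            break
        new_center = in_window.mean()                      # Step 3: centroid of the window
        if abs(new_center - center) < tol:                 # Step 5: stop at convergence
            break
        center = new_center                                # Step 4: re-center the window
    return center

# Toy usage: a cluster near 5 plus uniform background noise
samples = np.concatenate([np.random.normal(5, 0.5, 200),
                          np.random.uniform(0, 10, 50)])
print(mean_shift_mode(samples, window=2.0, init=4.0))      # converges near the mode at 5
```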

SLIDE 8

Graph-Theoretic Image Segmentation

Build a weighted graph G = (V, E) from the image:

  • V: image pixels
  • E: connections between pairs of nearby pixels
  • $W_{ij}$: probability that pixels i and j belong to the same region

SLIDE 9

Eigenvectors and affinity clusters

  • Simplest idea: we want a vector a giving the association between each element and a cluster
  • We want elements within this cluster to, on the whole, have strong affinity with one another
  • We could maximize $a^T A a$
  • But we need the constraint $a^T a = 1$
  • This is an eigenvalue problem: choose the eigenvector of A with the largest eigenvalue
  • Shi/Malik, Scott/Longuet-Higgins, Ng/Jordan/Weiss, etc.
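A minimal NumPy sketch of this eigenvector idea; the points and the Gaussian affinity are illustrative assumptions.

```python
import numpy as np

# Six 1-D points: a cluster of four and a cluster of two
points = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 5.1])
A = np.exp(-(points[:, None] - points[None, :])**2)  # affinity: near points get weight ~1

eigvals, eigvecs = np.linalg.eigh(A)   # A is symmetric, so use the real eigendecomposition
a = eigvecs[:, -1]                     # eigenvector with the largest eigenvalue
a *= np.sign(a.sum())                  # fix the arbitrary sign
print(a)                               # large entries mark the dominant (four-point) cluster
```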

SLIDE 10

Hough transform

[Figure: tokens and the votes they cast]

SLIDE 11

Today “Fitting and Segmentation (Ch. 15)”

  • Robust estimation
  • EM
  • Model Selection
  • RANSAC

(Maybe “Segmentation I” and “Segmentation II” would be a better way to split these two lectures!)

SLIDE 12

Robustness

  • Squared error can be a source of bias in the presence of noise points
    – One fix is EM; we'll do this shortly
    – Another is an M-estimator
      • square nearby, threshold far away
    – A third is RANSAC
      • search for good points
SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

Robust Statistics

  • Recover the best fit to the majority of the data.
  • Detect and reject outliers.
SLIDE 18

Estimating the mean

Gaussian distribution. Data: 2, 4, 6 (N = 3).

$$\mu = \frac{1}{N} \sum_{i=1}^{N} d_i$$

The mean is the optimal solution to:

$$\min_\mu \sum_{i=1}^{N} (d_i - \mu)^2$$

where $d_i - \mu$ is the residual.

SLIDE 19

Estimating the Mean

The mean maximizes this likelihood:

$$\max_\mu\, p(d \mid \mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(d_i - \mu)^2}{2\sigma^2}\right)$$

The negative log gives (with $\sigma = 1$):

$$\min_\mu \sum_{i=1}^{N} (d_i - \mu)^2$$

the "least squares" estimate.

SLIDE 20

Estimating the mean

[Figure: data points 2, 4, 6 and their mean]

SLIDE 21

Estimating the mean

What happens if we change just one measurement? Data: 2, 4, 6+Δ.

$$\mu' = \mu + \frac{\Delta}{N}$$

With a single "bad" data point I can move the mean arbitrarily far.

SLIDE 22

Influence

  • Breakdown point: the percentage of outliers required to make the solution arbitrarily bad.
  • Least squares:
    – the influence of an outlier is linear (Δ/N)
    – the breakdown point is 0% -- not robust!

Data: 2, 4, 6+Δ. What about the median?

SLIDE 23

What's Wrong?

$$\min_\mu \sum_{i=1}^{N} (d_i - \mu)^2$$

Outliers (large residuals) have too much influence:

$$\rho(x) = x^2 \qquad \psi(x) = 2x$$

SLIDE 24

Approach

Influence is proportional to the derivative of the ρ function. Want to give less influence to points beyond some value.

SLIDE 25

Approach

$$\min_\mu \sum_{i=1}^{N} \rho(d_i - \mu,\ \sigma)$$

where σ is a scale parameter and ρ is a robust error function. Replace

$$\rho(x, \sigma) = \left(\frac{x}{\sigma}\right)^2$$

with something that gives less influence to outliers.

SLIDE 26

Approach

$$\min_\mu \sum_{i=1}^{N} \rho(d_i - \mu,\ \sigma)$$

where σ is a scale parameter and ρ is a robust error function. There are no closed-form solutions; two options (see the sketch after this list for one of them):

  • Iteratively Reweighted Least Squares
  • Gradient Descent
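A minimal IRLS sketch for the robust mean, assuming the Geman-McClure ρ introduced a few slides below; the data and σ are illustrative.

```python
import numpy as np

def robust_mean_irls(d, sigma=1.0, iters=50):
    """Iteratively Reweighted Least Squares for a robust location estimate."""
    mu = np.median(d)                     # robust starting point
    for _ in range(iters):
        r = d - mu                        # residuals
        w = 1.0 / (sigma**2 + r**2)**2    # weight proportional to psi(r)/r for Geman-McClure
        mu = np.sum(w * d) / np.sum(w)    # weighted least-squares update
    return mu

data = np.array([2.0, 4.0, 6.0, 100.0])   # three good points, one gross outlier
print(robust_mean_irls(data, sigma=2.0))  # stays near the inliers (about 4); the plain mean is 28
```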
SLIDE 27

L1 Norm

$$\rho(x) = |x| \qquad \psi(x) = \mathrm{sign}(x)$$

SLIDE 28

Redescending Function

Beyond a point, the influence begins to decrease; points beyond where the second derivative of ρ is zero are treated as outliers.

Tukey’s biweight.

SLIDE 29

Robust Estimation

The Geman-McClure function works well: twice differentiable, redescending.

$$\rho(r, \sigma) = \frac{r^2}{\sigma^2 + r^2}$$

Influence function (the derivative of the norm with respect to r):

$$\psi(r, \sigma) = \frac{2\, r\, \sigma^2}{(\sigma^2 + r^2)^2}$$

SLIDE 30

SLIDE 31

Robust scale

Scale is critical! Popular choice:

SLIDE 32

Too small

SLIDE 33

Too large

SLIDE 34

Just right

SLIDE 35

Example: Motion

Assumption: Within a finite image region, there is only a single motion present. Violated by: motion discontinuities, shadows, transparency, specular reflections… Violations of brightness constancy result in large residuals:

SLIDE 36

Estimating Flow

Minimize:

$$E(\mathbf{a}) = \sum_{\mathbf{x} \in R} \rho\big(I_x\, u(\mathbf{x}; \mathbf{a}) + I_y\, v(\mathbf{x}; \mathbf{a}) + I_t,\ \sigma\big)$$

Parameterized models provide strong constraints:

  • hundreds, or thousands, of constraints
  • a handful (e.g., six) of unknowns

Can be very accurate (when the model is good)!

SLIDE 37

Deterministic Annealing

Start with a “quadratic” optimization problem and gradually reduce outliers.

SLIDE 38

Continuation method

GNC: Graduated Non-Convexity

SLIDE 39

Fragmented Occlusion

SLIDE 40

Results

SLIDE 41

Results

SLIDE 42

Multiple Motions, again

Find the dominant motion while rejecting outliers.

Black & Anandan; Black & Jepson

SLIDE 43

Robust estimation models only a single process explicitly.

Robust norm:

$$E(\mathbf{a}) = \sum_{x, y \in R} \rho\big(\nabla I^{T}\, \mathbf{u}(\mathbf{x}; \mathbf{a}) + I_t,\ \sigma\big)$$

Assumption: constraints that don't fit the dominant motion are treated as "outliers" (noise). Problem? They aren't noise!

SLIDE 44

Alternative View

  • There are two things going on simultaneously.
  • We don't know which constraint lines correspond to which motion.
  • If we knew this, we could estimate the multiple motions.
    – a type of "segmentation" problem
  • If we knew the segmentation, then estimating the motion would be easy.

SLIDE 45

EM: General Framework

Estimate parameters from segmented data; consider the segmentation labels to be missing data.

SLIDE 46

Missing variable problems

A missing-data problem is a statistical problem where some data are missing. There are two natural contexts in which missing data are important:

  • terms in a data vector are missing for some instances and present for others (perhaps someone responding to a survey was embarrassed by a question)
  • an inference problem can be made very much simpler by rewriting it using some variables whose values are unknown

SLIDE 47

SLIDE 48

Missing variable problems

In many vision problems, if some variables were known, the maximum likelihood inference problem would be easy:

  – fitting: if we knew which line each token came from, it would be easy to determine the line parameters
  – segmentation: if we knew the segment each pixel came from, it would be easy to determine the segment parameters
  – fundamental matrix estimation: if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix
  – etc.

SLIDE 49

Strategy

For each of our examples, if we knew the missing data we could estimate the parameters effectively. If we knew the parameters, the missing data would follow. This suggests an iterative algorithm:

  • 1. obtain some estimate of the missing data, using a guess at the parameters
  • 2. now form a maximum likelihood estimate of the free parameters using the estimate of the missing data

SLIDE 50

Motion Segmentation

“What goes with what?”

The constraints at these pixels all “go together.”

SLIDE 51

Smoothness in layers

SLIDE 52

Layered Representation

[Figure: segmentation into layers, from Adelson]

SLIDE 53

EM in Pictures

Given images I(x,y,t) and I(x,y,t+1) containing two motions.

SLIDE 54

EM in Pictures

Assume we know the segmentation of pixels into "layers", given by weights $w_1(x, y)$ and $w_2(x, y)$ with

$$0 \le w_i(x, y) \le 1, \qquad \sum_i w_i(x, y) = 1$$

SLIDE 55

EM in Pictures

Given the layer weights $w_1(x, y)$, $w_2(x, y)$ and images I(x,y,t), I(x,y,t+1), estimating the motion $\mathbf{u}(x, y; \mathbf{a}_1)$, $\mathbf{u}(x, y; \mathbf{a}_2)$ of each "layer" is easy.

SLIDE 56

EM in Equations

Weighted estimate of layer 1's motion from $w_1(x, y)$, I(x,y,t), and I(x,y,t+1):

$$E_1(\mathbf{a}_1) = \sum_{x, y \in R} w_1(\mathbf{x}) \big(\nabla I^{T}\, \mathbf{u}(\mathbf{x}; \mathbf{a}_1) + I_t\big)^2$$
SLIDE 57

EM in Equations

Likewise for layer 2, using $w_2(x, y)$:

$$E_2(\mathbf{a}_2) = \sum_{x, y \in R} w_2(\mathbf{x}) \big(\nabla I^{T}\, \mathbf{u}(\mathbf{x}; \mathbf{a}_2) + I_t\big)^2$$
SLIDE 58

EM in Pictures

  • Ok. So where do we get the weights?
SLIDE 59

EM in Pictures

The weights represent the probability that the constraint “belongs” to a particular layer.

SLIDE 60

EM in Pictures

Assume we know the motion of the layers but not the ownership probabilities of the pixels (weights).

SLIDE 61

EM in Pictures

Assume we know the motion of the layers but not the ownership probabilities of the pixels (weights). Also assume we have a likelihood at each pixel:

$$p\big(I(t), I(t+1) \mid \mathbf{a}\big) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\big(\nabla I^{T}\, \mathbf{u}(\mathbf{a}) + I_t\big)^2}{2\sigma^2}\right)$$

SLIDE 62

EM in Pictures

Given the flow of layer 1, warp the first image towards the second and look at the residual error $I_t$ (since the flow is now zero):

$$p\big(I(W(t); \mathbf{a}_1),\, I(t+1)\big) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{I_t^{2}}{2\sigma^2}\right)$$

[Figure: regions that match vs. regions that don't match]
SLIDE 63

EM in Pictures

Similarly for layer 2: given its flow, warp the first image towards the second and look at the residual error $I_t$:

$$p\big(I(W(t); \mathbf{a}_2),\, I(t+1)\big) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{I_t^{2}}{2\sigma^2}\right)$$

[Figure: regions that match vs. regions that don't match]

SLIDE 64

EM in Pictures

Two "explanations" for each pixel, hence two likelihoods:

$$p\big(I(\mathbf{x}, t+1) \mid \mathbf{u}_1(\mathbf{a})\big) \qquad p\big(I(\mathbf{x}, t+1) \mid \mathbf{u}_2(\mathbf{a})\big)$$

SLIDE 65

EM in Pictures

Compute the total likelihood and normalize:

$$w_i(\mathbf{x}) = \frac{p\big(I(\mathbf{x}, t+1) \mid \mathbf{u}_i(\mathbf{a})\big)}{\sum_k p\big(I(\mathbf{x}, t+1) \mid \mathbf{u}_k(\mathbf{a})\big)}$$

SLIDE 66

Motion Segmentation Example

  • Model the image pair (or video sequence) as consisting of regions of parametric motion
    – affine motion is popular
  • Iterate E/M:
    – determine which pixels belong to which region
    – estimate the parameters

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
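A small sketch of evaluating such an affine flow field on a pixel grid; the parameter values are arbitrary illustrations.

```python
import numpy as np

A = np.array([[0.01, -0.02],
              [0.02,  0.01]])     # a, b, c, d
t = np.array([0.5, -0.3])         # t_x, t_y

ys, xs = np.mgrid[0:4, 0:4]                          # small pixel grid
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)  # (x, y) per pixel
flow = coords @ A.T + t                              # (v_x, v_y) per pixel
print(flow.reshape(4, 4, 2))
```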

SLIDE 67

Three frames from the MPEG “flower garden” sequence

Figure from "Representing Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.

SLIDE 68

Grey level shows the region number with the highest probability; segments and the motion fields associated with them.

Figure from "Representing Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.

SLIDE 69

If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions, so we can re-render the sequence with some segments removed.

Figure from "Representing Images with Layers," by J. Wang and E. H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.

SLIDE 70

Lines

  • We wish to determine
    – line parameters
    – p(comes from line)
  • Simple case: we have one line, and n points
  • Some come from the line, some from "noise"
  • This is a mixture model:

$$\begin{aligned} P(\text{point} \mid \text{line and noise params}) &= P(\text{point} \mid \text{line})\, P(\text{comes from line}) + P(\text{point} \mid \text{noise})\, P(\text{comes from noise}) \\ &= P(\text{point} \mid \text{line})\, \lambda + P(\text{point} \mid \text{noise})\, (1 - \lambda) \end{aligned}$$

  • e.g.,
    – allocate each point to a line with a weight, which is the probability of the point given the line
    – refit lines to the weighted set of points

SLIDE 71

Line fitting review

  • In the case of a single line and normal i.i.d. errors, maximum likelihood estimation reduces to least squares:

$$\min_{a,b} \sum_i r_{a,b,i}^2 = \min_{a,b} \sum_i (a x_i + b - y_i)^2$$

  • The line parameters (a, b) are solutions to the system:

$$\begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & \sum_i 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i x_i y_i \\ \sum_i y_i \end{pmatrix}$$
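A quick NumPy check of these normal equations on synthetic data; the true slope, intercept, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, 50)   # true a = 2, b = 1, plus Gaussian noise

M = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     len(x)   ]])   # system matrix from the slide
rhs = np.array([np.sum(x * y), np.sum(y)])
a, b = np.linalg.solve(M, rhs)
print(a, b)                                  # close to 2 and 1
```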

SLIDE 72

The E Step

  • Compute residuals:

$$r_1(i) = a_1 x_i + b_1 - y_i \qquad r_2(i) = a_2 x_i + b_2 - y_i$$

  • Compute soft assignments:

$$w_1(i) = \frac{e^{-r_1(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}} \qquad w_2(i) = \frac{e^{-r_2(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}$$

(a constant k is added to the denominators under a uniform noise model)

SLIDE 73

The M Step

The weighted least-squares system is solved for $(a_1, b_1)$:

$$\begin{pmatrix} \sum_i w_1(i)\, x_i^2 & \sum_i w_1(i)\, x_i \\ \sum_i w_1(i)\, x_i & \sum_i w_1(i) \end{pmatrix} \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_1(i)\, x_i y_i \\ \sum_i w_1(i)\, y_i \end{pmatrix}$$

SLIDE 74

SLIDE 75

The expected values of the deltas at the maximum (notice the one value close to zero).

SLIDE 76

Closeup of the fit

SLIDE 77

Issues with EM

  • Local maxima
    – can be a serious nuisance in some problems
    – no guarantee that we have reached the "right" maximum
  • Starting
    – using k-means to cluster the points is often a good idea

SLIDE 78

Local maximum

SLIDE 79

which is an excellent fit to some points

SLIDE 80

and the deltas for this maximum

SLIDE 81

Choosing parameters

  • What about the noise parameter, and the sigma for the line?
    – several methods:
      • from first-principles knowledge of the problem (seldom really possible)
      • play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much)
    – notice that if $k_n$ is large, this says that points very seldom come from noise, however far from the line they lie
      • this usually biases the fit, by pushing outliers into the line
      • rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem

SLIDE 82

Estimating the number of models

  • In the weighted scenario, additional models will not necessarily reduce the total error.
  • The optimal number of models is a function of the σ parameter: how well we expect the model to fit the data.
  • Algorithm: start with many models; redundant models will collapse.

SLIDE 83

Fitting 2 lines to data points

Model: $y = a_1 x + b_1 + \sigma v$ and $y = a_2 x + b_2 + \sigma v$, with $v \sim N(0, 1)$; $r_i$ is the residual of point $(x_i, y_i)$.

  • Input:
    – Data points that were generated by 2 lines with Gaussian noise.
  • Output:
    – The parameters of the 2 lines.
    – The assignment of each point to its line.

SLIDE 84

The E Step

  • Compute residuals assuming known lines:

$$r_1(i) = a_1 x_i + b_1 - y_i \qquad r_2(i) = a_2 x_i + b_2 - y_i$$

  • Compute soft assignments:

$$w_1(i) = \frac{e^{-r_1(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}} \qquad w_2(i) = \frac{e^{-r_2(i)^2 / 2\sigma^2}}{e^{-r_1(i)^2 / 2\sigma^2} + e^{-r_2(i)^2 / 2\sigma^2}}$$

SLIDE 85

The M Step

  • In the weighted case we find

$$\min_{a,b}\ \sum_i w_1(i)\, r_1(i)^2 + \sum_i w_2(i)\, r_2(i)^2$$

The weighted least-squares system is solved twice, for $(a_1, b_1)$ and for $(a_2, b_2)$:

$$\begin{pmatrix} \sum_i w_k(i)\, x_i^2 & \sum_i w_k(i)\, x_i \\ \sum_i w_k(i)\, x_i & \sum_i w_k(i) \end{pmatrix} \begin{pmatrix} a_k \\ b_k \end{pmatrix} = \begin{pmatrix} \sum_i w_k(i)\, x_i y_i \\ \sum_i w_k(i)\, y_i \end{pmatrix}, \qquad k = 1, 2$$
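A minimal sketch of this two-line EM on synthetic data; σ, the initial guesses, and the small constant in the E step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
labels = rng.integers(0, 2, 200)                 # which line generated each point
y = np.where(labels == 0, x, -x + 12.0) + rng.normal(0, 0.3, 200)

def wls_line(x, y, w):
    """Solve the weighted normal equations above for (a, b)."""
    M = np.array([[np.sum(w * x * x), np.sum(w * x)],
                  [np.sum(w * x),     np.sum(w)    ]])
    rhs = np.array([np.sum(w * x * y), np.sum(w * y)])
    return np.linalg.solve(M, rhs)

sigma = 0.5
a1, b1, a2, b2 = 0.5, 1.0, -0.5, 10.0            # rough initial lines
for _ in range(30):
    # E step: residuals and soft assignments
    r1 = a1 * x + b1 - y
    r2 = a2 * x + b2 - y
    e1 = np.exp(-r1**2 / (2 * sigma**2))
    e2 = np.exp(-r2**2 / (2 * sigma**2))
    w1 = e1 / (e1 + e2 + 1e-12)                  # tiny constant avoids 0/0 for far points
    w2 = 1.0 - w1
    # M step: weighted least squares, solved once per line
    a1, b1 = wls_line(x, y, w1)
    a2, b2 = wls_line(x, y, w2)

print((a1, b1), (a2, b2))                        # typically near (1, 0) and (-1, 12)
```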

SLIDE 86

Illustrations

SLIDE 87

Illustration

SLIDE 88

Illustration

l = log(likelihood)

SLIDE 89

Color Segmentation Example

Parameters include the mixing weights and the means/covariances of each component.

SLIDE 90

EM for Mixture Models

If the log-likelihood is linear in the missing variables, we can replace the missing variables with their expectations. E.g.:

  • 1. (E-step) estimate the complete data (e.g., the z_j's) using the previous parameters
  • 2. (M-step) maximize the complete-data log-likelihood using the estimated complete data

[Equations: mixture model; complete-data log-likelihood]

SLIDE 91

Color segmentation with EM

SLIDE 92

Color segmentation with EM

Initialize

SLIDE 93

Color segmentation

  • At each pixel in an image, we compute a d-dimensional feature vector x, which encapsulates position, colour and texture information.
  • Each pixel is generated by one of G segments, each Gaussian, chosen with probability π:

$$p(\mathbf{x}) = \sum_{g=1}^{G} \pi_g\, \mathcal{N}(\mathbf{x} \mid \mu_g, \Sigma_g)$$

SLIDE 94

Color segmentation with EM

Initialize -> E

SLIDE 95

Color segmentation with EM

Initialize -> E -> M

SLIDE 96

E-step

Estimate support maps:

SLIDE 97

M-step

Update the means, covariances, and mixing coefficients using the support maps.
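A minimal NumPy sketch of one such E/M round over per-pixel feature vectors; the function name and the isotropic-covariance simplification are assumptions for brevity.

```python
import numpy as np

def em_round(X, pi, mu, var):
    """One EM round for a G-component Gaussian mixture (isotropic covariances)."""
    N, d = X.shape                                      # X: one d-dim feature per pixel
    # E-step: support maps w[i, g] = p(segment g | pixel i)
    sq = ((X[:, None, :] - mu[None, :, :])**2).sum(-1)  # (N, G) squared distances
    lik = pi * np.exp(-0.5 * sq / var) / (2 * np.pi * var)**(d / 2)
    w = lik / lik.sum(axis=1, keepdims=True)
    # M-step: update mixing weights, means, variances from the support maps
    Ng = w.sum(axis=0)
    pi = Ng / N
    mu = (w.T @ X) / Ng[:, None]
    var = np.array([(w[:, g] * ((X - mu[g])**2).sum(1)).sum() / (d * Ng[g])
                    for g in range(len(pi))])
    return pi, mu, var

# Toy usage with stand-in feature vectors
X = np.random.default_rng(5).normal(size=(500, 3))
pi, mu, var = em_round(X, np.array([0.5, 0.5]),
                       np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]),
                       np.array([1.0, 1.0]))
```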

SLIDE 98

SLIDE 99

Segmentation with EM

SLIDE 100

Model Selection

  • We wish to choose a model to fit to data
    – e.g., is it a line or a circle?
    – e.g., is this a perspective or orthographic camera?
    – e.g., is there an aeroplane there or is it noise?
  • Issue
    – In general, models with more parameters will fit a dataset better, but are poorer at prediction
    – This means we can't simply look at the negative log-likelihood (or fitting error)

SLIDE 101

Top is not necessarily a better fit than bottom (actually, almost always worse)

SLIDE 102

SLIDE 103

We can discount the fitting error with some term in the number of parameters in the model.
SLIDE 104

Discounts

  • BIC (Bayes information criterion)
    – choose the model with the smallest value of $-2 L(D; \theta^*) + p \log N$
    – N is the number of data points
  • AIC ("an information criterion")
    – choose the model with the smallest value of $-2 L(D; \theta^*) + 2p$
    – p is the number of parameters
  • Minimum description length
    – the same criterion as BIC, but derived in a completely different way
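A small sketch comparing a line (p = 2) against a cubic (p = 4) by BIC and AIC; the Gaussian residual model and the data are illustrative assumptions.

```python
import numpy as np

def scores(y, y_hat, p):
    """BIC and AIC for a fit, assuming i.i.d. Gaussian residuals (ML variance)."""
    N = len(y)
    var = np.mean((y - y_hat)**2)                        # ML residual variance
    log_lik = -0.5 * N * (np.log(2 * np.pi * var) + 1)   # L(D; theta*)
    return -2 * log_lik + p * np.log(N), -2 * log_lik + 2 * p

rng = np.random.default_rng(2)
x = np.linspace(0, 5, 100)
y = 1.5 * x - 2.0 + rng.normal(0, 0.5, 100)              # truly linear data
for deg, p in [(1, 2), (3, 4)]:
    y_hat = np.polyval(np.polyfit(x, y, deg), x)
    print(deg, scores(y, y_hat, p))                      # the line should win on BIC
```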

SLIDE 105

Cross-validation

  • Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other
  • Average over multiple different splits
  • Choose the model with the smallest value of this average
  • The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data
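A minimal sketch of this split-and-average procedure for polynomial models; the Gaussian held-out negative log-likelihood and the 50/50 split are illustrative choices.

```python
import numpy as np

def heldout_nll(x, y, deg, n_splits=20, seed=3):
    """Average held-out negative log-likelihood over random 50/50 splits."""
    rng = np.random.default_rng(seed)
    nlls = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, te = idx[: len(x) // 2], idx[len(x) // 2:]
        coef = np.polyfit(x[tr], y[tr], deg)               # fit to one piece
        var = np.mean((y[tr] - np.polyval(coef, x[tr]))**2)
        resid = y[te] - np.polyval(coef, x[te])            # score on the other piece
        nlls.append(0.5 * np.mean(np.log(2 * np.pi * var) + resid**2 / var))
    return np.mean(nlls)

rng = np.random.default_rng(6)
x = np.linspace(0, 5, 200)
y = 1.5 * x - 2.0 + rng.normal(0, 0.5, 200)
print(heldout_nll(x, y, deg=1), heldout_nll(x, y, deg=5))  # smaller average wins
```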

SLIDE 106

Extreme segmentation

What if more than half the points are noise?

SLIDE 107

RANSAC

  • Iterate:
    – Sample
    – Fit
    – Test
  • Keep the best estimate; refit on the inliers
SLIDE 108

RANSAC

  • Choose a small subset uniformly at random
  • Fit to that subset
  • Anything that is close to the result is signal; all others are noise
  • Refit
  • Do this many times and choose the best
  • Issues
    – How many times?
      • often enough that we are likely to have a good line
    – How big a subset?
      • smallest possible
    – What does close mean?
      • depends on the problem
    – What is a good line?
      • one where the number of nearby points is so big it is unlikely to be all outliers
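A minimal RANSAC line-fitting sketch; the iteration count and inlier threshold are illustrative choices, not values prescribed by the slides.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, thresh=0.5, seed=4):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(x), size=2, replace=False)  # smallest possible subset
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])                 # fit a line to the pair
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < thresh          # "close" points count as signal
        if inliers.sum() > best_inliers.sum():            # keep the best hypothesis
            best_inliers = inliers
    # refit by least squares on the inliers of the best hypothesis
    return np.polyfit(x[best_inliers], y[best_inliers], 1), best_inliers

rng = np.random.default_rng(7)
x = np.concatenate([np.linspace(0, 10, 60), rng.uniform(0, 10, 60)])
y = np.concatenate([2 * np.linspace(0, 10, 60) + 1,
                    rng.uniform(0, 25, 60)])              # 50% outliers
coef, inliers = ransac_line(x, y)
print(coef)                                               # typically close to [2, 1]
```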

SLIDE 109

SLIDE 110

RANSAC applications

  • Fundamental matrices
    – estimate F from 7 points
    – test agreement with all other points
  • Direct motion
    – estimate affine (or rigid) motion from a small set of matches
    – see what other parts of the image are consistent

SLIDE 111

Fitting and Probabilistic Segmentation

  • Robust estimation
  • EM
  • Model Selection
  • RANSAC

[Slides from Michael Black and F&P]