Perceptual and Sensory Augmented Computing Computer Vision WS 08/09
Image Segmentation
Marc Pollefeys ETH Zurich
Slide credits:
- V. Ferrari, K. Grauman, B. Leibe, S. Lazebnik,
- S. Seitz, Y. Boykov, W. Freeman, P. Kohli
Topics of This Lecture
Ø Gestalt principles
Ø Image segmentation
Ø k-Means
Ø Feature spaces
Ø Mixture of Gaussians, EM
[Figure panels: determining image regions; grouping video frames into shots; object-level grouping; figure-ground]
Slide credit: Kristen Grauman
What things should be grouped? What cues indicate groups?
http://chicagoist.com/attachments/chicagoist_alicia/GEESE.jpg, http://wwwdelivery.superstock.com/WI/223/1532/PreviewComp/SuperStock_1532R-0831.jpg
Slide adapted from Kristen Grauman
http://seedmagazine.com/news/2006/10/beauty_is_in_the_processingtim.php
Slide credit: Kristen Grauman
Image credit: Arthus-Bertrand (via F. Durand)
Slide credit: Kristen Grauman
http://www.capital.edu/Resources/Images/outside6_035.jpg
Slide credit: Kristen Grauman
from relationships
Ø “The whole is greater than the sum of its parts”
[Figure panels: illusory/subjective contours; occlusion; familiar configuration]
http://en.wikipedia.org/wiki/Gestalt_psychology
Slide credit: Svetlana Lazebnik
Image source: Steve Lehar
Ø Whole is greater than sum of its parts
Ø Relationships among parts can yield new properties/features
set of elements to be grouped (by human visual system)
Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923 http://psy.ed.asu.edu/~classics/Wertheimer/Forms/forms.htm
“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees.”
Max Wertheimer (1880-1943)
Slide credit: B. Leibe
These factors make intuitive sense, but are very difficult to translate into algorithms.
Image source: Forsyth & Ponce
Slide credit: B. Leibe
Slide credit: B. Leibe
Continuity, explanation by occlusion
Slide credit: B. Leibe
Image source: Forsyth & Ponce
Slide credit: B. Leibe
Image source: Forsyth & Ponce
Slide credit: B. Leibe
Slide credit: B. Leibe
Slide adapted from B. Leibe
Slide credit: Steve Seitz, Kristen Grauman
Image Human segmentation
Slide credit: Svetlana Lazebnik
further processing
“superpixels”
Slide credit: Svetlana Lazebnik
which of these it is.
Ø i.e. segment the image based on the intensity feature.
[Figure: input image and its intensity axis, with groups of black pixels, gray pixels, and white pixels]
Slide credit: Kristen Grauman
[Figure: input images and their intensity histograms (axes: intensity, pixel count)]
Slide credit: Kristen Grauman
define our groups?
[Figure: input image and its intensity histogram (axes: intensity, pixel count)]
Slide credit: Kristen Grauman
intensities, and label every pixel according to which of these centers it is nearest to.
between all points and their nearest cluster center ci:
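In the usual k-means formulation, this sum of squared distances (SSD) can be written as below. The subscript notation is chosen here for illustration; only the cluster centers ci come from the slide.

```latex
\mathrm{SSD} \;=\; \sum_{\text{clusters } i} \;\; \sum_{\text{points } p \text{ in cluster } i} \| p - c_i \|^2
```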
Slide credit: Kristen Grauman
[Figure: intensity axis (marked values 190 and 255) with cluster centers labeled 1, 2, 3]
Ø If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
Ø If we knew the group memberships, we could get the centers by computing the mean per group.
Slide credit: Kristen Grauman
iterate between the two steps we just saw.
– For each point p, find the closest ci. Put p into cluster i
– Set ci to be the mean of points in cluster i
Ø Will always converge to some solution
Ø Can be a “local minimum”
– Does not always find the global minimum of the objective function (the sum of squared distances between all points and their nearest cluster center)
Slide credit: Steve Seitz
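The two alternating steps can be sketched in a few lines of Python. This is a minimal illustration on scalar intensities, not the lecture's own implementation; the function name `kmeans_1d`, the toy intensities, and the hand-picked initial centers are assumptions made for the example.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd-style k-means on scalar values (e.g. pixel intensities)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: put each point into the cluster of its closest center.
        labels = [min(range(len(centers)), key=lambda i: (v - centers[i]) ** 2)
                  for v in values]
        # Update step: set each center to the mean of the points assigned to it.
        new_centers = []
        for i, c in enumerate(centers):
            members = [v for v, l in zip(values, labels) if l == i]
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy intensities and initial centers.
intensities = [0, 5, 10, 185, 190, 195, 250, 255]
centers, labels = kmeans_1d(intensities, centers=[0, 128, 255])
```

Different initial centers can lead to a different fixed point, which is the local-minimum behavior noted above.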
[Figure: segmentation results for K=2 and K=3]
% im: grayscale image, K: number of clusters
img_as_col = double(im(:));              % flatten the image into a column of intensities
cluster_membs = kmeans(img_as_col, K);   % cluster index for every pixel
labelim = zeros(size(im));
for i = 1:K
    inds = find(cluster_membs == i);
    meanval = mean(img_as_col(inds));
    labelim(inds) = meanval;             % paint each pixel with its cluster's mean intensity
end
Slide credit: Kristen Grauman
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
can group pixels in different ways.
intensity similarity
Slide credit: Kristen Grauman
can group pixels in different ways.
Example pixels: (R=255, G=200, B=250), (R=245, G=220, B=248), (R=15, G=189, B=2), (R=3, G=12, B=2)
[Figure: pixels plotted in color space with axes R, G, B]
Slide credit: Kristen Grauman
can group pixels in different ways.
[Figure: filter bank with responses F1, F2, …, F24]
Slide credit: Kristen Grauman
→ possible discontinuities
are spatially smooth?
[Figure: original image and the image labeled by cluster center’s intensity (clusters 1, 2, 3)]
Slide adapted from Kristen Grauman
can group pixels in different ways.
intensity+position similarity ⇒ Way to encode both similarity and proximity.
Slide adapted from Kristen Grauman
[Figure: feature space with axes X, Y, Intensity]
essentially vector quantization of the image attributes
Ø Clusters don’t have to be spatially coherent
[Figure panels: image; intensity-based clusters; color-based clusters]
Slide adapted from Svetlana Lazebnik
Image source: Forsyth & Ponce
spatial coherence
Slide adapted from Svetlana Lazebnik
Image source: Forsyth & Ponce
Ø Simple, fast to compute
Ø Converges to local minimum
Ø Setting k?
Ø Sensitive to initial centers
Ø Sensitive to outliers
Ø Detects spherical clusters only
Ø Assuming means can be computed
Slide credit: Kristen Grauman
Ø What’s the probability that a point x is in cluster m?
Ø What’s the shape of each cluster?
Ø Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function.
Ø This function is called a generative model.
Ø Defined by a vector of parameters θ
Slide credit: Steve Seitz
Ø K Gaussian blobs with means µb, covariance matrices Vb, dimension d
– Blob b defined by:
Ø Blob b is selected with probability
Ø The likelihood of observing x is a weighted mixture of Gaussians
Slide adapted from Steve Seitz
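For reference, a standard way to write such a mixture is sketched below. The means and covariances follow the slide's µb and Vb; the symbol α_b for the probability of selecting blob b is an assumption made here.

```latex
P(x \mid \mu_b, V_b) = \frac{1}{\sqrt{(2\pi)^d \, |V_b|}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_b)^T V_b^{-1} (x - \mu_b) \right)

P(x \mid \theta) = \sum_{b=1}^{K} \alpha_b \, P(x \mid \mu_b, V_b),
    \qquad \sum_{b=1}^{K} \alpha_b = 1,
    \qquad \theta = \{ \alpha_b, \mu_b, V_b \}_{b=1}^{K}
```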
Ø Find blob parameters θ that maximize the likelihood function
1. E-step: given current guess of blobs, compute probabilistic ownership
2. M-step: given ownership probabilities, update blobs to maximize likelihood function
3. Repeat until convergence
Slide adapted from Steve Seitz
Ø Compute probability that point x is in blob b, given current guess of θ
Ø Compute overall probability that blob b is selected
Ø Mean of blob b
Ø Covariance of blob b
(N data points)
Slide adapted from Steve Seitz
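The E-step and M-step can be sketched for scalar data as follows. This is an illustrative Python version under simplifying assumptions (one dimension, so each covariance is a single variance); the function names, the toy data, and the `min_var` floor are choices made for this example.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iters=50, min_var=1e-6):
    """EM for a 1-D mixture of Gaussians; updates the parameter lists in place."""
    n, k = len(data), len(means)
    for _ in range(iters):
        # E-step: ownership[i][b] = probability that point i belongs to blob b.
        ownership = []
        for x in data:
            joint = [weights[b] * gaussian(x, means[b], variances[b]) for b in range(k)]
            total = sum(joint)
            ownership.append([j / total for j in joint])
        # M-step: re-estimate each blob from its softly assigned points.
        for b in range(k):
            mass = sum(ownership[i][b] for i in range(n))
            weights[b] = mass / n  # overall probability that blob b is selected
            means[b] = sum(ownership[i][b] * data[i] for i in range(n)) / mass
            spread = sum(ownership[i][b] * (data[i] - means[b]) ** 2 for i in range(n))
            variances[b] = max(min_var, spread / mass)  # floor avoids a collapsed blob
    return weights, means, variances


# Hypothetical data drawn around 1.0 and 5.0.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Unlike k-means, every point contributes to every blob, weighted by its ownership probability.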
Ø Any clustering problem
Ø Any model estimation problem
Ø Missing data problems
Ø Finding outliers
Ø Segmentation problems
– Segmentation based on color
– Segmentation based on motion
– Foreground/background separation
Ø ...
Ø http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html
Slide credit: Steve Seitz
[Figure: original image and EM segmentation results for k=2, k=3, k=4, k=5]
Image source: Serge Belongie
Slide credit: B. Leibe
Ø Probabilistic interpretation
Ø Soft assignments between data points and clusters
Ø Generative model, can predict novel data points
Ø Relatively compact storage ( )
Ø Initialization
– often a good idea to start from output of k-means
Ø Local minima
Ø Need to know number of components K
– solutions: model selection (AIC, BIC), Dirichlet process mixture
Ø Need to choose generative model (mathematical form of a cluster?)
Ø Numerical problems are often a nuisance
Slide adapted from B. Leibe
Ø Mode = local maximum of a given distribution
Ø Easy to see, hard to compute
Slide adapted from Steve Seitz
based segmentation
http://coewww.rutgers.edu/riul/FORMER/comanici/MSPAMI/msPamiResults.html
PAMI 2002.
Slide credit: Svetlana Lazebnik
1. Initialize random seed center and window W
2. Calculate center of gravity (the “mean”) of W
3. Shift the search window to the mean
4. Repeat steps 2+3 until convergence
Slide adapted from Steve Seitz
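The four steps above can be sketched for one window on scalar data. This is a minimal Python illustration that assumes a flat (uniform) window of radius h; the function name, the toy points, and the start positions are assumptions for the example.

```python
def mean_shift_mode(points, start, h, tol=1e-6, max_iter=1000):
    """Follow the mean shift vector from `start` until the window stops moving."""
    center = start
    for _ in range(max_iter):
        # Window W: all points within distance h of the current center.
        window = [p for p in points if abs(p - center) <= h]
        if not window:
            break  # empty window: no data to shift toward
        mean = sum(window) / len(window)  # center of gravity of W
        if abs(mean - center) < tol:
            break  # converged: the window sits on a mode
        center = mean  # shift the search window to the mean
    return center


# Hypothetical 1-D data with two dense regions.
points = [1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 8.5, 9.0]
mode_a = mean_shift_mode(points, start=0.5, h=2.0)
mode_b = mean_shift_mode(points, start=7.0, h=2.0)
```

Starting positions that end at the same mode would be merged into one cluster; here the two starts reach different modes.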
[Figure sequence: region of interest, center of mass, and mean shift vector at successive iterations; the final frame shows only the region of interest and the center of mass]
Slide by Y. Ukrainitz & B. Sarel
Tessellate the space with windows
Run the procedure in parallel
Slide by Y . Ukrainitz & B. Sarel
The blue data points were traversed by the windows towards the mode.
Slide by Y . Ukrainitz & B. Sarel
lead to the same mode
Slide by Y . Ukrainitz & B. Sarel
mode
Slide adapted from Svetlana Lazebnik
http://coewww.rutgers.edu/riul/FORMER/comanici/MSPAMI/msPamiResults.html
Slide credit: Svetlana Lazebnik
Slide credit: Svetlana Lazebnik
Ø General, application-independent tool
Ø Model-free, does not assume any prior shape (spherical, elliptical, etc.) on data clusters
Ø Just a single parameter (window size h)
– h has a physical meaning (unlike k-means): the scale of clustering
Ø Finds variable number of modes given the same h
Ø Robust to outliers
Ø Output depends on window size h
Ø Window size (bandwidth) selection is not trivial
Ø Computationally rather expensive
Ø Does not scale well with dimension of feature space
Slide adapted from Svetlana Lazebnik
Ø Node (vertex) for every pixel
Ø Edge between every pair of pixels (p,q)
Ø Affinity weight wpq for each edge
– wpq measures similarity
– Similarity is inversely proportional to difference (in color, texture, position, …)
[Figure: graph with pixels p and q joined by an edge of weight wpq]
Slide adapted from Steve Seitz
Ø Delete edges crossing between segments
Ø Easiest to break edges with low similarity (low weight)
– Similar pixels should be in the same segments
– Dissimilar pixels should be in different segments
[Figure: graph broken into segments A, B, C]
Slide adapted from Steve Seitz
$$\mathrm{aff}(x,y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| x - y \|^2 \right\}$$
$$\mathrm{aff}(x,y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| I(x) - I(y) \|^2 \right\}$$
$$\mathrm{aff}(x,y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \mathrm{dist}\big( c(x), c(y) \big)^2 \right\}$$
(some suitable color space distance)
$$\mathrm{aff}(x,y) = \exp\left\{ -\frac{1}{2\sigma_d^2} \, \| f(x) - f(y) \|^2 \right\}$$
(vectors of filter outputs)
Source: Forsyth & Ponce
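A small sketch of how such affinities behave for different scales σ, written in Python. The helper names and the toy 1-D intensity features are assumptions for illustration; any of the feature choices above (position, intensity, color, filter outputs) could be plugged in as the feature vectors.

```python
import math


def affinity(fx, fy, sigma):
    """aff(x, y) = exp(-||fx - fy||^2 / (2 sigma^2)) for two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(fx, fy))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def affinity_matrix(features, sigma):
    """Dense symmetric matrix W with W[i][j] = affinity of items i and j."""
    return [[affinity(fi, fj, sigma) for fj in features] for fi in features]


# Two tight groups of 1-D intensities (hypothetical toy data).
features = [(10.0,), (12.0,), (200.0,), (205.0,)]
W_small = affinity_matrix(features, sigma=5.0)    # only near-identical items are similar
W_large = affinity_matrix(features, sigma=500.0)  # everything looks similar
```

With the small σ the matrix is close to block-diagonal (two groups); with the large σ the group structure washes out.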
Slide adapted from Svetlana Lazebnik
[Figure: data points and the resulting affinity matrices for small σ, medium σ, and large σ]
Image Source: Forsyth & Ponce
Ø Sum of weights of cut edges:
$$\mathrm{cut}(A,B) = \sum_{p \in A,\; q \in B} w_{p,q}$$
Ø What is a “good” graph cut and how do we find one?
[Figure: graph partitioned into sets A and B]
Slide adapted from Steve Seitz
Image Source: Forsyth & Ponce
Here, the cut is nicely defined by the block-diagonal structure of the affinity matrix. ⇒ How can this be generalized?
Slide credit: B. Leibe
a graph
Ø Efficient algorithms exist for doing this
Ø Weight of cut proportional to number of edges in the cut
Ø Minimum cut tends to cut off very small, isolated components
[Figure: the ideal cut compared with cuts of lesser weight than the ideal cut]
Slide credit: Khurram Hassan-Shafique
be computed by solving a generalized eigenvalue problem.
$$\mathrm{NCut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$
assoc(A,V) = sum of weights of all edges from nodes in A to all nodes in the graph
Slide adapted from Svetlana Lazebnik
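To see why the normalization matters, the sketch below scores every two-way split of a tiny graph by plain cut weight and by normalized cut. It uses exhaustive search purely for illustration (the eigenvector method is what makes this practical on images); the graph, its weights, and the helper names are hypothetical.

```python
from itertools import combinations


def cut(W, A, B):
    """Sum of weights of edges crossing from A to B."""
    return sum(W[p][q] for p in A for q in B)


def ncut(W, A, B):
    """Normalized cut: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    V = range(len(W))
    c = cut(W, A, B)
    return c / cut(W, A, V) + c / cut(W, B, V)


def best_partition(W, score):
    """Exhaustively score every two-way split (only feasible for tiny graphs)."""
    nodes = list(range(len(W)))
    best = None
    for size in range(1, len(nodes)):
        for A in combinations(nodes, size):
            if 0 not in A:
                continue  # fix node 0 in A to skip mirrored splits
            B = tuple(n for n in nodes if n not in A)
            s = score(W, A, B)
            if best is None or s < best[0]:
                best = (s, A, B)
    return best


# Two groups {0,1,2} and {3,4,5}; node 5 hangs on by one weak edge.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (2, 3): 0.2, (3, 4): 1.0, (4, 5): 0.15}
W = [[0.0] * 6 for _ in range(6)]
for (p, q), w in edges.items():
    W[p][q] = W[q][p] = w

min_cut_split = best_partition(W, cut)
ncut_split = best_partition(W, ncut)
```

On this graph the minimum cut isolates the weakly attached node 5, while the normalized cut separates the two groups.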
Ø Elasticity proportional to cost
Ø Vibration “modes” correspond to segments
– Can compute these by solving a generalized eigenvector problem
Slide adapted from Steve Seitz
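W: the affinity matrix, $W(i,j) = w_{i,j}$; D: the diag. matrix, $D(i,i) = \sum_j W(i,j)$; x: a vector in $\{1,-1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$.
$$\mathrm{NCut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)} = \frac{(\mathbf{1}+x)^T (D-W) (\mathbf{1}+x)}{k \, \mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1}-x)^T (D-W) (\mathbf{1}-x)}{(1-k) \, \mathbf{1}^T D \mathbf{1}} = \dots$$
$$k = \frac{\sum_{x_i > 0} D(i,i)}{\sum_i D(i,i)}$$
(The second equality holds up to a constant factor of 4, since $(\mathbf{1}+x)^T (D-W) (\mathbf{1}+x) = 4\,\mathrm{cut}(A,B)$ for $x \in \{1,-1\}^N$.)
Slide credit: Jitendra Malik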
Slide credit: Jitendra Malik
$$\mathrm{NCut}(A,B) = \frac{y^T (D-W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\; y^T D \mathbf{1} = 0.$$
This is hard, as y is discrete! Relaxation: y is continuous.
Ø Solution given by the “generalized” eigenvalue problem
$$(D-W)\, y = \lambda D y$$
Ø Solved by converting to standard eigenvalue problem
$$D^{-1/2} (D-W) D^{-1/2} z = \lambda z, \quad \text{with } z = D^{1/2} y$$
Ø Smallest eigenvector is $z_0 = D^{1/2} \mathbf{1}$ with eigenvalue 0 (and $y_0 = \mathbf{1}$)
Ø Optimal solution is second smallest eigenvector
Ø Gives continuous result, which must be converted into discrete values of y
Slide adapted from Alyosha Efros
Smallest eigenvectors
Image source: Shi & Malik
NCuts segments
Slide credit: B. Leibe
weights,
Ø This is where the approximation is made (we’re not solving the NP-hard discrete problem).
specified value.
$$(D-W)\, y = \lambda D y$$
$W(i,j)$: edge weight (affinity) between pixels $i$ and $j$
Slide credit: Jitendra Malik
NCuts Matlab code available at http://www.cis.upenn.edu/~jshi/software/
Image Source: Shi & Malik
Slide credit: Steve Seitz
Ø Generic framework, flexible to choice of function that computes weights (“affinities”) between nodes
Ø Does not require any model of the data distribution
Ø Time and memory complexity can be high
– Dense, highly connected graphs ⇒ many affinity computations
– Solving eigenvalue problem
Ø Preference for balanced partitions
– If a region is uniform, NCuts will find the modes of vibration of the image dimensions
Slide credit: Kristen Grauman
into regions, yet finding meaningful segments is intertwined with the recognition problem.
Slide credit: Kristen Grauman
Ø Learn local effects, get global effects out
Slide credit: William Freeman
[Figure: MRF with observed evidence, hidden “true states”, and neighborhood relations]
[Figure: image pixels $y_i$ and states $x_i$ (e.g. foreground/background), linked by $\Phi(x_i, y_i)$ and $\Psi(x_i, x_j)$]
Slide adapted from William Freeman
$$P(x,y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$
x: states, y: image
$\Phi$: image-state compatibility function (local)
$\Psi$: state-state compatibility function (neighboring nodes)
Slide adapted from William Freeman
minimizing the log
mechanics (spin glass theory). We therefore draw the analogy and call E an energy function.
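$$P(x,y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$
$$\log P(x,y) = \sum_i \log \Phi(x_i, y_i) + \sum_{i,j} \log \Psi(x_i, x_j)$$
$$E(x,y) = \sum_i \varphi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$
(With $\varphi = -\log \Phi$ and $\psi = -\log \Psi$ we have $E = -\log P$, so maximizing $P$ is the same as minimizing $E$.)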
Slide credit: B. Leibe
Unary potentials $\varphi(x_i, y_i)$
Ø Encode local information about the given pixel/patch
Ø How likely is a pixel/patch to be in a certain state (e.g. foreground/background)?
Pairwise potentials $\psi(x_i, x_j)$
Ø Encode neighborhood information
Ø How different is a pixel/patch’s label from that of its neighbor? (e.g. here independent of image data, but later based on intensity/color/texture difference)
$$E(x,y) = \sum_i \varphi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$
Slide adapted from B. Leibe
Ø Infer the optimal labeling of the MRF.
Ø Gibbs sampling, simulated annealing
Ø Iterated conditional modes (ICM)
Ø Variational methods
Ø Belief propagation
Ø Graph cuts
Ø Only suitable for a certain class of energy functions
Ø But the solution can be obtained very fast for typical vision problems (~1MPixel/sec).
[Figure: MRF with unary potentials $\varphi(x_i, y_i)$ and pairwise potentials $\psi(x_i, x_j)$]
Slide credit: B. Leibe
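As a concrete instance of energy minimization, the sketch below runs iterated conditional modes (ICM), one of the simpler methods listed above, on a 1-D chain with binary states. The specific potentials (squared difference for the unary term, a constant penalty `lam` for differing neighbors), the toy observations, and the function names are assumptions chosen for illustration.

```python
def energy(x, y, lam):
    """E(x, y) = sum_i phi(x_i, y_i) + sum_(i,j) psi(x_i, x_j) on a 1-D chain."""
    unary = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    pairwise = sum(lam for a, b in zip(x, x[1:]) if a != b)
    return unary + pairwise


def icm(y, lam, sweeps=10):
    """Iterated conditional modes: greedily re-label one node at a time."""
    x = [1 if yi > 0.5 else 0 for yi in y]  # start from the unary-only labeling
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            current = energy(x, y, lam)
            flipped = x[:i] + [1 - x[i]] + x[i + 1:]
            if energy(flipped, y, lam) < current:
                x, changed = flipped, True
        if not changed:
            break  # local minimum: no single flip lowers the energy
    return x


# Hypothetical noisy observations of a background/foreground step.
y = [0.1, 0.2, 0.7, 0.1, 0.9, 0.8, 0.4, 0.9]
labels = icm(y, lam=0.5)
```

ICM only guarantees a local minimum; for suitable energies, graph cuts can find the global one.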
[Figure: graph with n-links between neighboring pixels, terminals s and t attached by hard constraints, and a cut]
Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms)
[Boykov & Jolly, ICCV’01] Slide adapted from Yuri Boykov
$$E(L) = \sum_p D_p(L_p) + \sum_{pq \in N} w_{pq} \cdot \delta(L_p \neq L_q)$$
Regional term (t-links) + Boundary term (n-links) (binary segmentation)
$$w_{pq} = \exp\left\{ -\frac{\Delta I_{pq}^2}{2\sigma^2} \right\}$$
[Figure: s-t graph with a cut and t-link costs $D_p(s)$, $D_p(t)$; plot of $w_{pq}$ against $\Delta I_{pq}$ with scale σ]
Slide credit: Yuri Boykov
[Figure: s-t graph with n-links $w_{pq}$, t-links $D_p(s)$ and $D_p(t)$, and a cut]
NOTE: hard constraints are not required, in general.
Regional bias example
Suppose we are given “expected” intensities $I^s$ and $I^t$
$$D_p(s) \propto \exp\left( -\| I_p - I^s \|^2 / 2\sigma^2 \right)$$
$$D_p(t) \propto \exp\left( -\| I_p - I^t \|^2 / 2\sigma^2 \right)$$
[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov
EM-style optimization: “expected” intensities $I^s$ and $I^t$ can be re-estimated
[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov
intensity models of object and background
$$D_p(L_p) = -\log \Pr(I_p \mid L_p)$$
given object and background intensity histograms
[Figure: s-t graph with a cut and t-link costs $D_p(s)$, $D_p(t)$; histograms $\Pr(I_p \mid s)$ and $\Pr(I_p \mid t)$ over intensity $I$]
[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov
Ø e.g. modeled with a Mixture of Gaussians
$$\pi(x_i, y_i; \theta_\pi) = \log \sum_k \theta_\pi(x_i, k) \, P(k \mid x_i) \, N(y_i; \bar{y}_k, \Sigma_k)$$
Ø e.g. a “contrast sensitive Potts model”
$$\phi(x_i, x_j, g_{ij}(y); \theta_\phi) = -\theta_\phi^T \, g_{ij}(y) \, \delta(x_i \neq x_j)$$
where
$$g_{ij}(y) = e^{-\beta \| y_i - y_j \|^2}, \qquad \beta = \left( 2 \cdot \mathrm{avg}\, \| y_i - y_j \|^2 \right)^{-1}$$
[Shotton & Winn, ECCV’06]
Slide credit: B. Leibe
Graph (V, E, C)
Vertices V = {v1, v2 ... vn}
Edges E = {(v1, v2) ....}
Costs C = {c(1, 2) ....}
[Figure: example graph with nodes Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1]
Slide credit: Pushmeet Kohli
What is an st-cut?
An st-cut (S,T) divides the nodes between source and sink.
What is the cost of an st-cut?
Sum of cost of all edges going from S to T
Example cut on the graph above: 5 + 2 + 9 = 16
What is the st-mincut?
st-cut with the minimum cost
On the graph above: 2 + 1 + 4 = 7
Slide credit: Pushmeet Kohli
Augmenting Path and Push-Relabel
n: #nodes, m: #edges, U: maximum edge weight
Algorithms assume non-negative edge weights
Slide credit: Andrew Goldberg
Solve the dual maximum flow problem
Compute the maximum flow between Source and Sink
Constraints
Edges: Flow ≤ Capacity
Nodes: Flow in = Flow out
Min-cut/Max-flow Theorem
In every network, the maximum flow equals the cost of the st-mincut
Slide credit: Pushmeet Kohli
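The theorem can be checked on the small example graph with a few lines of Python. The sketch uses the Edmonds-Karp variant of augmenting paths (shortest path by breadth-first search), which is one possible choice and not necessarily the algorithm the lecture has in mind. The edge directions and capacities are an assumption inferred from the cut costs quoted above (16 and 7); the figure itself is not available.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    # Residual capacities, including reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Collect the path and its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push  # use up forward capacity
            residual[v][u] += push  # allow the flow to be undone later
        flow += push


# Assumed reading of the example graph: s = Source, t = Sink.
capacity = {
    "s": {"v1": 2, "v2": 9},
    "v1": {"t": 5, "v2": 2},
    "v2": {"t": 4, "v1": 1},
}
flow_value = max_flow(capacity, "s", "t")
```

Under these assumed capacities the maximum flow matches the st-mincut cost of 2 + 1 + 4 = 7.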
Augmenting Path Based Algorithms
1. Find a path from source to sink with positive capacity
2. Push the maximum possible flow through this path
3. Repeat until no path can be found
Algorithms assume non-negative capacity
[Figure sequence: example graph (Source, Sink, v1, v2) with residual capacities updated after each augmentation; Flow = 0, then 0 + 2 = 2, then 2 + 4 = 6, then 6 + 1 = 7]
Slide credit: Pushmeet Kohli
problems
Ø Grid graphs
Ø Low connectivity (m ~ O(n))
algorithm
[Boykov and Kolmogorov PAMI 2004]
Ø Finds approximate shortest augmenting paths efficiently
Ø High worst-case time complexity
Ø Empirically outperforms other algorithms on vision problems
Ø Efficient code available on the web
http://www.adastral.ucl.ac.uk/~vladkolm/software.html
Slide credit: Pushmeet Kohli
that are submodular.
Ø Current research topic
$$E(L) = \sum_p E_p(L_p) + \sum_{pq \in N} E(L_p, L_q)$$
Regional term (t-links) + Boundary term (n-links)
E(L) can be minimized by s-t graph cuts if
$$E(s,s) + E(t,t) \le E(s,t) + E(t,s)$$
Submodularity (“convexity”)
[Boros & Hammer, 2002, Kolmogorov & Zabih, 2004] Slide credit: B. Leibe
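The submodularity condition is easy to test for a given pairwise term. The Python sketch below checks it for a Potts-style boundary term and for a counter-example; the function names and the label symbols "s" and "t" are illustrative assumptions.

```python
def is_submodular(E, s="s", t="t"):
    """Check E(s,s) + E(t,t) <= E(s,t) + E(t,s) for a pairwise term E."""
    return E(s, s) + E(t, t) <= E(s, t) + E(t, s)


def potts(w):
    """Potts boundary term: pay w when the two labels differ."""
    return lambda a, b: w if a != b else 0.0


def anti_potts(w):
    """Pays w when the labels agree; a counter-example for w > 0."""
    return lambda a, b: w if a == b else 0.0


smooth_ok = is_submodular(potts(2.0))          # penalizing label changes is fine
reward_bad = is_submodular(anti_potts(2.0))    # penalizing agreement is not
negative_bad = is_submodular(potts(-1.0))      # negative edge weight is not
```

This is why the boundary weights in the energy above need to be non-negative for the s-t cut construction to apply.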
energies is a nuisance.
⇒ Binary segmentation only
Ø NP-hard problem with 3 or more labels
extend graph cuts to the multi-label case
Ø α-Expansion
Ø αβ-Swap
Ø But α-Expansion has a guaranteed approximation quality (2-approx) and converges in a few iterations.
Slide credit: B. Leibe
Ø Break multi-way cut computation into a sequence of binary s-t cuts.
[Figure: region of label α]
Slide credit: Yuri Boykov
1. Compute optimal α-expansion move (s-t graph cuts): set the label of each node to alpha or leave to current label (so s = alpha and t = current)
2. Decline the move if there is no energy decrease
→ each move is optimal within a very large set of possible segmentations ($2^N$)
Slide adapted from Yuri Boykov
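The move structure can be sketched on a tiny 1-D chain in Python. Note the substitution: instead of an s-t graph cut, the optimal expansion move is found here by enumerating all 2^N binary choices, which is only feasible for very small N. The Potts energy, the toy observations, and all names are assumptions for illustration.

```python
from itertools import product


def energy(labels, data_cost, w):
    """Unary data costs plus a Potts penalty w per differing neighbour pair."""
    unary = sum(data_cost[i][l] for i, l in enumerate(labels))
    pairwise = sum(w for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise


def best_expansion(labels, alpha, data_cost, w):
    """Best alpha-expansion move, by brute force over all 2^N binary choices.

    Stand-in for the s-t graph cut: each node either switches to alpha
    or keeps its current label.
    """
    candidates = (
        [alpha if take else old for take, old in zip(choice, labels)]
        for choice in product([False, True], repeat=len(labels))
    )
    return min(candidates, key=lambda c: energy(c, data_cost, w))


def alpha_expansion(labels, num_labels, data_cost, w):
    """Cycle over labels until no expansion move lowers the energy."""
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            move = best_expansion(labels, alpha, data_cost, w)
            if energy(move, data_cost, w) < energy(labels, data_cost, w):
                labels, improved = move, True  # accept; otherwise decline the move
    return labels


# Hypothetical observations; data cost is the squared distance to each label.
obs = [0.1, 0.0, 1.1, 0.9, 2.0, 2.1]
data_cost = [[(o - l) ** 2 for l in range(3)] for o in obs]
result = alpha_expansion([0] * len(obs), 3, data_cost, w=0.3)
```

Each accepted move strictly lowers the energy, so the loop terminates.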
initial solution
For each move we choose the expansion that gives the largest decrease in the energy: binary optimization problem
Slide credit: Yuri Boykov
User segmentation cues
Additional segmentation cues
Ø Rough region cues sufficient
Ø Segmentation boundary can be extracted from edges
Ø User marks foreground and background regions with a brush → get initial segmentation → correct by additional brush strokes
Slide adapted from Matthieu Bray
Ø User marks foreground and background regions with a brush
Ø Alternatively, user can specify a bounding box
[Figure: global optimum of the unary energy, with background color and foreground color models]
Slide adapted from Carsten Rother
How to choose γ?
$$\psi(x,y) = \gamma \sum_{(m,n) \in C} \delta(x_n \neq x_m) \, e^{-\beta \| y_m - y_n \|^2}$$
[Figure: error (%) over training set as a function of γ; value shown: 25]
Slide credit: Carsten Rother
[Figure: energy after each iteration (1, 2, 3, 4) and the result]
[Figure: color model (Mixture of Gaussians) in R-G space; first panel: foreground & background vs. background; second panel: foreground vs. background]
Slide credit: Carsten Rother
Ø Even with efficient graph cuts, an MRF formulation has too many nodes for interactive results.
Ø Group together similar-looking pixels for efficiency of further processing.
Ø Cheap, local oversegmentation
Ø Important to ensure that superpixels do not cross boundaries
Ø Superpixel code available here
Ø http://www.cs.sfu.ca/~mori/research/superpixels/
Image source: Greg Mori
Slide credit: B. Leibe
Speedup Graph structure
Slide credit: B. Leibe
Ø Powerful technique, based on probabilistic model (MRF).
Ø Applicable for a wide range of problems.
Ø Very efficient algorithms available for vision problems.
Ø Becoming a de-facto standard for many segmentation tasks.
Ø Graph cuts can only solve a limited class of models
– Submodular energy functions
– Can capture only part of the expressiveness of MRFs
Ø Only approximate algorithms available for multi-label case
Slide credit: B. Leibe