SLIDE 1

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09

Image Segmentation

Marc Pollefeys ETH Zurich

Slide credits:

V. Ferrari, K. Grauman, B. Leibe, S. Lazebnik,
S. Seitz, Y. Boykov, W. Freeman, P. Kohli

SLIDE 2

Topics of This Lecture

Introduction

• Gestalt principles
• Image segmentation

Segmentation as clustering

• k-Means
• Feature spaces
• Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 3

Examples of Grouping in Vision

Determining image regions
Grouping video frames into shots
Object-level grouping
Figure-ground

Slide credit: Kristen Grauman

What things should be grouped? What cues indicate groups?

SLIDE 4

Similarity in appearance

http://chicagoist.com/attachments/chicagoist_alicia/GEESE.jpg, http://wwwdelivery.superstock.com/WI/223/1532/PreviewComp/SuperStock_1532R-0831.jpg

Slide adapted from Kristen Grauman

SLIDE 5

Symmetry

http://seedmagazine.com/news/2006/10/beauty_is_in_the_processingtim.php

Slide credit: Kristen Grauman

SLIDE 6

Common Fate

Image credit: Arthus-Bertrand (via F. Durand)

Slide credit: Kristen Grauman

SLIDE 7

Proximity

http://www.capital.edu/Resources/Images/outside6_035.jpg

Slide credit: Kristen Grauman

SLIDE 8

The Gestalt School

Grouping is key to visual perception
Elements in a collection can have properties that result from relationships

• “The whole is greater than the sum of its parts”

Illusory/subjective contours
Occlusion
Familiar configuration

http://en.wikipedia.org/wiki/Gestalt_psychology

Slide credit: Svetlana Lazebnik

Image source: Steve Lehar

SLIDE 9

Gestalt Theory

Gestalt: whole or group

• Whole is greater than sum of its parts
• Relationships among parts can yield new properties/features

Psychologists identified a series of factors that predispose a set of elements to be grouped (by the human visual system)

“I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees.”

Max Wertheimer (1880-1943)

Untersuchungen zur Lehre von der Gestalt, Psychologische Forschung, Vol. 4, pp. 301-350, 1923
http://psy.ed.asu.edu/~classics/Wertheimer/Forms/forms.htm

Slide credit: B. Leibe

SLIDE 10

Gestalt Factors

These factors make intuitive sense, but are very difficult to translate into algorithms.

Image source: Forsyth & Ponce

Slide credit: B. Leibe

SLIDE 11

Continuity through Occlusion Cues

Slide credit: B. Leibe

SLIDE 12

Continuity through Occlusion Cues

Continuity, explanation by occlusion

Slide credit: B. Leibe

SLIDE 13

Continuity through Occlusion Cues

Image source: Forsyth & Ponce

Slide credit: B. Leibe

SLIDE 14

Continuity through Occlusion Cues

Image source: Forsyth & Ponce

Slide credit: B. Leibe

SLIDE 15

Figure-Ground Discrimination

Slide credit: B. Leibe

SLIDE 16

The Ultimate Gestalt test

Slide adapted from B. Leibe

SLIDE 17

Image Segmentation

Goal: identify groups of pixels that go together

Slide credit: Steve Seitz, Kristen Grauman

SLIDE 18

The Goals of Segmentation

Separate image into objects

[Figure: image and human segmentation]

Slide credit: Svetlana Lazebnik

SLIDE 19

The Goals of Segmentation

Separate image into objects
Group together similar-looking pixels for efficiency of further processing

X. Ren and J. Malik. Learning a classification model for segmentation. ICCV 2003.

“superpixels”

Slide credit: Svetlana Lazebnik

SLIDE 20

SLIDE 21

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

SLIDE 30

Topics of This Lecture

Introduction

Ø Gestalt principles Ø Image segmentation

Segmentation as clustering

Ø k-Means Ø Feature spaces Ø Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 31

Image Segmentation: Toy Example

These intensities define the three groups.
We could label every pixel in the image according to which of these it is.

• i.e. segment the image based on the intensity feature.

What if the image isn’t quite so simple?

[Figure: input image and its intensity histogram; groups 1, 2, 3 are the black, gray, and white pixels]

Slide credit: Kristen Grauman

SLIDE 32

[Figure: two input images, each with its intensity histogram (pixel count vs. intensity)]

Slide credit: Kristen Grauman

SLIDE 33

Now how to determine the three main intensities that define our groups?

We need to cluster.

[Figure: input image and its intensity histogram (pixel count vs. intensity)]

Slide credit: Kristen Grauman

SLIDE 34

Goal: choose three “centers” as the representative intensities, and label every pixel according to which of these centers it is nearest to.

Best cluster centers are those that minimize the SSD (sum of squared differences) between all points and their nearest cluster center $c_i$:

$$\sum_{\text{clusters } i} \; \sum_{\text{points } p \text{ in cluster } i} \| p - c_i \|^2$$

Slide credit: Kristen Grauman

[Figure: intensity axis (ticks at 190 and 255) with three cluster centers labeled 1, 2, 3]
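
The labeling rule and the SSD objective on this slide can be tried on a handful of numbers. The sketch below is only an illustration: the pixel values and the three centers are made up, and the function names are hypothetical rather than taken from the lecture.

```python
def label_by_nearest_center(pixels, centers):
    """For each pixel intensity, return the index of the closest center."""
    return [min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            for p in pixels]


def ssd(pixels, centers, labels):
    """Sum of squared differences between each pixel and its assigned center."""
    return sum((p - centers[l]) ** 2 for p, l in zip(pixels, labels))


# Hypothetical toy data: noisy dark, gray, and bright pixels.
pixels = [2, 5, 0, 188, 192, 190, 250, 255, 253]
centers = [0, 190, 255]

labels = label_by_nearest_center(pixels, centers)
print(labels)                        # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(ssd(pixels, centers, labels))  # 66
```

Moving any center away from the middle of its group increases the printed SSD, which is the sense in which these centers are "best" for this labeling.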

SLIDE 35

Clustering

With this objective, it is a “chicken and egg” problem:

• If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
• If we knew the group memberships, we could get the centers by computing the mean per group.

Slide credit: Kristen Grauman

SLIDE 36

K-Means Clustering

Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.

1. Randomly initialize the cluster centers, $c_1, ..., c_K$
2. Given cluster centers, determine points in each cluster
   – For each point p, find the closest $c_i$. Put p into cluster i
3. Given points in each cluster, solve for $c_i$
   – Set $c_i$ to be the mean of points in cluster i
4. If $c_i$ have changed, repeat Step 2

Properties

• Will always converge to some solution
• Can be a “local minimum”
   – Does not always find the global minimum of the objective function:

$$\sum_{\text{clusters } i} \; \sum_{\text{points } p \text{ in cluster } i} \| p - c_i \|^2$$

Slide credit: Steve Seitz
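
The four steps above can be sketched for 1D intensities as follows. This is a minimal illustration, not the lecture's implementation: the initial centers are passed in explicitly (instead of being drawn at random) so that the runs are reproducible, and an empty cluster simply keeps its old center, which is an assumption of this sketch.

```python
def kmeans_1d(points, init_centers, max_iter=100):
    """Lloyd iterations on scalar data; returns the final cluster centers."""
    centers = list(init_centers)
    k = len(centers)
    for _ in range(max_iter):
        # Step 2: put each point into the cluster of its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        # Step 3: each center becomes the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [0, 2, 4, 100, 102, 104, 200, 202, 204]
print(kmeans_1d(points, [0, 100, 200]))  # [2.0, 102.0, 202.0]
print(kmeans_1d(points, [0, 2, 4]))      # [0.0, 3.0, 152.0]
```

The second call starts with all three centers inside the first group and converges to a poorer solution, which illustrates the "local minimum" property and the sensitivity to initialization.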

SLIDE 37

Segmentation as Clustering

[Figure: segmentation results for K=2 and K=3]

% Cluster the pixel intensities, then paint each pixel with its cluster mean.
img_as_col = double(im(:));
cluster_membs = kmeans(img_as_col, K);
labelim = zeros(size(im));
for i = 1:K
    inds = find(cluster_membs == i);
    meanval = mean(img_as_col(inds));
    labelim(inds) = meanval;
end

Slide credit: Kristen Grauman

SLIDE 38

K-Means Clustering

Java demo:

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

SLIDE 39

Feature Space

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on intensity similarity

Feature space: intensity value (1D)

Slide credit: Kristen Grauman

SLIDE 40

Feature Space

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on color similarity
Feature space: color value (3D)

[Figure: example pixels plotted in a feature space with axes R, G, B]
R=255 G=200 B=250
R=245 G=220 B=248
R=15 G=189 B=2
R=3 G=12 B=2

Slide credit: Kristen Grauman

SLIDE 41

Segmentation as Clustering

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on texture similarity
Feature space: filter bank responses (e.g. 24D)

[Figure: filter bank of 24 filters with responses F1, F2, …, F24]

Slide credit: Kristen Grauman

SLIDE 42

Spatial coherence

Assign a cluster label per pixel → possible discontinuities

How can we ensure they are spatially smooth?

[Figure: original image and the image labeled by cluster center’s intensity (clusters 1, 2, 3)]

Slide adapted from Kristen Grauman

SLIDE 43

Spatial coherence

Depending on what we choose as the feature space, we can group pixels in different ways.

Grouping pixels based on intensity+position similarity
⇒ Way to encode both similarity and proximity.

Slide adapted from Kristen Grauman

[Figure: feature space with axes X, Y, and intensity]

SLIDE 44

K-Means without spatial information

K-means clustering based on intensity or color is essentially vector quantization of the image attributes

• Clusters don’t have to be spatially coherent

[Figure: image, intensity-based clusters, color-based clusters]

Slide adapted from Svetlana Lazebnik

Image source: Forsyth & Ponce

SLIDE 45

K-Means with spatial information

K-means clustering based on intensity or color is essentially vector quantization of the image attributes

• Clusters don’t have to be spatially coherent

Clustering based on (r,g,b,x,y) values enforces more spatial coherence

Slide adapted from Svetlana Lazebnik

Image source: Forsyth & Ponce
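
As a small illustration of the (r,g,b,x,y) feature space, the sketch below turns an image into one 5D feature vector per pixel. The `spatial_weight` factor is an assumption of this sketch and does not appear on the slides; it stands in for the fact that color and position are measured in different units, so their relative scale decides how strongly proximity is enforced.

```python
def pixel_features(image, spatial_weight=1.0):
    """image: list of rows, each row a list of (r, g, b) tuples.

    Returns one (r, g, b, w*x, w*y) vector per pixel, in row-major order.
    """
    feats = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            feats.append((r, g, b, spatial_weight * x, spatial_weight * y))
    return feats


def squared_distance(f, g):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


# Hypothetical 1x3 image: two identical reds separated by a slightly different red.
image = [[(255, 0, 0), (250, 5, 0), (255, 0, 0)]]
feats = pixel_features(image, spatial_weight=10.0)
print(squared_distance(feats[0], feats[2]))  # 400.0, from position alone
```

Any clustering routine that works on vectors can then be run on `feats`; with a larger `spatial_weight`, pixels of identical color that lie far apart are less likely to end up in the same cluster.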

SLIDE 46

Summary K-Means

Pros

• Simple, fast to compute
• Converges to local minimum of within-cluster squared error

Cons/issues

• Setting k?
• Sensitive to initial centers
• Sensitive to outliers
• Detects spherical clusters only
• Assuming means can be computed

Slide credit: Kristen Grauman

SLIDE 47

Probabilistic Clustering

Basic questions

• What’s the probability that a point x is in cluster m?
• What’s the shape of each cluster?

K-means doesn’t answer these questions.

Basic idea

• Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function.
• This function is called a generative model.
• Defined by a vector of parameters θ

Slide credit: Steve Seitz

SLIDE 48

Mixture of Gaussians

One generative model is a mixture of Gaussians (MoG)

• K Gaussian blobs with means $\mu_b$, covariance matrices $V_b$, dimension d
   – Blob b defined by:
$$P(x \mid \mu_b, V_b) = \frac{1}{\sqrt{(2\pi)^d \, |V_b|}} \exp\left(-\tfrac{1}{2}(x-\mu_b)^T V_b^{-1} (x-\mu_b)\right)$$
• Blob b is selected with probability $\alpha_b$
• The likelihood of observing x is a weighted mixture of Gaussians
$$P(x \mid \theta) = \sum_{b=1}^{K} \alpha_b \, P(x \mid \mu_b, V_b)$$

Slide adapted from Steve Seitz

SLIDE 49

Expectation Maximization (EM)

Goal

• Find blob parameters θ that maximize the likelihood function over all N data points

Approach:

1. E-step: given current guess of blobs, compute probabilistic ownership of each point
2. M-step: given ownership probabilities, update blobs to maximize likelihood function
3. Repeat until convergence

Slide adapted from Steve Seitz
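
The E-step/M-step loop can be sketched for scalar data as below. This is a simplified illustration under stated assumptions: the data are 1D (so each covariance is a single variance), the number of iterations is fixed instead of testing for convergence, and the variance floor `1e-6` is a guard added for this sketch against the numerical problems that EM is prone to.

```python
import math


def gaussian(x, mu, var):
    """1D Gaussian density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_1d(xs, mus, variances, alphas, n_iter=50):
    """EM for a 1D mixture of Gaussians; returns (means, variances, weights)."""
    mus, variances, alphas = list(mus), list(variances), list(alphas)
    K, N = len(mus), len(xs)
    for _ in range(n_iter):
        # E-step: resp[i][b] is the probability that point i belongs to blob b.
        resp = []
        for x in xs:
            w = [alphas[b] * gaussian(x, mus[b], variances[b]) for b in range(K)]
            total = sum(w)
            resp.append([wb / total for wb in w])
        # M-step: re-estimate each blob from its softly assigned points.
        for b in range(K):
            nb = sum(r[b] for r in resp)
            alphas[b] = nb / N
            mus[b] = sum(r[b] * x for r, x in zip(resp, xs)) / nb
            var = sum(r[b] * (x - mus[b]) ** 2 for r, x in zip(resp, xs)) / nb
            variances[b] = max(var, 1e-6)
    return mus, variances, alphas


# Hypothetical data: two well-separated groups of intensities.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
mus, variances, alphas = em_1d(xs, mus=[0.0, 12.0], variances=[1.0, 1.0],
                               alphas=[0.5, 0.5])
print(mus, variances, alphas)
```

On this data the means settle near 1 and 11 with weights near 0.5 each. Unlike k-means, every point keeps a (possibly tiny) ownership probability for every blob.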

SLIDE 50

EM Details

E-step

• Compute probability that point x is in blob b, given current guess of θ

M-step (N data points)

• Compute overall probability that blob b is selected
• Mean of blob b
• Covariance of blob b

Slide adapted from Steve Seitz
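
For reference, the quantities named on this slide have the following standard form for a mixture of Gaussians. The notation is an assumption chosen to match the blob description ($\mu_b$, $V_b$ for mean and covariance); the symbol $\alpha_b$ for the probability of selecting blob b and the superscript "new" are introduced here for illustration.

```latex
% E-step: ownership of point x_i by blob b under the current parameters
P(b \mid x_i, \theta) =
  \frac{\alpha_b \, P(x_i \mid \mu_b, V_b)}
       {\sum_{j=1}^{K} \alpha_j \, P(x_i \mid \mu_j, V_j)}

% M-step: re-estimate the blob parameters from the ownerships
\alpha_b^{\text{new}} = \frac{1}{N} \sum_{i=1}^{N} P(b \mid x_i, \theta)

\mu_b^{\text{new}} =
  \frac{\sum_{i=1}^{N} x_i \, P(b \mid x_i, \theta)}
       {\sum_{i=1}^{N} P(b \mid x_i, \theta)}

V_b^{\text{new}} =
  \frac{\sum_{i=1}^{N} (x_i - \mu_b^{\text{new}})(x_i - \mu_b^{\text{new}})^T \, P(b \mid x_i, \theta)}
       {\sum_{i=1}^{N} P(b \mid x_i, \theta)}
```

With hard 0/1 ownerships and fixed, equal, spherical covariances, the mean update reduces to the k-means center update.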

SLIDE 51

Applications of EM

Turns out this is useful for all sorts of problems

• Any clustering problem
• Any model estimation problem
• Missing data problems
• Finding outliers
• Segmentation problems
   – Segmentation based on color
   – Segmentation based on motion
   – Foreground/background separation
• ...

EM demo

• http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html

Slide credit: Steve Seitz

SLIDE 52

Segmentation with EM

[Figure: original image and EM segmentation results for k=2, k=3, k=4, k=5]

Image source: Serge Belongie

Slide credit: B. Leibe

SLIDE 53

Summary: Mixtures of Gaussians, EM

Pros

• Probabilistic interpretation
• Soft assignments between data points and clusters
• Generative model, can predict novel data points
• Relatively compact storage ( )

Cons

• Initialization
   – often a good idea to start from output of k-means
• Local minima
• Need to know number of components K
   – solutions: model selection (AIC, BIC), Dirichlet process mixture
• Need to choose generative model (math form of a cluster?)
• Numerical problems are often a nuisance

Slide adapted from B. Leibe

SLIDE 54

Topics of This Lecture

Introduction

• Gestalt principles
• Image segmentation

Segmentation as clustering

• k-Means
• Feature spaces
• Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 55

Finding Modes in a Histogram

How many modes are there?

• Mode = local maximum of a given distribution
• Easy to see, hard to compute

Slide adapted from Steve Seitz

SLIDE 56

Mean-Shift Segmentation

An advanced and versatile technique for clustering-based segmentation

http://coewww.rutgers.edu/riul/FORMER/comanici/MSPAMI/msPamiResults.html

D. Comaniciu and P. Meer, Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.

Slide credit: Svetlana Lazebnik

SLIDE 57

Mean-Shift Algorithm

Iterative Mode Search

1. Initialize random seed center and window W
2. Calculate center of gravity (the “mean”) of W
3. Shift the search window to the mean
4. Repeat steps 2+3 until convergence

Slide adapted from Steve Seitz
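
The iterative mode search can be sketched for 1D data as follows. The sketch assumes a flat window of radius `h` centered on the current position and an unweighted mean of the points inside it; the slides do not state the exact form of the mean, so weighted (kernel) variants are equally possible. The starting position is passed in rather than drawn at random so that runs are reproducible.

```python
def mean_shift_mode(points, start, h, tol=1e-6, max_iter=1000):
    """Follow the mean-shift vector from `start` until the window stops moving."""
    center = float(start)
    for _ in range(max_iter):
        # Step 2: center of gravity of the points inside the window.
        window = [p for p in points if abs(p - center) <= h]
        if not window:
            return center  # empty window: nothing to shift toward
        mean = sum(window) / len(window)
        # Step 4: converged when the shift is negligible.
        if abs(mean - center) < tol:
            return mean
        # Step 3: shift the window to the mean.
        center = mean
    return center


points = [1, 2, 3, 10, 11, 12, 13]
print(mean_shift_mode(points, start=0, h=3))  # 2.0
print(mean_shift_mode(points, start=9, h=3))  # 11.5
```

Different starting positions climb to different modes, which is what the clustering variant of the algorithm exploits.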

SLIDE 58

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 59

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 60

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 61

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 62

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 63

Mean-Shift

[Figure legend: region of interest, center of mass, mean-shift vector]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 64

Mean-Shift

[Figure legend: region of interest, center of mass]

Slide by Y. Ukrainitz & B. Sarel

SLIDE 65

Real Modality Analysis

Tessellate the space with windows
Run the procedure in parallel

Slide by Y. Ukrainitz & B. Sarel

SLIDE 66

Real Modality Analysis

The blue data points were traversed by the windows towards the mode.

Slide by Y. Ukrainitz & B. Sarel

SLIDE 67

Mean-Shift Clustering

Cluster: all data points in the attraction basin of a mode
Attraction basin: the region for which all trajectories lead to the same mode

Slide by Y. Ukrainitz & B. Sarel

SLIDE 68

Mean-Shift Clustering/Segmentation

Choose features (color, gradients, texture, etc)
Initialize windows at individual pixel locations
Start mean-shift from each window until convergence
Merge windows that end up near the same “peak” or mode

Slide adapted from Svetlana Lazebnik
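
The clustering procedure above can be sketched on a 1D feature as follows: one window is started at every data point, each is shifted until convergence, and windows that end near the same mode share a label. The flat window of radius `h` and the merge threshold `merge_tol` (defaulting to `h / 2`) are assumptions of this sketch; the slides only say "near the same peak".

```python
def shift_to_mode(points, start, h, tol=1e-6, max_iter=1000):
    """Mean-shift iterations with a flat window of radius h."""
    center = float(start)
    for _ in range(max_iter):
        window = [p for p in points if abs(p - center) <= h]
        mean = sum(window) / len(window) if window else center
        if abs(mean - center) < tol:
            return mean
        center = mean
    return center


def mean_shift_cluster(points, h, merge_tol=None):
    """Return (modes, labels): labels[i] indexes the mode reached from points[i]."""
    if merge_tol is None:
        merge_tol = h / 2
    modes, labels = [], []
    for p in points:
        m = shift_to_mode(points, p, h)
        # Merge with an existing mode if this trajectory ended close to it.
        for i, existing in enumerate(modes):
            if abs(m - existing) <= merge_tol:
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


modes, labels = mean_shift_cluster([1, 2, 3, 10, 11, 12, 13], h=3)
print(modes)   # [2.0, 11.5]
print(labels)  # [0, 0, 0, 1, 1, 1, 1]
```

The number of clusters is not given in advance; it falls out of how many distinct modes are found for the chosen `h`.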

SLIDE 69

Mean-Shift Segmentation Results

http://coewww.rutgers.edu/riul/FORMER/comanici/MSPAMI/msPamiResults.html

Slide credit: Svetlana Lazebnik

SLIDE 70

More Results

Slide credit: Svetlana Lazebnik

SLIDE 71

Summary Mean-Shift

Pros

• General, application-independent tool
• Model-free, does not assume any prior shape (spherical, elliptical, etc.) on data clusters
• Just a single parameter (window size h)
   – h has a physical meaning (unlike k-means) == scale of clustering
• Finds variable number of modes given the same h
• Robust to outliers

Cons

• Output depends on window size h
• Window size (bandwidth) selection is not trivial
• Computationally rather expensive
• Does not scale well with dimension of feature space

Slide adapted from Svetlana Lazebnik

SLIDE 72

Topics of This Lecture

Introduction

• Gestalt principles
• Image segmentation

Segmentation as clustering

• k-Means
• Feature spaces
• Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 73

Images as Graphs

Fully-connected graph

• Node (vertex) for every pixel
• Edge between every pair of pixels (p,q)
• Affinity weight $w_{pq}$ for each edge
   – $w_{pq}$ measures similarity
   – Similarity is inversely proportional to difference (in color, texture, position, …)

[Figure: pixels p and q joined by an edge with weight $w_{pq}$]

Slide adapted from Steve Seitz

SLIDE 74

Segmentation by Graph Cuts

Break Graph into Segments

• Delete edges crossing between segments
• Easiest to break edges with low similarity (low weight)
   – Similar pixels should be in the same segments
   – Dissimilar pixels should be in different segments

[Figure: weighted graph split into segments A, B, C]

Slide adapted from Steve Seitz

SLIDE 75

Measuring Affinity

Distance
$$\mathrm{aff}(x,y) = \exp\left\{-\frac{1}{2\sigma_d^2}\,\|x-y\|^2\right\}$$

Intensity
$$\mathrm{aff}(x,y) = \exp\left\{-\frac{1}{2\sigma_d^2}\,\|I(x)-I(y)\|^2\right\}$$

Color
$$\mathrm{aff}(x,y) = \exp\left\{-\frac{1}{2\sigma_d^2}\,\mathrm{dist}\big(c(x),c(y)\big)^2\right\}$$
(some suitable color space distance)

Texture
$$\mathrm{aff}(x,y) = \exp\left\{-\frac{1}{2\sigma_d^2}\,\|f(x)-f(y)\|^2\right\}$$
(vectors of filter outputs)

Source: Forsyth & Ponce
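
All four affinities share the same Gaussian form and differ only in which feature is compared. A minimal sketch, in which the feature vectors are hypothetical and `sigma` plays the role of the scale parameter:

```python
import math


def affinity(fx, fy, sigma):
    """exp(-||fx - fy||^2 / (2 sigma^2)) for two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(fx, fy))
    return math.exp(-d2 / (2.0 * sigma ** 2))


# Positions as features: identical points have affinity 1,
# and the affinity decays with distance at a rate set by sigma.
print(affinity((0.0, 0.0), (0.0, 0.0), sigma=1.0))  # 1.0
print(affinity((0.0, 0.0), (3.0, 4.0), sigma=1.0))  # about 3.7e-06
print(affinity((0.0, 0.0), (3.0, 4.0), sigma=5.0))  # about 0.61
```

The last two lines compare the same pair of points under a small and a large scale: with a large `sigma`, points that are far apart still receive a substantial affinity.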

SLIDE 76

Scale Affects Affinity

Small σ: group only nearby points
Large σ: group far-away points

[Figure: data points and the resulting affinity matrices for small, medium, and large σ]

Slide adapted from Svetlana Lazebnik

Image Source: Forsyth & Ponce

SLIDE 77

Graph Cut (GC)

GC = edges whose removal partitions a graph in two
Cost of a cut

• Sum of weights of cut edges:
$$cut(A,B) = \sum_{p \in A,\; q \in B} w_{p,q}$$

A graph cut gives us a segmentation

• What is a “good” graph cut and how do we find one?

[Figure: graph partitioned into A and B]

Slide adapted from Steve Seitz
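
The cost of a cut is a plain double sum over the affinity matrix. The sketch below uses a hypothetical four-node graph stored as a symmetric matrix `W`, with `W[p][q]` holding the weight $w_{p,q}$ (0 where there is no edge).

```python
def cut_cost(W, A, B):
    """Sum of the weights of all edges with one end in A and the other in B."""
    return sum(W[p][q] for p in A for q in B)


# Hypothetical graph: strong edges 0-1 and 2-3, weak edges 0-2 and 1-3.
W = [
    [0, 9, 1, 0],
    [9, 0, 0, 2],
    [1, 0, 0, 8],
    [0, 2, 8, 0],
]

print(cut_cost(W, {0, 1}, {2, 3}))  # 3: only the two weak edges are cut
print(cut_cost(W, {0, 2}, {1, 3}))  # 17: both strong edges are cut
```

The first partition keeps similar nodes together and is therefore much cheaper than the second.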

SLIDE 78

Graph Cut

Image Source: Forsyth & Ponce

Here, the cut is nicely defined by the block-diagonal structure of the affinity matrix. ⇒ How can this be generalized?

Slide credit: B. Leibe

SLIDE 79

Minimum Cut

We can do segmentation by finding the minimum cut in a graph

• Efficient algorithms exist for doing this

Drawback:

• Weight of cut proportional to number of edges in the cut
• Minimum cut tends to cut off very small, isolated components

[Figure: the ideal cut, and cuts with lesser weight than the ideal cut]

Slide credit: Khurram Hassan-Shafique

SLIDE 80

Normalized Cut (NCut)

Min-cut has bias toward partitioning out small segments
This can be fixed by normalizing for size of segments
The normalized cut cost is:

$$NCut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}$$

assoc(A,V) = sum of weights from A to all nodes in the graph

The exact solution is NP-hard but an approximation can be computed by solving a generalized eigenvalue problem.

J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000

Slide adapted from Svetlana Lazebnik
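
The effect of the normalization can be checked on a small graph. The example below is made up for illustration: two tightly connected groups joined by an edge of weight 3, plus one node that hangs off the second group by a single edge of weight 2.

```python
def build_weights(n, edges):
    """Symmetric affinity matrix from a list of (p, q, weight) edges."""
    W = [[0] * n for _ in range(n)]
    for p, q, w in edges:
        W[p][q] = W[q][p] = w
    return W


def cut_cost(W, A, B):
    return sum(W[p][q] for p in A for q in B)


def assoc(W, A):
    """Sum of weights from the nodes in A to all nodes in the graph."""
    return sum(W[p][q] for p in A for q in range(len(W)))


def ncut(W, A, B):
    c = cut_cost(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)


W = build_weights(6, [(0, 1, 10), (0, 2, 10), (1, 2, 10),  # group {0, 1, 2}
                      (3, 4, 10),                          # group {3, 4}
                      (2, 3, 3),                           # link between groups
                      (4, 5, 2)])                          # weakly attached node 5

isolate = ({5}, {0, 1, 2, 3, 4})
balanced = ({0, 1, 2}, {3, 4, 5})
print(cut_cost(W, *isolate), cut_cost(W, *balanced))  # 2 3
print(ncut(W, *isolate), ncut(W, *balanced))          # about 1.02 and 0.16
```

The plain cut cost prefers to split off the single node, whereas the normalized cost prefers the balanced partition.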

SLIDE 81

Interpretation as a Dynamical System

Treat the edges as springs and ‘shake’ the system

• Elasticity proportional to cost
• Vibration “modes” correspond to segments
   – Can compute these by solving a generalized eigenvector problem

Slide adapted from Steve Seitz

SLIDE 82

NCuts as a Generalized Eigenvector Problem

Definitions

$W$: the affinity matrix, $W(i,j) = w_{i,j}$
$D$: the diagonal matrix, $D(i,i) = \sum_j W(i,j)$
$x$: a vector in $\{1,-1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$

Rewriting Normalized Cut in matrix form:

$$NCut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}$$
$$= \frac{1}{4}\left[\frac{(1+x)^T (D-W)(1+x)}{k\,1^T D 1} + \frac{(1-x)^T (D-W)(1-x)}{(1-k)\,1^T D 1}\right], \qquad k = \frac{\sum_{x_i>0} D(i,i)}{\sum_i D(i,i)}$$
$$= \dots$$

Slide credit: Jitendra Malik

SLIDE 83

Some More Math…

Slide credit: Jitendra Malik

SLIDE 84

NCuts as a Generalized Eigenvalue Problem

After simplification, we get

$$NCut(A,B) = \frac{y^T (D-W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\; y^T D 1 = 0.$$

This is hard, as y is discrete! Relaxation: y is continuous.

This is a Rayleigh Quotient

• Solution given by the “generalized” eigenvalue problem
$$(D-W)\,y = \lambda D y$$
• Solved by converting to standard eigenvalue problem
$$D^{-1/2}(D-W)D^{-1/2}\,z = \lambda z, \quad \text{with } z = D^{1/2} y$$

Subtleties

• Smallest eigenvector is $z_0 = D^{1/2} 1$ with eigenvalue 0 (and $y_0 = 1$)
• Optimal solution is second smallest eigenvector
• Gives continuous result; must convert into discrete values of y

Slide adapted from Alyosha Efros
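
The conversion from the generalized to a standard eigenvalue problem takes two steps. The derivation below is a sketch that assumes every node has positive total affinity, so that $D$ is invertible and $D^{1/2}$, $D^{-1/2}$ are well defined; the substitution variable $z$ is notation introduced here.

```latex
% Start from the generalized problem and substitute y = D^{-1/2} z:
(D - W)\, y = \lambda D\, y
\quad\Longrightarrow\quad
(D - W)\, D^{-1/2} z = \lambda D^{1/2} z

% Multiply from the left by D^{-1/2}:
D^{-1/2} (D - W)\, D^{-1/2}\, z = \lambda z

% The constant vector gives the trivial solution, because each row of D - W sums to zero:
(D - W)\, 1 = 0
\quad\Longrightarrow\quad
D^{-1/2} (D - W)\, D^{-1/2} \big( D^{1/2} 1 \big) = 0
```

The matrix $D^{-1/2}(D-W)D^{-1/2}$ is symmetric, so a standard symmetric eigensolver applies; the trivial eigenvalue-0 solution carries no segmentation information, which is why the second smallest eigenvector is used.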

SLIDE 85

NCuts Example

Smallest eigenvectors

Image source: Shi & Malik

NCuts segments

Slide credit: B. Leibe

SLIDE 86

NCuts: Overall Procedure

1. Construct a weighted graph G=(V,E) from an image.
2. Connect each pair of pixels, and assign graph edge weights,
   $W(i,j)$ = Prob. that i and j belong to the same region.
3. Solve $(D-W)\,y = \lambda D y$ for the smallest few eigenvectors. This yields a continuous solution.
4. Threshold eigenvectors to get a discrete cut
   • This is where the approximation is made (we’re not solving NP).
5. Recursively subdivide if NCut value is below a pre-specified value.

Slide credit: Jitendra Malik

NCuts Matlab code available at http://www.cis.upenn.edu/~jshi/software/

SLIDE 87

Color Image Segmentation with NCuts

Image Source: Shi & Malik

Slide credit: Steve Seitz

SLIDE 88

Results with Color & Texture

SLIDE 89

Summary: Normalized Cuts

Pros:

• Generic framework, flexible to choice of function that computes weights (“affinities”) between nodes
• Does not require any model of the data distribution

Cons:

• Time and memory complexity can be high
   – Dense, highly connected graphs ⇒ many affinity computations
   – Solving eigenvalue problem
• Preference for balanced partitions
   – If a region is uniform, NCuts will find the modes of vibration of the image dimensions

Slide credit: Kristen Grauman

SLIDE 90

Segmentation: Caveats

We’ve looked at bottom-up ways to segment an image into regions, yet finding meaningful segments is intertwined with the recognition problem.

Often want to avoid making hard decisions too soon
Difficult to evaluate; when is a segmentation successful?

Slide credit: Kristen Grauman

SLIDE 91

Topics of This Lecture

Introduction

• Gestalt principles
• Image segmentation

Segmentation as clustering

• k-Means
• Feature spaces
• Mixture of Gaussians, EM

Model-free clustering: Mean-Shift
Graph theoretic segmentation: Normalized Cuts
Interactive Segmentation with GraphCuts

SLIDE 92

Markov Random Fields

Allow rich probabilistic models for images
But built in a local, modular way

• Learn local effects, get global effects out

[Figure: MRF with observed evidence, hidden “true states”, and neighborhood relations]

Slide credit: William Freeman

SLIDE 93

MRF Nodes as Pixels (or Patches)

[Figure: image; image pixels $y_i$; states $x_i$ (e.g. foreground/background); potentials $\Phi(x_i, y_i)$ and $\Psi(x_i, x_j)$]

Slide adapted from William Freeman

SLIDE 94

Network Joint Probability

$$P(x,y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$

x: states
y: image
$\Phi$: image-state compatibility function (local observations)
$\Psi$: state-state compatibility function (neighboring nodes)

Slide adapted from William Freeman

SLIDE 95

Energy Formulation

Joint probability

$$P(x,y) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)$$

Maximizing the joint probability is the same as minimizing the negative log

$$-\log P(x,y) = -\sum_i \log \Phi(x_i, y_i) - \sum_{i,j} \log \Psi(x_i, x_j)$$
$$E(x,y) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$

with $\phi = -\log \Phi$ and $\psi = -\log \Psi$.

This is similar to free-energy problems in statistical mechanics (spin glass theory). We therefore draw the analogy and call E an energy function.

ϕ and ψ are called potentials.

Slide credit: B. Leibe
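
The equivalence between maximizing the joint probability and minimizing the energy can be checked by brute force on a tiny chain of three pixels. The compatibility values in `phi` and `psi` below are hypothetical, chosen only so that the observation at the middle pixel disagrees with its neighbors.

```python
import itertools
import math


def phi(x, y):
    """Hypothetical image-state compatibility: state 1 prefers bright pixels."""
    return 0.9 if x == int(y > 0.5) else 0.1


def psi(xi, xj):
    """Hypothetical state-state compatibility: neighbors prefer equal states."""
    return 0.8 if xi == xj else 0.2


def joint(xs, ys):
    """Unnormalized joint probability of states xs and observations ys (chain)."""
    p = 1.0
    for x, y in zip(xs, ys):
        p *= phi(x, y)
    for xi, xj in zip(xs, xs[1:]):
        p *= psi(xi, xj)
    return p


def energy(xs, ys):
    """Sum of unary and pairwise potentials, each the negative log compatibility."""
    e = sum(-math.log(phi(x, y)) for x, y in zip(xs, ys))
    e += sum(-math.log(psi(xi, xj)) for xi, xj in zip(xs, xs[1:]))
    return e


ys = [0.9, 0.4, 0.8]  # the middle observation looks dark on its own
labelings = list(itertools.product([0, 1], repeat=len(ys)))
best_by_probability = max(labelings, key=lambda xs: joint(xs, ys))
best_by_energy = min(labelings, key=lambda xs: energy(xs, ys))
print(best_by_probability, best_by_energy)  # (1, 1, 1) (1, 1, 1)
```

Both criteria select the same labeling, and the pairwise term overrules the weak local evidence at the middle pixel.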

SLIDE 96

Energy Formulation

Energy function

$$E(x,y) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)$$

(first sum: unary potentials; second sum: pairwise potentials)

Unary potentials ϕ

• Encode local information about the given pixel/patch
• How likely is a pixel/patch to be in a certain state (e.g. foreground/background)?

Pairwise potentials ψ

• Encode neighborhood information
• How different is a pixel/patch’s label from that of its neighbor? (e.g. here independent of image data, but later based on intensity/color/texture difference)

Slide adapted from B. Leibe

SLIDE 97

Energy Minimization

Goal:

• Infer the optimal labeling of the MRF.

Many inference algorithms are available, e.g.

• Gibbs sampling, simulated annealing
• Iterated conditional modes (ICM)
• Variational methods
• Belief propagation
• Graph cuts

Recently, Graph Cuts have become a popular tool

• Only suitable for a certain class of energy functions
• But the solution can be obtained very fast for typical vision problems (~1MPixel/sec).

[Figure: MRF with unary potentials $\phi(x_i, y_i)$ and pairwise potentials $\psi(x_i, x_j)$]

Slide credit: B. Leibe

SLIDE 98

Graph Cuts for Optimal Boundary Detection

Idea: convert MRF into source-sink graph

[Figure: pixel grid connected by n-links, terminals s and t each attached by a hard constraint, and a cut]

Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms)

[Boykov & Jolly, ICCV’01]
Slide adapted from Yuri Boykov

SLIDE 99

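Simple Example of Energy

$$E(L) = \sum_p D_p(L_p) + \sum_{pq \in N} w_{pq} \cdot \delta(L_p \neq L_q), \qquad L_p \in \{s, t\}$$

(binary segmentation)

• Regional term $\sum_p D_p(L_p)$: t-links
• Boundary term $\sum_{pq \in N} w_{pq} \cdot \delta(L_p \neq L_q)$: n-links

$$w_{pq} = \exp\left\{-\frac{\Delta I_{pq}^2}{2\sigma^2}\right\}$$

[Figure: graph with terminals s and t, t-links $D_p(s)$ and $D_p(t)$, n-links, and a cut; plot of $w_{pq}$ against $\Delta I_{pq}$ with scale σ]

Slide credit: Yuri Boykov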
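
To see how the two terms interact, the energy can be minimized by exhaustive search on a five-pixel 1D "image". The regional penalty used here, the squared difference to an expected object or background intensity, is a hypothetical choice made for this sketch; the n-link weight follows the exponential formula above.

```python
import itertools
import math


def n_link(ip, iq, sigma):
    """Boundary weight between neighbors: large when their intensities agree."""
    return math.exp(-(ip - iq) ** 2 / (2 * sigma ** 2))


def energy(labels, image, expected, sigma):
    # Regional term: hypothetical penalty for deviating from the label's
    # expected intensity.
    regional = sum((i - expected[l]) ** 2 for i, l in zip(image, labels))
    # Boundary term: pay w_pq wherever neighboring labels differ.
    boundary = sum(n_link(image[p], image[p + 1], sigma)
                   for p in range(len(image) - 1)
                   if labels[p] != labels[p + 1])
    return regional + boundary


image = [0.1, 0.2, 0.15, 0.9, 0.85]
expected = {"s": 0.9, "t": 0.1}  # assumed object (s) and background (t) levels

best = min(itertools.product("st", repeat=len(image)),
           key=lambda L: energy(L, image, expected, sigma=0.1))
print(best)  # ('t', 't', 't', 's', 's')
```

The label change falls exactly on the large intensity jump, where the n-link weight is close to zero. Exhaustive search is only feasible for a handful of pixels, which is why the min-cut formulation matters.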

SLIDE 100

Adding Regional Properties

[Figure: graph with terminals s and t, n-links with weights $w_{pq}$, t-links with weights $D_p(s)$ and $D_p(t)$, and a cut]

NOTE: hard constraints are not required, in general.

Regional bias example

Suppose we are given “expected” intensities $I^s$ and $I^t$ of object and background

$$D_p(s) \propto \exp\left(-\|I_p - I^s\|^2 / 2\sigma^2\right)$$
$$D_p(t) \propto \exp\left(-\|I_p - I^t\|^2 / 2\sigma^2\right)$$

[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov

SLIDE 101

Adding Regional Properties

[Figure: graph with terminals s and t, n-links with weights $w_{pq}$, t-links with weights $D_p(s)$ and $D_p(t)$, and a cut]

$$D_p(s) \propto \exp\left(-\|I_p - I^s\|^2 / 2\sigma^2\right)$$
$$D_p(t) \propto \exp\left(-\|I_p - I^t\|^2 / 2\sigma^2\right)$$

EM-style optimization: the “expected” intensities $I^s$ and $I^t$ of object and background can be re-estimated

[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov

SLIDE 102

Adding Regional Properties

More generally, regional bias can be based on any intensity models of object and background

$$D_p(L_p) = -\log \Pr(I_p \mid L_p)$$

given object and background intensity histograms

[Figure: graph with terminals s and t, t-links $D_p(s)$ and $D_p(t)$, and a cut; intensity histograms $\Pr(I_p \mid s)$ and $\Pr(I_p \mid t)$ over intensity I]

[Boykov & Jolly, ICCV’01] Slide credit: Yuri Boykov

SLIDE 103

How to Set the Potentials? Some Examples

Color potentials

• e.g. modeled with a Mixture of Gaussians
$$\pi(x_i, y_i; \theta_\pi) = \log \sum_k \theta_\pi(x_i, k)\, P(k \mid x_i)\, N(y_i; \bar{y}_k, \Sigma_k)$$

Edge potentials

• e.g. a “contrast sensitive Potts model”
$$\phi(x_i, x_j, g_{ij}(y); \theta_\phi) = -\theta_\phi^T\, g_{ij}(y)\, \delta(x_i \neq x_j)$$

where $g_{ij}(y) = e^{-\beta \|y_i - y_j\|^2}$ and $\beta$ is set from $\mathrm{avg}\left(\|y_i - y_j\|^2\right)$, the average squared difference between neighboring pixels.

Parameters $\theta_\pi$, $\theta_\phi$ need to be learned, too!

[Shotton & Winn, ECCV’06]

Slide credit: B. Leibe
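The contrast-sensitive edge feature can be computed as in the sketch below. It assumes scalar pixel values, a hypothetical chain neighbourhood, and β set to the inverse of twice the average squared neighbour difference, which is the choice in the cited paper as far as I can tell; treat that as an assumption.

```python
import math

def contrast_beta(values, edges):
    """beta = 1 / (2 * avg (y_i - y_j)^2); assumes the image is not constant."""
    mean_sq = sum((values[i] - values[j]) ** 2 for i, j in edges) / len(edges)
    return 1.0 / (2.0 * mean_sq)

def edge_feature(y_i, y_j, beta):
    """g_ij(y) = exp(-beta * (y_i - y_j)^2): near 1 in flat regions, small across edges."""
    return math.exp(-beta * (y_i - y_j) ** 2)

values = [0.1, 0.2, 0.8, 0.9]       # hypothetical pixel values
edges = [(0, 1), (1, 2), (2, 3)]    # hypothetical neighbour pairs
beta = contrast_beta(values, edges)
g = [edge_feature(values[i], values[j], beta) for i, j in edges]
print(beta, g)  # the middle pair straddles the intensity edge and gets the smallest g
```

Because the penalty for a label change is scaled by g, cutting between the two middle pixels is cheaper than cutting inside either flat region.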

SLIDE 104

How Does it Work? The s-t-Mincut Problem

Graph (V, E, C)

Vertices V = {v1, v2 ... vn}
Edges E = {(v1, v2) ....}
Costs C = {c(1, 2) ....}

[Figure: example graph with nodes Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1]

Slide credit: Pushmeet Kohli

SLIDE 105

The s-t-Mincut Problem

[Figure: example graph with nodes Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1]

What is an st-cut?
An st-cut (S,T) divides the nodes between source and sink.

What is the cost of an st-cut?
Sum of cost of all edges going from S to T

Example cut: 5 + 2 + 9 = 16

Slide credit: Pushmeet Kohli

SLIDE 106

The s-t-Mincut Problem

[Figure: example graph with nodes Source, Sink, v1, v2 and edge costs 2, 5, 9, 4, 2, 1]

What is an st-cut?
An st-cut (S,T) divides the nodes between source and sink.

What is the cost of an st-cut?
Sum of cost of all edges going from S to T

What is the st-mincut?
st-cut with the minimum cost

Minimum cut: 2 + 1 + 4 = 7

Slide credit: Pushmeet Kohli
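For a graph this small, every st-cut can simply be enumerated. The sketch below does that. The assignment of the six costs to directed edges is an assumption, chosen so that it reproduces the two cut sums quoted on the slides (16 and 7); the figure itself is not available in this transcript.

```python
from itertools import combinations

# Assumed directed edge costs (consistent with the cut sums 16 and 7).
CAPACITY = {
    ("s", "v1"): 2, ("s", "v2"): 9,
    ("v1", "t"): 5, ("v2", "t"): 4,
    ("v1", "v2"): 2, ("v2", "v1"): 1,
}

def cut_cost(source_side):
    """Sum of the costs of all edges going from S to T."""
    return sum(c for (u, v), c in CAPACITY.items()
               if u in source_side and v not in source_side)

def all_st_cuts(inner=("v1", "v2")):
    """Every S containing the source, none containing the sink."""
    for r in range(len(inner) + 1):
        for subset in combinations(inner, r):
            yield {"s", *subset}

best = min(all_st_cuts(), key=cut_cost)
print(cut_cost({"s", "v1"}))   # the example cut: 5 + 2 + 9
print(best, cut_cost(best))    # the st-mincut: 2 + 1 + 4
```

Enumeration takes time exponential in the number of nodes, which is why max-flow algorithms are used in practice.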

SLIDE 107

History of Maxflow Algorithms

Augmenting Path and Push-Relabel

n: #nodes
m: #edges
U: maximum edge weight

Algorithms assume non-negative edge weights

Slide credit: Andrew Goldberg

SLIDE 108

How to Compute the s-t-Mincut?

Solve the dual maximum flow problem

Compute the maximum flow between Source and Sink

Constraints
Edges: Flow ≤ Capacity
Nodes: Flow in = Flow out

Min-cut/Max-flow Theorem
In every network, the maximum flow equals the cost of the st-mincut

[Figure: example graph with nodes Source, Sink, v1, v2 and edge capacities 2, 5, 9, 4, 2, 1]

Slide credit: Pushmeet Kohli

SLIDE 109

Maxflow Algorithms

Augmenting Path Based Algorithms

1. Find path from source to sink with positive capacity
2. Push maximum possible flow through this path
3. Repeat until no path can be found

Algorithms assume non-negative capacity

[Figure: example graph (Source, Sink, v1, v2) with edge capacities 2, 5, 9, 4, 2, 1; Flow = 0]

Slide credit: Pushmeet Kohli

SLIDE 110

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); a path with edge capacities 2 and 5 is picked out, remaining edge capacities 9, 4, 2, 1; Flow = 0]

Slide credit: Pushmeet Kohli

SLIDE 111

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); capacities along the path become 5-2 and 2-2, remaining edge capacities 9, 4, 2, 1; Flow = 0 + 2]

Slide credit: Pushmeet Kohli

SLIDE 112

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); updated capacity 3 on the used path, remaining edge capacities 9, 4, 2, 1; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 113

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2) with edge capacities 3, 9, 4, 2, 1; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 114

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); a path with edge capacities 9 and 4 is picked out, remaining edge capacities 3, 2, 1; Flow = 2]

Slide credit: Pushmeet Kohli

SLIDE 115

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); updated capacity 5 on the used path, remaining edge capacities 3, 2, 1; Flow = 2 + 4]

Slide credit: Pushmeet Kohli

SLIDE 116

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2) with edge capacities 3, 5, 2, 1; Flow = 6]

Slide credit: Pushmeet Kohli

SLIDE 117

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); a path with edge capacities 3, 5 and 1 is picked out, remaining edge capacity 2; Flow = 6]

Slide credit: Pushmeet Kohli

SLIDE 118

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2); capacities along the path become 2, 4 and 1-1, remaining edge capacity 2; Flow = 6 + 1]

Slide credit: Pushmeet Kohli

SLIDE 119

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2) with edge labels 3, 5, 2; no further augmenting path; Flow = 7]

Slide credit: Pushmeet Kohli

SLIDE 120

Maxflow Algorithms

Augmenting Path Based Algorithms (steps 1 to 3 as listed on slide 109)

[Figure: example graph (Source, Sink, v1, v2) with edge labels 3, 5, 2; final state, Flow = 7]

Slide credit: Pushmeet Kohli
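The three augmenting-path steps traced on the preceding slides can be written out as a short program. This is a sketch using breadth-first search to find paths (the Edmonds-Karp variant), not the specialized vision algorithm discussed next. The edge directions of the example graph are an assumption chosen to match the cut sums 16 and 7 quoted earlier.

```python
from collections import deque

# Assumed directed edge capacities for the example graph.
CAPACITY = {
    ("s", "v1"): 2, ("s", "v2"): 9,
    ("v1", "t"): 5, ("v2", "t"): 4,
    ("v1", "v2"): 2, ("v2", "v1"): 1,
}

def max_flow(capacity, source, sink):
    # Residual capacities, with a reverse entry for every edge.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c
    flow = 0
    while True:
        # 1. Find a path from source to sink with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual  # 3. No path left: the flow is maximal.
        # 2. Push the maximum possible flow (the bottleneck) through this path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def source_side(residual, source):
    """Nodes still reachable from the source: the S side of a minimum cut."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v, c in residual[u].items():
            if c > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

flow, residual = max_flow(CAPACITY, "s", "t")
print(flow, source_side(residual, "s"))
```

On this graph the search pushes 2, then 4, then 1 unit of flow, matching the totals 2, 6 and 7 shown in the animation, and the final flow equals the cost of the st-mincut.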

SLIDE 121

Maxflow in Computer Vision

Specialized algorithms for vision problems

Ø Grid graphs
Ø Low connectivity (m ~ O(n))

Dual search tree augmenting path algorithm [Boykov and Kolmogorov PAMI 2004]

Ø Finds approximate shortest augmenting paths efficiently
Ø High worst-case time complexity
Ø Empirically outperforms other algorithms on vision problems
Ø Efficient code available on the web

http://www.adastral.ucl.ac.uk/~vladkolm/software.html

Slide credit: Pushmeet Kohli

SLIDE 122

When Can s-t Graph Cuts Be Applied?

$$E(L) = \sum_{p} E_p(L_p) + \sum_{pq \in N} E(L_p, L_q), \qquad L_p \in \{s, t\}$$

Regional term (t-links): $\sum_{p} E_p(L_p)$
Boundary term (n-links): $\sum_{pq \in N} E(L_p, L_q)$

s-t graph cuts can only globally minimize binary energies that are submodular.

$$E(L) \text{ can be minimized by s-t graph cuts} \iff E(s,s) + E(t,t) \leq E(s,t) + E(t,s)$$

Submodularity (“convexity”)

Non-submodular cases can still be addressed with some optimality guarantees.

Ø Current research topic

[Boros & Hammer, 2002, Kolmogorov & Zabih, 2004] Slide credit: B. Leibe
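The submodularity condition is a single inequality per pairwise term, so it is easy to check in code. A minimal sketch, with two hypothetical pairwise tables:

```python
def is_submodular(pairwise):
    """Check E(s,s) + E(t,t) <= E(s,t) + E(t,s) for one pairwise term."""
    return (pairwise[("s", "s")] + pairwise[("t", "t")]
            <= pairwise[("s", "t")] + pairwise[("t", "s")])

# Potts-style term: a penalty only when the two labels differ.
potts = {("s", "s"): 0.0, ("t", "t"): 0.0, ("s", "t"): 1.5, ("t", "s"): 1.5}
# A term that rewards differing labels violates the condition.
repulsive = {("s", "s"): 1.0, ("t", "t"): 1.0, ("s", "t"): 0.0, ("t", "s"): 0.0}

print(is_submodular(potts))
print(is_submodular(repulsive))
```

The boundary term $w_{pq} \cdot \delta(L_p \neq L_q)$ with non-negative weights has the Potts form, so it always passes this check.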

SLIDE 123

Dealing with Non-Binary Cases

For image segmentation, the limitation to binary energies is a nuisance.

⇒ Binary segmentation only

We would also like to solve multi-label problems.

Ø NP-hard problem with 3 or more labels

There exist some approximation algorithms which extend graph cuts to the multi-label case

Ø α-Expansion
Ø αβ-Swap

They are no longer guaranteed to return the globally optimal result.

Ø But α-Expansion has a guaranteed approximation quality (2-approx) and converges in a few iterations.

Slide credit: B. Leibe

SLIDE 124

α-Expansion Move

Basic idea:

Ø Break multi-way cut computation into a sequence of binary s-t cuts.

[Figure: label α expanding against the other labels]

Slide credit: Yuri Boykov

SLIDE 125

α-Expansion Algorithm

1. Start with any initial solution
2. For each label “α” in any order
   1. Compute optimal α-expansion move (s-t graph cuts): set the label of each node to α or leave it at its current label (so s = α and t = current)
   2. Decline the move if there is no energy decrease
3. Iterate step 2. Stop when no expansion move would decrease energy

Why good? → Each move is optimal within a very large set of possible segmentations ($2^N$)

Slide adapted from Yuri Boykov
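The outer loop of the algorithm above can be sketched on a toy problem. Important assumption: the optimal expansion move is found here by exhaustive search over all $2^N$ binary choices, purely as a stand-in for the s-t graph cut, so this only works for a handful of nodes. The energy (squared-distance data term plus a constant Potts penalty) and all values are hypothetical.

```python
from itertools import product

def energy(labels, unary, edges, weight):
    """Data term plus a Potts penalty `weight` for every label change."""
    data = sum(unary[p][label] for p, label in enumerate(labels))
    smooth = sum(weight for p, q in edges if labels[p] != labels[q])
    return data + smooth

def best_expansion_move(labels, alpha, unary, edges, weight):
    """Each node either switches to alpha or keeps its current label."""
    best, best_e = list(labels), energy(labels, unary, edges, weight)
    for switch in product((False, True), repeat=len(labels)):
        cand = [alpha if s else l for s, l in zip(switch, labels)]
        e = energy(cand, unary, edges, weight)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e

def alpha_expansion(labels, label_set, unary, edges, weight):
    labels = list(labels)
    current = energy(labels, unary, edges, weight)
    improved = True
    while improved:                 # 3. iterate until no move helps
        improved = False
        for alpha in label_set:     # 2. for each label in any order
            cand, e = best_expansion_move(labels, alpha, unary, edges, weight)
            if e < current:         # decline moves without energy decrease
                labels, current, improved = cand, e, True
    return labels, current

# Hypothetical chain of five pixels and three labels with mean intensities.
intensities = [0.0, 0.1, 0.5, 0.9, 1.0]
means = {0: 0.0, 1: 0.5, 2: 1.0}
unary = [{l: (i - m) ** 2 for l, m in means.items()} for i in intensities]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

result, final_energy = alpha_expansion([0] * 5, [0, 1, 2], unary, edges, weight=0.05)
print(result, final_energy)
```

Starting from the all-zero labeling, the label 1 first takes over the three right-hand pixels, then label 2 takes the last two from it, after which no expansion lowers the energy further.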

SLIDE 126

α-Expansion Moves

In each α-expansion a given label “α” grabs space from other labels

[Figure: initial solution followed by seven successive expansion moves]

For each move we choose the expansion that gives the largest decrease in the energy: binary optimization problem

Slide credit: Yuri Boykov

SLIDE 127

GraphCut Applications: “GrabCut”

Interactive Image Segmentation [Boykov & Jolly, ICCV’01]

Ø Rough region cues sufficient
Ø Segmentation boundary can be extracted from edges

Procedure

Ø User marks foreground and background regions with a brush → get initial segmentation → correct by additional brush strokes

[Figure: user segmentation cues; additional segmentation cues]

Slide adapted from Matthieu Bray

SLIDE 128

GrabCut: Data Model

Obtained from interactive user input

Ø User marks foreground and background regions with a brush
Ø Alternatively, user can specify a bounding box

[Figure: foreground color and background color models; global optimum of the unary energy]

Slide adapted from Carsten Rother

SLIDE 129

GrabCut: Coherence Model

An object is a coherent set of pixels:

$$\psi(x, y) = \gamma \sum_{(m,n) \in C} \delta[x_n \neq x_m]\, e^{-\beta \|y_m - y_n\|^2}$$

How to choose γ?

[Figure: plot of error (%) over the training set against γ; the value 25 appears in the plot]

Slide credit: Carsten Rother

SLIDE 130

Iterated Graph Cuts

[Figure: color models (Mixture of Gaussians) plotted over R and G, with panels “Foreground & Background / Background” and “Foreground / Background”; plot of the energy after each iteration (1 to 4); result image]

Slide credit: Carsten Rother

SLIDE 131

GrabCut: Example Results

SLIDE 132

Improving Efficiency of Segmentation

Problem: Images contain many pixels

Ø Even with efficient graph cuts, an MRF formulation has too many nodes for interactive results.

Efficiency trick: Superpixels

Ø Group together similar-looking pixels for efficiency of further processing.
Ø Cheap, local oversegmentation
Ø Important to ensure that superpixels do not cross boundaries

Several different approaches possible

Ø Superpixel code available here
Ø http://www.cs.sfu.ca/~mori/research/superpixels/

Image source: Greg Mori

Slide credit: B. Leibe

SLIDE 133

Superpixels for Pre-Segmentation

Speedup Graph structure

Slide credit: B. Leibe

SLIDE 134

Summary: Graph Cuts Segmentation

Pros

Ø Powerful technique, based on probabilistic model (MRF).
Ø Applicable for a wide range of problems.
Ø Very efficient algorithms available for vision problems.
Ø Becoming a de-facto standard for many segmentation tasks.

Cons/Issues

Ø Graph cuts can only solve a limited class of models
  – Submodular energy functions
  – Can capture only part of the expressiveness of MRFs
Ø Only approximate algorithms available for multi-label case

Slide credit: B. Leibe