SLIDE 1

Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks

Anuva Kulkarni (Carnegie Mellon University), Filipe Condessa (Carnegie Mellon / IST Lisbon), Jelena Kovacevic (Carnegie Mellon University)

SLIDE 2

Outline

  • Motivation
    – Training-free methods
    – Hashing
    – Related work
  • Approach
    – Winner Take All (WTA) Hash
    – Clustering based on Random Walks
  • Some experimental results

SLIDE 3

Motivation

  • Goals:
    – Segment images where the number of classes is unknown
    – Eliminate the need for training data (which may not be available)
    – Fast computation, suitable as a pre-processing step for classification
  • Segmentation is a similarity search
  • The machine-learning concept of "hashing" data enables fast similarity search

SLIDE 4

Hashing

  • Used to speed up the searching process
  • A 'hash function' maps the data values to keys or 'hash codes'
  • Hash table: a shortened representation of the data

[Figure: a hash function maps each value to a key/hash code, e.g. 0111.]

  Hash table:
    001 → Bird_type1
    010 → Bird_type2
    011 → Dog_type1
    100 → Fox_type1
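A minimal Python sketch of the idea (the data values mirror the table above; the 3-bit hash function is our toy illustration, not one from the talk):

```python
def hash_fn(value: str, n_bits: int = 3) -> str:
    """Toy hash function: fold the character codes into an n-bit binary key."""
    return format(sum(map(ord, value)) % (1 << n_bits), f"0{n_bits}b")

# Hash table: data grouped under short binary keys instead of full values.
table: dict[str, list[str]] = {}
for item in ["Bird_type1", "Bird_type2", "Dog_type1", "Fox_type1"]:
    table.setdefault(hash_fn(item), []).append(item)

print(table)  # e.g. {'110': ['Bird_type1'], '111': ['Bird_type2'], ...}
```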

SLIDE 5

Hashing

  • Similar data points have the same (or nearby) hash values
  • Hash function:
    – Always returns a number for an object
    – Two equal objects always have the same number
    – Two unequal objects may not always have different numbers

[Figure: input data mapped to hash codes.]

SLIDE 6

Hashing for Segmentation

  • Each pixel is described by some feature vector (e.g., color)
  • Hashing is used to cluster the pixels into groups

[Figure: color features of each pixel are computed (e.g. 1110, 0110, 0111, 0001); similar features are hashed into the same groups.]

SLIDE 7

Segmentation and Randomized Hashing

  • Used by Taylor and Cowley (2009) for image segmentation
  • Algorithm (sketched below):
    – Hash the features of each pixel into n-bit codes
    – Find local maxima in the space of hash codes; these are the "cluster centers"
    – Assign each feature vector to the closest maximum to obtain clusters
    – Use a connected-components algorithm
  • Parallelizable
  • C. J. Taylor and A. Cowley, "Fast segmentation via randomized hashing," in BMVC, pp. 1–11, 2009.
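A rough Python sketch of this pipeline under simplifying assumptions (random-hyperplane hashing, local maxima approximated by the most frequent codes; this is our illustration, not the authors' code):

```python
import numpy as np

def randomized_hash_segmentation(features: np.ndarray, n_bits: int = 4, seed: int = 0):
    """Hash per-pixel features with random hyperplanes, treat frequent codes
    as cluster centers, and assign each pixel to the nearest center in
    Hamming distance. A connected-components pass would follow."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, features.shape[1]))
    codes = (features @ planes.T > 0).astype(np.uint8)   # (n_pixels, n_bits) bits

    # Approximate local maxima in hash-code space by the most frequent codes.
    uniq, counts = np.unique(codes, axis=0, return_counts=True)
    centers = uniq[np.argsort(counts)[::-1][:8]]         # top-8 codes as centers

    # Assign each pixel to the closest center by Hamming distance.
    ham = (codes[:, None, :] != centers[None, :, :]).sum(-1)
    return ham.argmin(axis=1)                            # cluster label per pixel
```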

SLIDE 8

Segmentation and Randomized Hashing

  • Random hashing, i.e., using a hash code to indicate the region in which a feature vector lies after splitting the space with a set of randomly chosen splitting planes
  • C. J. Taylor and A. Cowley, "Fast segmentation via randomized hashing," in BMVC, pp. 1–11, 2009.

[Figure: a 2-D space split by three random planes; each region is labeled with its hash code (1001, 1011, 1111, 0111, 0001, 0000, 1000, 0110, 0011, 0100).]

SLIDE 9

Winner Take All Hash

  • A way to convert feature vectors into compact binary hash codes
  • Rank correlation is preserved
  • The absolute values of the features do not matter; only the ordering of the values matters
  • The distance between hashes approximates rank correlation
  • C. J. Taylor and A. Cowley, "Fast segmentation via randomized hashing," in BMVC, pp. 1–11, 2009.

SLIDE 10

Calculating WTA Hash

  • Consider 3 feature vectors
  • Step 1: Create random permutations

    feature 1 = (13, 4, 2, 11, 5, 3)
    feature 2 = (12, 5, 3, 10, 4, 2)
    feature 3 = (1, 90, 44, 5, 15, 6)

    Permutation vector θ = (3, 1, 5, 2, 6, 4)

    Permuting each feature with θ:
    feature 1 → (2, 13, 5, 4, 3, 11)
    feature 2 → (3, 12, 4, 5, 2, 10)
    feature 3 → (44, 1, 15, 90, 6, 5)

SLIDE 11

Calculating WTA Hash

  • Step 2: Choose the first K entries. Let K = 3

    feature 1 → (2, 13, 5)
    feature 2 → (3, 12, 4)
    feature 3 → (44, 1, 15)

SLIDE 12

Calculating WTA Hash

  • Step 3: Pick the index of the maximum entry among the first K. This is the hash code 'h' of that feature vector

    feature 1: (2, 13, 5) → max 13 at position 2 → h = 2
    feature 2: (3, 12, 4) → max 12 at position 2 → h = 2
    feature 3: (44, 1, 15) → max 44 at position 1 → h = 1
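A compact Python sketch of these three steps (our illustration, using the θ and K from the example above; positions are 1-indexed as on the slides):

```python
import numpy as np

def wta_hash(x: np.ndarray, theta: np.ndarray, k: int = 3) -> int:
    """Winner Take All hash of one feature vector: permute x by theta,
    keep the first k entries, and return the (1-indexed) position of
    the maximum entry as the hash code."""
    permuted = x[theta - 1]              # theta holds 1-indexed positions
    return int(np.argmax(permuted[:k])) + 1

theta = np.array([3, 1, 5, 2, 6, 4])
for f in ([13, 4, 2, 11, 5, 3], [12, 5, 3, 10, 4, 2], [1, 90, 44, 5, 15, 6]):
    print(wta_hash(np.array(f), theta))  # -> 2, 2, 1
```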

SLIDE 13

Calculating WTA Hash

Notice that feature 2 is just feature 1 with each entry perturbed by one, while feature 3 is very different. Accordingly, features 1 and 2 receive the same hash code (h = 2), while feature 3 receives a different one (h = 1): similar features hash to similar codes.

SLIDE 14

Random Walks

  • For understanding proximity in graphs
  • Useful for propagation in graphs
  • Similar to an electrical network, with voltages at the nodes and edge weights inversely proportional to resistances

[Figure: a small graph with edge weights 1 and 2 viewed as an electrical network; holding one node at +1V induces intermediate voltages (e.g. 0.16V, 0.16V, 0.05V) at the other nodes.]
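In the Grady-style random walker formulation, these "voltages" are class probabilities obtained by solving a linear system in the graph Laplacian. A generic sketch under that assumption (our illustration, not the authors' implementation; `W` is the weighted adjacency matrix, `seeds` maps node index → class):

```python
import numpy as np

def random_walker_probs(W: np.ndarray, seeds: dict[int, int], n_classes: int) -> np.ndarray:
    """Probability that a walker starting at each node reaches a seed of
    each class first; seeds act like fixed voltages in the circuit analogy."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian
    seeded = np.array(sorted(seeds))
    free = np.array([i for i in range(n) if i not in seeds])
    B = np.zeros((len(seeded), n_classes))
    for row, node in enumerate(seeded):
        B[row, seeds[node]] = 1.0                      # one-hot seed labels
    # Harmonic extension: solve L_ff x = -L_fs B for the unseeded nodes.
    x = np.linalg.solve(L[np.ix_(free, free)], -L[np.ix_(free, seeded)] @ B)
    probs = np.zeros((n, n_classes))
    probs[seeded], probs[free] = B, x
    return probs
```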

SLIDE 15

Calculating WTA Hash

  • Consider a feature vector
  • Step 1: Create P = 4 random permutations

    x = (12, 5, 1, 33, 7, 15)

    4 random permutations of x:
    (7, 1, 5, 33, 12, 15)
    (33, 7, 15, 12, 5, 1)
    (5, 12, 7, 1, 15, 33)
    (7, 15, 12, 1, 33, 5)

SLIDE 16

Calculating WTA Hash

  • Step 2: Pick the first K entries of the permuted vectors
  • K = 3

    (7, 1, 5)
    (33, 7, 15)
    (5, 12, 7)
    (7, 15, 12)

SLIDE 17

Calculating WTA Hash

  • Step 3: The index of the maximum element (out of the K entries) is the hash code
  • Thus a binary code is associated with our feature vector

    (7, 1, 5)   → max at position 1 → h = 01
    (33, 7, 15) → max at position 1 → h = 01
    (5, 12, 7)  → max at position 2 → h = 10
    (7, 15, 12) → max at position 2 → h = 10
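A sketch of this multi-permutation version (our illustration; the four permutation vectors below are reverse-engineered from the example, and each 2-bit code encodes the 1-indexed argmax position):

```python
import numpy as np

def wta_hash_code(x: np.ndarray, perms: list[np.ndarray], k: int = 3) -> str:
    """Concatenate one (1-indexed) argmax position per permutation,
    each encoded in 2 bits, into a single binary hash code."""
    bits = []
    for theta in perms:
        h = int(np.argmax(x[theta - 1][:k])) + 1
        bits.append(format(h, "02b"))
    return "".join(bits)

x = np.array([12, 5, 1, 33, 7, 15])
perms = [np.array(p) for p in ([5, 3, 2, 4, 1, 6], [4, 5, 6, 1, 2, 3],
                               [2, 1, 5, 3, 6, 4], [5, 6, 1, 3, 4, 2])]
print(wta_hash_code(x, perms))  # -> '01011010'  (codes 01, 01, 10, 10)
```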

SLIDE 18

Calculating WTA Hash

  • Step 4: Bin features according to the similarity of their hash codes
  • MinHash is a special case of WTA Hash

SLIDE 19

Our Approach

  • 1. Similarity search using WTA Hash
  • 2. Transformation to a graph with nodes and edges
  • 3. Probability map using Random Walks
    – Automatic seed selection
  • 4. Clustering

[Flowchart: Input image → random projections → WTA hash (Block I: Similarity Search) → transform to graph with (nodes, edges) (Block II) → automatic seed selection → probabilities from the RW algorithm (Block III: RW Algorithm) → stop? If yes, segmented output; if no, repeat seed selection.]

SLIDE 20

Block I: WTA hash

  • Image dimensions: P × Q × d
  • Project onto R randomly chosen hyperplanes
    – Each point in the image then has R features

[Figure: the P × Q × d image is vectorized to a PQ × d matrix, then randomly projected (onto R pairs of points) to give a PQ × R matrix.]
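A minimal sketch of this block (our illustration; we implement the projection as R random hyperplane directions):

```python
import numpy as np

def random_projections(image: np.ndarray, R: int, seed: int = 0) -> np.ndarray:
    """Vectorize a P x Q x d image into a (PQ, d) matrix and project it
    onto R random directions, giving R features per pixel."""
    P, Q, d = image.shape
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((d, R))      # R random projection directions
    return image.reshape(P * Q, d) @ H   # (PQ, R) feature matrix
```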
SLIDE 21

Block I: WTA hash

  • Run WTA hash N times

[Figure: WTA hash with K = 3 is run on the R features of each of the PQ points, giving one hash code per point; the possible code values are 01, 10, 11 (the 1-indexed position of the maximum). Repeating this N times yields a PQ × N matrix of hash codes.]
SLIDE 22

Block II: Create Graph

  • Run WTA hash N times → each point has N hash codes
  • The image is transformed into a lattice graph
  • Edge weights:

$$w_{i,j} = \exp(-\beta\, v_{i,j}), \qquad v_{i,j} = \frac{d_H(i,j)}{\gamma}$$

where:
    d_H(i,j) = average Hamming distance over all N hash codes of nodes i and j
    γ = scaling factor
    β = weight parameter for the RW algorithm
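A sketch of the edge weight between two lattice neighbours (our illustration; the β and γ values are placeholders, and we read v_{i,j} as the average Hamming distance divided by γ):

```python
import numpy as np

def edge_weight(codes_i: np.ndarray, codes_j: np.ndarray,
                beta: float = 90.0, gamma: float = 1.0) -> float:
    """w_ij = exp(-beta * v_ij), where v_ij is the average Hamming
    distance between the N hash codes of nodes i and j, scaled by gamma."""
    xor = np.bitwise_xor(codes_i, codes_j)            # N integer hash codes
    d_h = np.mean([bin(v).count("1") for v in xor])   # avg. differing bits
    return float(np.exp(-beta * d_h / gamma))

# e.g. two nodes with N = 4 two-bit codes each:
w = edge_weight(np.array([0b01, 0b01, 0b10, 0b10]),
                np.array([0b01, 0b10, 0b10, 0b11]), beta=1.0)
```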

SLIDE 23

Block III: Random Walks

  • Needs initial seeds to be defined
  • Unsupervised draws using Dirichlet processes
  • DP(G0, α)
    – G0 is the base distribution
    – α is the concentration parameter
  • The DP draws values around G0; samples become less concentrated as α increases

[Figure: draws from DP(G0, α) for α = 1, 10, and 100.]

SLIDE 24

Block III: Random Walks

  • Draw seeds from a Dirichlet process DP(G0, α) with base distribution G0
  • X_1, ..., X_{n−1} are samples drawn from the Dirichlet process
  • The behaviour of the next sample X_n given the previous samples is:

$$X_n \mid X_1, \ldots, X_{n-1} = \begin{cases} X_i & \text{with prob. } \dfrac{1}{n-1+\alpha} \\[6pt] \text{new draw from } G_0 & \text{with prob. } \dfrac{\alpha}{n-1+\alpha} \end{cases}$$
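A minimal sketch of drawing seeds by this rule (Chinese-restaurant-style sampling; `base_draw`, standing in for a draw from G0, is our assumption):

```python
import random

def dp_draws(n: int, alpha: float, base_draw) -> list:
    """Sample n values from DP(G0, alpha) via the predictive rule:
    reuse a previous sample with prob. 1/(i + alpha) each, or take a
    fresh draw from G0 with prob. alpha/(i + alpha)."""
    samples = []
    for i in range(n):
        if random.random() < alpha / (i + alpha):
            samples.append(base_draw())             # new draw from G0
        else:
            samples.append(random.choice(samples))  # reuse an old sample
    return samples

# e.g. seed locations drawn over the pixel indices of a 512 x 512 image:
seeds = dp_draws(20, alpha=1.0, base_draw=lambda: random.randrange(512 * 512))
```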

SLIDE 25

Block III: Random Walks

  • The probability that a new seed belongs to a new class is proportional to α
  • Posterior probability for the i-th sample with class label y_i:

$$P(y_i = c \mid \mathbf{y}_{-i}, \alpha) = \frac{n_c}{n - 1 + \alpha}$$

where:
    C_tot = total number of classes
    y_i = class label of sample i, c ∈ {1, 2, ..., C_tot}
    \mathbf{y}_{-i} = {y_j | j ≠ i}
    n_c = number of samples in the c-th class, excluding the i-th sample

SLIDE 26

Block III: Random Walks

  • The setting is unsupervised, hence C_tot is taken to infinity
  • "Clustering effect," or "rich get richer"
  • Probability that an existing class is assigned, and that a new class is discovered:

$$\lim_{C_{\text{tot}} \to \infty} P(y_i = c \mid \mathbf{y}_{-i}, \alpha) = \frac{n_c}{n - 1 + \alpha}, \quad \forall c \text{ with } n_c > 0 \quad \text{(class is non-empty)}$$

$$\lim_{C_{\text{tot}} \to \infty} P(y_i = c \mid \mathbf{y}_{-i}, \alpha) = \frac{\alpha}{n - 1 + \alpha}, \quad \forall c \text{ with } n_c = 0 \quad \text{(class is empty or new)}$$

X_1, ..., X_{n−1} are samples drawn from a Dirichlet process with parameter α.
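A sketch of these class-assignment probabilities (our illustration):

```python
from collections import Counter

def class_probs(labels: list, i: int, alpha: float) -> dict:
    """P(y_i = c | y_-i, alpha) in the C_tot -> infinity limit:
    n_c/(n-1+alpha) for non-empty classes, alpha/(n-1+alpha) for a new one."""
    n = len(labels)
    counts = Counter(labels[:i] + labels[i + 1:])   # n_c, excluding sample i
    denom = n - 1 + alpha
    probs = {c: n_c / denom for c, n_c in counts.items()}
    probs["new"] = alpha / denom                    # "rich get richer" otherwise
    return probs

print(class_probs([1, 1, 2, 1, 2], i=0, alpha=1.0))
# -> {1: 0.4, 2: 0.4, 'new': 0.2}
```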

SLIDE 27

Block III: Random Walks

  • Use the RW algorithm to generate c probability maps, where c = number of classes found so far
  • Entropy is calculated from the probability maps
  • Entropy-based stopping criterion:
    – As cluster purity increases, average image entropy decreases
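A sketch of the entropy computation that could drive such a stopping rule (our illustration; `prob_maps` is assumed to be a (c, P, Q) array of per-class RW probabilities summing to 1 at each pixel):

```python
import numpy as np

def average_image_entropy(prob_maps: np.ndarray, eps: float = 1e-12) -> float:
    """Average per-pixel Shannon entropy of the class-probability maps.
    Low entropy means pixels are confidently assigned, i.e. pure clusters."""
    p = np.clip(prob_maps, eps, 1.0)               # (c, P, Q), avoid log(0)
    pixel_entropy = -(p * np.log2(p)).sum(axis=0)  # entropy at each pixel
    return float(pixel_entropy.mean())

# e.g. stop adding classes once the average entropy stops decreasing:
# if average_image_entropy(new_maps) >= average_image_entropy(old_maps): stop
```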

SLIDE 28

Experimental Results

[Figures: automatically picked seeds; segmentation results on a Berkeley segmentation dataset subset and on histology images.]

  • Berkeley segmentation subset: avg. accuracy = 91.42% ± 4.57

SLIDE 29

Experimental Results

  • TexGeo: avg. accuracy = 95.14% ± 2.97
  • TexBTF: avg. accuracy = 98.36% ± 0.78
SLIDE 30

Experimental Results

  • Comparison measure: Global Consistency Error (GCE)*
    – Lower GCE indicates lower error

  GCE score by value of R:

    Value of R   BSDSubset   TexBTF   TexColor   TexGeo
    10           0.179       0.063    0.159      0.102
    20           0.180       0.065    0.159      0.129
    40           0.186       0.061    0.156      0.134

*C. Fowlkes, D. Martin, and J. Malik, "Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches," vol. 2, pp. II–54, IEEE, 2003.

SLIDE 31

Experimental Results

  • Comparison measure: Global Consistency Error (GCE)
    – Lower GCE indicates lower error
  • Comparison with other methods**:

    Method             GCE
    Human              0.080
    RAD                0.205
    Seed               0.209
    Learned Affinity   0.214
    Mean Shift         0.260
    Normalized cuts    0.336

**E. Vazquez, J. Van De Weijer, and R. Baldrich, "Image segmentation in the presence of shadows and highlights," pp. 1–14, Springer, 2008.
SLIDE 32

Conclusion

  • WTA hash enables fast similarity search
  • Parallelizable
  • Completely unsupervised Random Walks-based clustering
  • Can be used as a pre-processing step in classification for images where the number of classes is unknown