Unsupervised Discovery of Mid-level Discriminative Patches (PowerPoint presentation by Saurabh Singh)



SLIDE 1

Unsupervised Discovery Of Mid-level Discriminative Patches

Saurabh Singh (ss1@andrew.cmu.edu), RI

SLIDE 2

Which representation seems intuitive?

SLIDE 3

Spectrum of Visual Features

[Diagram: a spectrum from low-level features (pixels, filter banks, sparse SIFT, visual words) to high-level ones (parts, segments, objects, whole image)]

SLIDE 4

Visual Words or Letters?

SLIDE 5

Spectrum of Visual Features

[Diagram: the same spectrum, with our approach (mid-level discriminative patches) placed between visual words and parts/segments]

SLIDE 6

Discriminative Patches

Two key requirements

  • 1. Representative: need to occur frequently enough.
  • 2. Discriminative: need to be different enough from the rest of the visual world.

SLIDE 7

First, some examples

SLIDE 8

Unsupervised Discovery of Discriminative Patches

Given a “discovery dataset,” find a relatively small number of discriminative patches that represent it well. We assume access to a “natural world” dataset, which captures the visual statistics of the world in general. Dataset: a subset of Pascal VOC 2007 with six categories.

SLIDE 9

Visual Word Approach

  • Sample a large number of patches from the discovery dataset (represented in terms of their features*) at various locations and scales.
  • Perform some form of unsupervised clustering (e.g., K-Means).

This doesn’t work well.

* We use Histogram of Oriented Gradients (HOG) features.
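The baseline above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: random vectors stand in for the HOG descriptors of sampled patches, and K-Means is implemented directly (Lloyd's algorithm) rather than via a library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for HOG descriptors: in the real pipeline each row would be
# the feature of one patch sampled at a random location and scale.
patches = rng.normal(size=(500, 36))

def kmeans(X, k, iters=20, seed=0):
    """Plain K-Means (Lloyd's algorithm) over patch descriptors."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each patch to its nearest cluster center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute each center as the mean of its assigned patches.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

labels, centers = kmeans(patches, k=10)
```

As the next slides show, Euclidean distance in HOG space groups patches that are not visually coherent, which motivates learning a per-cluster distance instead.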

SLIDE 10

K-Means Clusters

SLIDE 11

Chicken-Egg Problem

  • If we know that a set of patches is visually similar, we can easily learn a distance metric for them.
  • If we know the distance metric, we can easily find the other members.

SLIDE 12

Discriminative Clustering

  • Initialize using K-Means
  • Train a discriminative classifier per cluster to represent the distance function (treating the other clusters as negative examples).
  • Re-assign each patch to the cluster whose classifier gives the highest score.

  • Repeat
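The iteration above can be sketched as follows. This is a toy illustration under stated assumptions: synthetic descriptors replace HOG features, and a ridge-regression scorer stands in for the per-cluster linear SVMs (any linear classifier fits the scheme).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 16))          # patch descriptors (stand-in for HOG)
labels = rng.integers(0, 5, size=300)   # initial K-Means assignment

def train_classifiers(X, labels, k, lam=1.0):
    """One linear scorer per cluster, members vs. everything else.
    Ridge regression to +/-1 targets is used as a stand-in for an SVM."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)       # shared regularized Gram matrix
    W = np.zeros((k, d))
    for j in range(k):
        y = np.where(labels == j, 1.0, -1.0)
        W[j] = np.linalg.solve(A, X.T @ y)
    return W

for _ in range(4):                      # repeat: train, then re-assign
    W = train_classifiers(X, labels, k=5)
    scores = X @ W.T                    # every cluster scores every patch
    labels = scores.argmax(1)           # move each patch to its best cluster
```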
SLIDE 13

Discriminative Clustering*

  • Initialize using K-Means
  • Train a discriminative classifier to represent the distance function (using the “natural world” dataset as negative data).

  • Detect the patches and assign to clusters.
  • Repeat
SLIDE 14

Discriminative Clustering*

[Figure: example clusters shown before (initial) and after (final) discriminative clustering]

SLIDE 15

Discriminative Clustering+

  • Split the discovery dataset into two equal parts: {Training, Validation}.
  • Perform the training step of Discriminative Clustering* on the Training set.
  • Perform the detection step of Discriminative Clustering* on the Validation set.
  • Exchange the roles of the Training and Validation sets.
  • Repeat.
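The cross-validation scheme above can be sketched as a small variation of the basic loop. Again a toy illustration: synthetic descriptors, and ridge regression standing in for the linear SVMs; the key point is that classifiers trained on one half only ever assign patches on the other half, and the halves swap each round.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 16))
labels = rng.integers(0, 5, size=400)   # initial K-Means assignment

half = len(X) // 2
idx_a, idx_b = np.arange(half), np.arange(half, len(X))

def train(Xt, yt, k, lam=1.0):
    # Ridge stand-in for the per-cluster linear SVMs.
    A = Xt.T @ Xt + lam * np.eye(Xt.shape[1])
    return np.stack([np.linalg.solve(A, Xt.T @ np.where(yt == j, 1.0, -1.0))
                     for j in range(k)])

train_idx, val_idx = idx_a, idx_b
for _ in range(4):
    W = train(X[train_idx], labels[train_idx], k=5)   # train on one half
    labels[val_idx] = (X[val_idx] @ W.T).argmax(1)    # detect on the other
    train_idx, val_idx = val_idx, train_idx           # exchange the roles
```

Evaluating each classifier only on data it never trained on prevents the clusters from overfitting to their own members.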
SLIDE 16

Discriminative Clustering+

[Figure: cluster evolution from the K-Means initialization through iterations 1 to 4]

SLIDE 17

Discriminative Clustering+

[Figure: more clusters evolving from the K-Means initialization through iterations 1 to 4]

SLIDE 18

More Results

SLIDE 19

Image in terms of D+ Patches

SLIDE 20

Ranking Patches

  • Purity: homogeneity of the cluster, approximated by the mean SVM score of its top few members.
  • Discriminativeness: how rare the patches are in the “natural world,” approximated by the cluster’s detection frequency on the “discovery dataset” relative to its detections on both datasets combined.
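The two criteria above can be combined into a single ranking score. A minimal sketch, with the caveat that the equal weighting of the two terms is an illustrative assumption, not taken from the slides:

```python
import numpy as np

def rank_clusters(member_scores, fires_discovery, fires_world, top_m=5):
    """Rank clusters by purity plus discriminativeness.

    member_scores   : list of arrays, SVM scores of each cluster's members
    fires_discovery : detections per cluster on the discovery dataset
    fires_world     : detections per cluster on the natural-world dataset
    """
    # Purity: mean SVM score of the top few members of each cluster.
    purity = np.array([np.sort(s)[::-1][:top_m].mean() for s in member_scores])
    # Discriminativeness: fraction of all firings that land on the discovery set.
    disc = fires_discovery / (fires_discovery + fires_world + 1e-9)
    score = purity + disc  # equal weighting is an assumption for illustration
    return np.argsort(-score)  # cluster indices, best first
```

A cluster scores highly only if its members look alike (high SVM scores) and it rarely fires on the natural world.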

SLIDE 21

Top Ranked Patches

SLIDE 22

Doublets : Spatially Consistent Pairs

SLIDE 23

Doublets : Refinement

SLIDE 24

Discovered Doublets

SLIDE 25

Discovered Doublets

SLIDE 26

Evaluation

  • Comparison with Visual Words
  • A dictionary of 1000 visual words compared against 1000 discriminative clusters.

SLIDE 27

Evaluation : Purity

[Plot: cluster purity vs. number of clusters (200 to 1000), comparing visual words against our approach]

SLIDE 28

Evaluation : Coverage

[Plot: dataset coverage vs. number of clusters (200 to 1000), comparing visual words against our approach]

SLIDE 29

Supervised Image Classification

                    Bus   Horse  Train  Sofa  Dining Table  Motor Bike  Average
Vis-Word            0.45  0.70   0.60   0.59  0.41          0.51        0.54
D-Pats              0.60  0.82   0.61   0.67  0.55          0.67        0.65
D-Pats + Doublets   0.62  0.82   0.61   0.67  0.57          0.68        0.66

SLIDE 30

Going Further : More Supervision

  • Discovery using category labels.
  • Per-category Clustering.
SLIDE 31

Using Labels

Table 1: horse

[Precision-recall curves. Left: AP 0.356, AP at 0.1 recall 0.098. Right: AP 0.340, AP at 0.1 recall 0.094]

SLIDE 32

Using Labels

[Precision-recall curves. Left: AP 0.270, AP at 0.1 recall 0.088. Right: AP 0.240, AP at 0.1 recall 0.084]

SLIDE 33

Per-Category Clustering

  • Discovery Dataset: Images belonging to a single category
SLIDE 34

Top Patches Per-Scene

Bookstore Cloister Buffet Bowling

SLIDE 35

Top Patches Per-Scene

Computer Room Laundromat Shoe Shop Waiting Room

SLIDE 36

Thank You

Fun Fact: Only ~300,000 CPU Hours consumed

SLIDE 37

HOG Features

  • Histogram of gradient orientations
  • Binned by orientation and position
  • Weighted by gradient magnitude

*Borrowed From Alyosha’s Slides

SLIDE 38

Average Precision

[Plot: example precision-recall curve, precision on the y-axis, recall on the x-axis]

*Formulas from Wikipedia
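The metric used on these slides can be computed directly from ranked detections. A minimal sketch of the standard definition (AP as the mean of the precision values at the ranks where positives occur), which is one of several common AP variants:

```python
import numpy as np

def average_precision(scores, is_positive):
    """AP = mean of precision@rank over the ranks holding true positives."""
    order = np.argsort(-scores)                  # sort detections, best first
    pos = np.asarray(is_positive)[order]
    cum_tp = np.cumsum(pos)                      # true positives so far
    precision = cum_tp / (np.arange(len(pos)) + 1)
    return precision[pos.astype(bool)].mean()

ap = average_precision(np.array([0.9, 0.8, 0.7, 0.6]),
                       np.array([1, 0, 1, 0]))
# positives at ranks 1 and 3: precision 1/1 and 2/3, so AP = (1 + 2/3) / 2
```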

SLIDE 39

Spatial Pyramid

[Figure: a 3-level spatial pyramid; the image is divided into 1x1, 2x2, and 4x4 grids (levels 0, 1, 2) with level weights 1/4, 1/4, and 1/2]
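The pyramid on this slide can be sketched as follows: per-cell feature histograms are concatenated across levels, with each level's cells scaled by its weight (1/4, 1/4, 1/2 from coarsest to finest, as shown). The `hist_fn` callback is a hypothetical stand-in for whatever per-region histogram the pipeline computes.

```python
import numpy as np

def spatial_pyramid(hist_fn, levels=3):
    """Concatenate weighted per-cell histograms over a spatial pyramid.

    hist_fn(r0, r1, c0, c1) -> histogram of the image region spanning
    rows [r0, r1) and columns [c0, c1) in unit coordinates.
    """
    weights = [0.25, 0.25, 0.5]            # level weights from the slide
    parts = []
    for lvl, w in zip(range(levels), weights):
        n = 2 ** lvl                       # n x n grid at this level
        for i in range(n):
            for j in range(n):
                cell = hist_fn(i / n, (i + 1) / n, j / n, (j + 1) / n)
                parts.append(w * cell)
    return np.concatenate(parts)           # 1 + 4 + 16 = 21 cells for 3 levels
```

Giving finer levels more weight rewards matches that agree not just on which features appear but on where they appear.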