Unsupervised Discovery Of Mid-level Discriminative Patches
Saurabh Singh (ss1@andrew.cmu.edu), RI
Which representation seems intuitive? Spectrum of Visual Features (Low-Level to High-Level): Pixel, Filter-Banks, Sparse-SIFT, Parts, Objects
Spectrum of Visual Features (Low-Level to High-Level): Pixel, Filter-Banks, Sparse-SIFT, Visual Words, [Our Approach: Mid-Level Discriminative Patches], Parts, Segments, Objects, Image
Two key requirements: a good patch should occur frequently within the discovery dataset, yet be discriminative against the rest of the visual world.
Given a “discovery dataset”, find a relatively small number of discriminative patches that represent it well. We assume access to a “natural world” dataset, which captures the visual statistics of the world in general. Dataset: subset of PASCAL VOC 2007 with six categories.
Sample patches (represented in terms of their features*) at various locations and scales. Standard clustering (e.g., K-Means) doesn’t work well on its own. * We use Histogram of Oriented Gradients (HOG) features.
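The baseline being criticized can be sketched in a few lines: cluster patch descriptors with plain K-Means. A minimal numpy sketch (random vectors stand in for real HOG descriptors; this is an illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for HOG descriptors of sampled patches (n patches, d dims).
descs = rng.normal(size=(200, 36))

def kmeans(X, k=5, iters=10):
    """Plain K-Means, as used only to form initial patch clusters."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

labels, centers = kmeans(descs)
print(labels.shape)  # (200,)
```

In high-dimensional HOG space, Euclidean nearest-mean clusters like these mix visually unrelated patches, which is why the approach moves to learned, per-cluster distance functions.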
Key insight: given a few good members of a cluster, we can easily learn a distance metric for them and use it to find other members.
For each cluster, train an SVM-based distance function (treating patches from other clusters as negative examples) and re-form the cluster from the patches it gives the highest score. Better still: train the distance function using the “natural world” dataset as negative data.
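The per-cluster detector is a linear SVM with the cluster's patches as positives and "natural world" patches as negatives. A self-contained hinge-loss sketch (Pegasos-style SGD on synthetic descriptors; the paper uses a standard SVM solver, so treat this as an illustration of the training setup only):

```python
import numpy as np

rng = np.random.default_rng(1)

pos = rng.normal(loc=1.0, size=(50, 36))    # cluster-member descriptors
neg = rng.normal(loc=-1.0, size=(200, 36))  # "natural world" negatives

X = np.vstack([pos, neg])
y = np.hstack([np.ones(50), -np.ones(200)])

def train_linear_svm(X, y, lam=0.01, epochs=20):
    """SGD on the L2-regularized hinge loss (Pegasos-style)."""
    w, b = np.zeros(X.shape[1]), 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            w *= (1.0 - eta * lam)
            if y[i] * (X[i] @ w + b) < 1:  # margin violated
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

w, b = train_linear_svm(X, y)
scores = X @ w + b          # SVM score doubles as the "distance" to the cluster
acc = ((scores > 0) == (y > 0)).mean()
print(round(acc, 2))
```

The SVM score then ranks every patch in the dataset; the top-scoring patches become the cluster's new members.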
[Figure: clusters before (initial) and after (final) SVM-based refinement]
To avoid overfitting, split the discovery dataset into {Training, Validation}. Run clustering* on the Training set, harvest new members by detection on the Validation set, then swap the roles of the two splits and iterate.
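The alternation between splits can be sketched with toy stand-ins (nearest-mean "detectors" instead of SVMs; `fit_detector` and `top_detections` are hypothetical helpers, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy descriptors for the two halves of the discovery dataset.
train_X = rng.normal(size=(100, 8))
val_X = rng.normal(size=(100, 8))

def fit_detector(members):
    """Stand-in detector: the mean of the member descriptors."""
    return members.mean(0)

def top_detections(w, X, m=5):
    """Return the m descriptors of X closest to detector w."""
    d = ((X - w) ** 2).sum(1)
    return X[np.argsort(d)[:m]]

# Seed one cluster from a random training patch and its neighbours.
members = top_detections(train_X[0], train_X, m=5)

for it in range(4):
    w = fit_detector(members)
    # Harvest new members on the *other* split, then swap the roles
    # so the detector is never retrained on its own detections.
    members = top_detections(w, val_X, m=5)
    train_X, val_X = val_X, train_X

print(members.shape)  # (5, 8)
```

Because each detector is always evaluated on data it was not trained on, memorized, non-generalizing clusters die out across iterations.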
[Figure: cluster members evolving from the KMeans initialization through Iterations 1, 2, 3, and 4]
Rank clusters by purity, measured by the mean SVM score of the top few members, and by discriminativeness: how rarely the detector fires on the “natural world”, approximated by its term frequency in the “discovery dataset” with respect to both datasets combined.
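The two ranking terms can be sketched with toy numbers (`purity` and `discriminativeness` are hypothetical helper names; the weighting of the two terms here is an assumption, not the paper's exact formula):

```python
import numpy as np

def purity(svm_scores, top_k=10):
    """Mean SVM score of a cluster's top-k firing members."""
    return float(np.sort(svm_scores)[::-1][:top_k].mean())

def discriminativeness(fires_discovery, fires_natural):
    """Fraction of a detector's firings that land on the discovery
    dataset rather than the natural world (term-frequency proxy)."""
    return fires_discovery / (fires_discovery + fires_natural)

# Hypothetical numbers: a tight, dataset-specific cluster (a) versus
# a loose cluster that mostly fires on generic texture (b).
rank_a = purity(np.array([2.1, 1.9, 1.8, 0.2])) + discriminativeness(80, 20)
rank_b = purity(np.array([0.5, 0.4, 0.3, 0.1])) + discriminativeness(30, 70)
print(rank_a > rank_b)  # True
```

High purity means the detector fires consistently on near-duplicates of the concept; high discriminativeness means it ignores the visual world at large. Both are needed for a cluster to rank well.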
1000 Discriminative clusters.
[Plot: Cluster Purity (0.2 to 1.0) vs. Number of Clusters (200 to 1000), comparing Visual Words and Our Approach]
[Plot: Dataset Coverage (0.1 to 1.0) vs. Number of Clusters (200 to 1000), comparing Visual Words and Our Approach]
                   Bus   Horse  Train  Sofa  Dining Table  Motor Bike  Average
Vis-Word           0.45  0.70   0.60   0.59  0.41          0.51        0.54
D-Pats             0.60  0.82   0.61   0.67  0.55          0.67        0.65
D-Pats + Doublets  0.62  0.82   0.61   0.67  0.57          0.68        0.66
Table 1: horse
[Precision-Recall curves: AP 0.356 (AP at 0.1 recall: 0.098); AP 0.340 (0.094); AP 0.270 (0.088); AP 0.240 (0.084)]
Bookstore, Cloister, Buffet, Bowling, Computer Room, Laundromat, Shoe Shop, Waiting Room
Fun Fact: Only ~300,000 CPU Hours consumed
*Borrowed From Alyosha’s Slides
[Precision-Recall curve]
*Formulas from Wikipedia
[Spatial pyramid figure: levels 0, 1, 2 with cell weights 1/4, 1/4, 1/2]
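The pooling behind the spatial-pyramid figure can be sketched generically: pool a detector response map over 1x1, 2x2, and 4x4 grids, weighting each level as on the slide. The max-pooling choice and the function name are assumptions for illustration:

```python
import numpy as np

def spatial_pyramid(resp, weights=(0.25, 0.25, 0.5)):
    """Pool a 2-D detector response map over pyramid levels 0, 1, 2
    (1x1, 2x2, 4x4 grids), weighting levels by 1/4, 1/4, 1/2."""
    feats = []
    h, w = resp.shape
    for level, wt in enumerate(weights):
        cells = 2 ** level
        for i in range(cells):
            for j in range(cells):
                block = resp[i * h // cells:(i + 1) * h // cells,
                             j * w // cells:(j + 1) * w // cells]
                feats.append(wt * block.max())  # max-pool each cell
    return np.array(feats)

resp = np.arange(64, dtype=float).reshape(8, 8)  # toy response map
f = spatial_pyramid(resp)
print(f.shape)  # (21,) = 1 + 4 + 16 cells
```

Concatenating the weighted per-cell responses keeps coarse spatial layout while staying robust to small shifts, which is the standard reason for the pyramid.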