11/17/2015 1
Recognition continued: discriminative classifiers
Tues Nov 17 Kristen Grauman UT Austin
Announcements
- A5 out today, due Dec 2
Previously
– Supervised classification
– Window-based generic object detection
– basic pipeline
– boosting classifiers
– face detection as case study
– Why compute rectangular filter responses at multiple scales, vs. extracting typical convolution filter responses at multiple scales?
– What properties do we require the early vs. later classifiers (stages) in the cascade to have?
Considering all possible filter parameters: position, scale, and type: 180,000+ possible features associated with each 24 x 24 window
Which subset of these features should we use to determine if a window has a face? Use AdaBoost both to select the informative features and to form the classifier
Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.
Outputs of a possible rectangle feature on faces and non-faces.
Resulting weak classifier: a threshold on the filter response. For the next round, reweight the examples according to errors, then choose another filter/threshold combination.
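The select-reweight loop described above can be sketched as one minimal AdaBoost round over decision stumps (here `feats`, `labels`, and the exhaustive threshold search are simplified stand-ins for the 180,000-feature rectangle-filter setting):

```python
import math

def best_stump(feats, labels, weights):
    """Pick the (feature, threshold, polarity) decision stump with the
    lowest weighted error. feats[i] is the feature vector of example i;
    labels are +1 / -1; weights sum to 1."""
    n_feat = len(feats[0])
    best = None
    for j in range(n_feat):
        for t in sorted(set(f[j] for f in feats)):
            for pol in (+1, -1):
                err = sum(w for f, y, w in zip(feats, labels, weights)
                          if (pol if f[j] >= t else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost_round(feats, labels, weights):
    """One AdaBoost round: choose the best stump, compute its vote
    weight alpha, and reweight the examples (mistakes get upweighted)."""
    err, j, t, pol = best_stump(feats, labels, weights)
    err = max(err, 1e-10)                      # avoid log(0) on a perfect stump
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [w * math.exp(-alpha * y * (pol if f[j] >= t else -pol))
             for f, y, w in zip(feats, labels, weights)]
    z = sum(new_w)
    return (j, t, pol, alpha), [w / z for w in new_w]
```

Repeating this round and summing alpha-weighted stump votes gives the final strong classifier.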
Form a cascade of classifiers so that early stages can quickly discard windows that clearly appear to be negative.
Training the cascade
– Set target detection and false positive rates for each stage
– Keep adding features to the current stage until its target rates have been met
– If the overall false positive rate is not yet low enough, then add another stage
– Use false positives of the current stage as the negative training examples for the next stage
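A minimal sketch of how a trained cascade is applied to one window (the stage scoring functions and thresholds here are placeholders, not the actual Viola-Jones stages):

```python
def cascade_classify(window, stages):
    """Attentional cascade: `stages` is a list of (score_fn, threshold)
    pairs, cheapest first. A window must pass every stage to be kept;
    most windows are rejected by the cheap early stages."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # rejected early: clearly appears negative
    return True            # survived all stages: candidate face window
```

The early stages are tuned for very high detection rates (few false negatives), accepting many false positives that later, more expensive stages then filter out.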
– Train with 5K positives, 350M negatives
– Real-time detector using a 38-layer cascade
– 6061 features in all layers
[Implementation available in OpenCV]
Viola-Jones detector: summary
– Train the cascade of classifiers with AdaBoost on faces vs. non-faces; the output is the selected features, thresholds, and weights
– At test time, apply the cascade to each window of a new image to obtain the detected face windows
Viola & Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
Boosting: pros and cons
+ Integrates classification with feature selection; fast at test time; easy to implement
– Needs many training examples
– Often outperformed by alternative discriminative classifiers (SVMs, CNNs, …)
  – especially for many-class problems
Slide credit: Lana Lazebnik
Perceptual and Sensory Augmented Computing, Visual Object Recognition Tutorial
Viola-Jones Face Detector: Results
Window-based detection: strengths
Sliding window detection with global appearance descriptors:
– simple detection protocol to implement
– good feature choices critical
– past successes for certain classes
Window-based detection: Limitations
– High computational complexity: many locations × scales (× orientations) per image can mean 30,000,000 evaluations!
– Training one detector per class means the cost also grows linearly with the number of classes
Limitations (continued)
Limitations (continued)
– Sliding-window representations assume a fixed 2d structure, or must assume a fixed viewpoint
– Objects with less-regular textures are not captured well with holistic appearance-based descriptions
Limitations (continued)
Figure credit: Derek Hoiem
Sliding window: the detector's view
Limitations (continued)
– In practice, often requires a large training set with many crops/translations (expensive)
– Requiring a good match to a global appearance description can lead to sensitivity to partial occlusions
Image credit: Adam, Rivlin, & Shimshoni
When will sliding window face detection work best?
Class photos
What other categories are amenable to window-based representation?
Pedestrian detection
Detecting upright, walking humans is also possible using the sliding window's appearance/texture; e.g.:
– SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]
– Space-time rectangle features [Viola, Jones & Snow, ICCV 2003]
– SVM with HoGs [Dalal & Triggs, CVPR 2005]
– Model/representation/classifier choice
– Sliding window and classifier scoring
– Exemplar of basic paradigm
– Plus key ideas: rectangular features, AdaBoost for feature selection, cascade
Main idea: generate category-independent regions/boxes that have object-like properties, and let the object detector search over these proposals rather than exhaustive sliding windows.
Alexe et al. Measuring the objectness of image windows, PAMI 2012
Objectness cues: multi-scale saliency, color contrast, edge density, superpixel straddling
Alexe et al. Measuring the objectness of image windows, PAMI 2012
More proposals:
– Alexe et al. Measuring the objectness of image windows. PAMI, 2012.
– Carreira & Sminchisescu. CPMC: Automatic object segmentation using constrained parametric min-cuts. PAMI, 2012.
– Endres & Hoiem. Category-independent object proposals with diverse ranking. PAMI, 2014.
– Cheng et al. BING: Binarized normed gradients for objectness estimation at 300fps. CVPR, 2014.
– Zitnick & Dollár. Edge boxes: Locating object proposals from edges. ECCV, 2014.
– Uijlings et al. Selective search for object recognition. IJCV, 2013.
– Arbeláez et al. Multiscale combinatorial grouping. CVPR, 2014.
– SVM + person detection (e.g., Dalal & Triggs)
– Boosting + face detection (Viola & Jones)
– NN + scene Gist classification (e.g., Hays & Efros)
Nearest neighbor classification: assign the label of the nearest training data point to each test data point.
Voronoi partitioning of feature space for 2-category 2D data (from Duda et al.)
Black = negative, red = positive. A novel test example is closest to a positive example from the training set, so we classify it as positive.
K-nearest neighbors, k = 5 (source: D. Lowe)
If the query lands here, the 5 nearest neighbors consist of 3 negatives and 2 positives, so we classify it as negative. (Black = negative, red = positive.)
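The voting rule above can be written as a minimal k-NN classifier (Euclidean distance; `train` is a toy list of labeled feature vectors):

```python
from collections import Counter

def knn_classify(train, query, k=5):
    """train: list of (feature_vector, label) pairs. Classify `query`
    by majority vote among its k nearest neighbors (squared Euclidean
    distance, which preserves the neighbor ordering)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(train, key=lambda ex: dist2(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to plain nearest-neighbor assignment; larger k smooths the decision boundary.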
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]
6+ million geotagged photos by 109,788 photographers, annotated by Flickr users
A scene is a single surface that can be represented by global (statistical) descriptors
Spatial Envelope Theory of Scene Representation
Oliva & Torralba (2001)
Slide credit: Aude Oliva
Oliva & Torralba IJCV 2001, Torralba et al. CVPR 2003
Capture global image properties while keeping some spatial information
Gist descriptor
Pros:
– Simple to implement
– Flexible to feature / distance choices
– Naturally handles multi-class cases
– Can do well in practice with enough representative data
Cons:
– Large search problem to find nearest neighbors
– Storage of data
– Must know we have a meaningful distance function
Kristen Grauman
Where was this picture taken?
Where was each picture in this sequence taken?
tourist photos
novel “test” photos
[Chen & Grauman CVPR 2011]
[HMM diagram: states S1, S2, S3, with transition probabilities P(Sj | Si) between every pair of states, including self-loops]
Each state has a prior P(State) and emits an observation with probability P(Observation | State).
Define states with a data-driven approach: e.g., for New York, run mean shift clustering on the GPS coordinates of the training images.
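A toy mean-shift sketch of this data-driven state definition (flat kernel, plain Python; the actual system would use an optimized implementation and a distance suited to GPS coordinates):

```python
def mean_shift(points, bandwidth, iters=50):
    """Toy flat-kernel mean shift: repeatedly move each mode to the
    mean of the points within `bandwidth` of it, then merge nearby
    modes. Each surviving mode becomes one 'location' state."""
    modes = [list(p) for p in points]
    for _ in range(iters):
        for i, m in enumerate(modes):
            nbrs = [p for p in points
                    if sum((a - b) ** 2 for a, b in zip(p, m)) <= bandwidth ** 2]
            if nbrs:
                modes[i] = [sum(c) / len(nbrs) for c in zip(*nbrs)]
    # merge modes closer than bandwidth/2 into distinct cluster centers
    centers = []
    for m in modes:
        if all(sum((a - b) ** 2 for a, b in zip(m, c)) > (bandwidth / 2) ** 2
               for c in centers):
            centers.append(m)
    return centers
```

Unlike k-means, mean shift does not need the number of clusters in advance; the bandwidth controls how finely the map is carved into location states.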
[HMM diagram with locations as states: L1, L2, L3, with transition probabilities P(Lj | Li) between every pair, including self-loops]
P(Observation | State) = P( photo | Liberty Island )
Routes from a travel guidebook for New York vs. random walks in the learned HMM
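Sampling such a random walk from a learned transition matrix can be sketched as follows (the transition table here is a hypothetical stand-in for the learned P(Lj | Li)):

```python
import random

def random_walk(start, trans, steps, rng=random):
    """Sample a route from a transition matrix:
    trans[s] maps each current state s to {next_state: probability}."""
    path = [start]
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, p in trans[path[-1]].items():
            acc += p
            if r < acc:
                path.append(nxt)
                break
        else:
            path.append(path[-1])   # numeric slack: stay in place
    return path
```

Repeated sampling yields plausible multi-location routes, which is what gets compared against the guidebook itineraries.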
– SVM + person detection (e.g., Dalal & Triggs)
– Boosting + face detection (Viola & Jones)
– NN + scene Gist classification (e.g., Hays & Efros)
Linear classifiers
Find a linear function to separate the positive and negative examples:
  xi positive: w·xi + b ≥ 0
  xi negative: w·xi + b < 0
Which line is best?
Support Vector Machines (SVMs)
– Discriminative classifier based on the optimal separating line (for the 2d case)
– Maximize the margin between the positive and negative training examples
Support vector machines
Want the line that maximizes the margin:
  xi positive (yi = +1): w·xi + b ≥ 1
  xi negative (yi = −1): w·xi + b ≤ −1
For support vectors, w·xi + b = ±1
[Figure: margin and support vectors]
C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998
Support vector machines
Want the line that maximizes the margin:
  xi positive (yi = +1): w·xi + b ≥ 1
  xi negative (yi = −1): w·xi + b ≤ −1
For support vectors, w·xi + b = ±1
Distance between a point xi and the line: |w·xi + b| / ||w||
The support vectors lie at distance 1/||w|| on either side of the line.
Therefore, the margin is M = 2 / ||w||
Finding the maximum margin line
Quadratic optimization problem:
  Minimize (1/2) wᵀw
  Subject to yi(w·xi + b) ≥ 1 for all i
Finding the maximum margin line
Solution: w = Σi αi yi xi
  (each term: learned weight αi × support vector xi)
Finding the maximum margin line
Solution: w = Σi αi yi xi
  b = yi − w·xi (for any support vector)
Classification function:
  f(x) = sign(w·x + b) = sign(Σi αi yi xi·x + b)
If f(x) < 0, classify as negative; if f(x) > 0, classify as positive.
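The decision rule above, written out directly (a toy evaluation of a linear SVM; the support vectors, weights αi, and bias b are assumed to come from the training step):

```python
def svm_decision(x, support_vectors, alphas, labels, b):
    """Evaluate f(x) = sum_i alpha_i * y_i * (x_i . x) + b and
    classify by its sign (+1 positive, -1 negative)."""
    f = b + sum(a * y * sum(si * xi for si, xi in zip(sv, x))
                for sv, a, y in zip(support_vectors, alphas, labels))
    return +1 if f > 0 else -1
```

Note that only the support vectors enter the sum; all other training examples have αi = 0.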
Person detection with HoGs and linear SVMs (Dalal & Triggs, CVPR 2005)
– Map each grid cell in the input window to a histogram counting the gradients per orientation
– Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows
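A much-simplified sketch of the gradient-orientation histogram idea (real HOG bins gradients per spatial cell and contrast-normalizes over blocks; this collapses the whole window into a single 9-bin histogram):

```python
import math

def grad_orientation_histogram(window, n_bins=9):
    """Toy HOG-style descriptor: histogram of unsigned gradient
    orientations, weighted by gradient magnitude, over a grayscale
    window given as a list of pixel rows."""
    h, w = len(window), len(window[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = window[y][x + 1] - window[y][x - 1]   # central differences
            gy = window[y + 1][x] - window[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / 180.0 * n_bins) % n_bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]   # L1-normalize
```

Concatenating such per-cell histograms over a grid (with block normalization) gives the feature vector that the linear SVM scores.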
International Conference on Computer Vision & Pattern Recognition - June 2005
Datasets that are linearly separable with some noise work out great.
But what are we going to do if the dataset is just too hard?
How about… mapping the data to a higher-dimensional space, e.g., x → (x, x²)?
General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:
  Φ: x → φ(x)
Slide from Andrew Moore's tutorial: http://www.autonlab.org/tutorials/svm.html
Nonlinear SVMs
The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(xi, xj) = φ(xi) · φ(xj).
The classifier then becomes: f(x) = sign(Σi αi yi K(xi, x) + b)
Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiᵀxj)².
Need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
  K(xi, xj) = (1 + xiᵀxj)²
    = 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
    = [1, xi1², √2 xi1xi2, xi2², √2 xi1, √2 xi2] · [1, xj1², √2 xj1xj2, xj2², √2 xj1, √2 xj2]
    = φ(xi)ᵀφ(xj), where φ(x) = [1, x1², √2 x1x2, x2², √2 x1, √2 x2]
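The algebra above can be checked numerically (the helper names `poly_kernel`, `phi`, and `dot` are our own):

```python
import math

def poly_kernel(x, z):
    """K(x, z) = (1 + x.z)^2 for 2-d vectors."""
    return (1 + x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit lifting whose dot product reproduces the kernel."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))
```

The point of the kernel trick is that `poly_kernel` costs a handful of operations, while `phi` would blow up combinatorially for higher-degree polynomials and higher-dimensional inputs.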
Examples of kernel functions
– Linear: K(xi, xj) = xiᵀxj
– Gaussian RBF: K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
– Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k))
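The three kernels above, written as plain functions (σ is the free bandwidth parameter of the RBF kernel):

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def gaussian_rbf(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def hist_intersection(x, z):
    """For histogram features: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(x, z))
```

Histogram intersection is a natural fit when the features are bag-of-words or orientation histograms, as in the descriptors used throughout this lecture.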
SVMs for recognition:
1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this "kernel matrix" to solve for the SVM support vectors & weights.
5. To classify a new example: compute kernel values between the new input and the support vectors, apply weights, check the sign of the output.
Kristen Grauman
Multi-class SVMs: how to handle multiple categories? Combine a number of binary classifiers.
One vs. all:
– Training: learn an SVM for each class vs. the rest
– Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value
One vs. one:
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM "votes" for a class to assign to the test example
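The one-vs-one voting scheme can be sketched as follows (the pairwise classifiers in the test are hypothetical threshold rules standing in for trained binary SVMs):

```python
from collections import Counter

def one_vs_one_predict(x, pairwise):
    """pairwise: dict mapping (class_a, class_b) to a binary classifier
    that returns class_a or class_b for input x. Each classifier casts
    one vote; the class with the most votes wins."""
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]
```

With C classes this needs C(C−1)/2 binary SVMs, but each is trained on only two classes' worth of data, so the individual problems are small.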
Kristen Grauman
SVMs: pros and cons
Pros:
– Kernel-based framework is very powerful and flexible
– Works well in practice, even with small training sample sizes
Cons:
– No direct multi-class SVM; must combine binary SVMs
– During training time, must compute a matrix of kernel values for every pair of examples
– Learning can take a very long time for large-scale problems
Adapted from Lana Lazebnik
Summary