1/29/2009 1
Sliding window detection
January 29, 2009 Kristen Grauman UT-Austin
Schedule
- http://www.cs.utexas.edu/~grauman/courses/spring2009/schedule.htm
- http://www.cs.utexas.edu/~grauman/cours
– Sliding window detection
– Contrast-based representations
– Face and pedestrian detection via sliding window classification
– Viola-Jones detection algorithm
Irving Biederman, Recognition-by-Components: A Theory of Human Image
Alan L. Yuille, David S. Cohen, Peter W. Hallinan. Feature extraction from faces using deformable templates, 1989.
– E.g., a list of pixel intensities
Learning patterns directly from image features
Eigenfaces (Turk & Pentland, 1991)
when good training examples available
Scene recognition based on global texture pattern. [Oliva & Torralba (2001)]
Car/non-car Classifier Yes, car. No, not a car.
If object may be in a cluttered scene, slide a window around looking for it. Car/non-car Classifier
Fleshing out this pipeline a bit more, we need to:
1. Obtain training data
2. Define features
3. Define classifier
[Diagram labels: Training examples, Feature extraction, Car/non-car Classifier]
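As a rough illustration of the pipeline above, the sketch below scans a fixed-size window over an image and applies a classifier to the features of each window. The window size, the stride, and the toy feature and classifier (mean intensity with a fixed threshold) are assumptions made for illustration, not choices taken from the lecture.

```python
import numpy as np

def sliding_windows(image, window=(24, 24), stride=8):
    """Yield (row, col, patch) for every window position in a 2-D image."""
    win_h, win_w = window
    height, width = image.shape
    for row in range(0, height - win_h + 1, stride):
        for col in range(0, width - win_w + 1, stride):
            yield row, col, image[row:row + win_h, col:col + win_w]

def detect(image, extract_features, classify, window=(24, 24), stride=8):
    """Return the top-left corners of the windows the classifier accepts."""
    hits = []
    for row, col, patch in sliding_windows(image, window, stride):
        if classify(extract_features(patch)):
            hits.append((row, col))
    return hits

# Toy stand-ins (assumptions): the feature is the mean intensity and the
# classifier is a fixed threshold on that feature.
image = np.zeros((40, 40))
image[8:32, 8:32] = 1.0  # a bright 24x24 "object"
hits = detect(image, extract_features=np.mean, classify=lambda f: f > 0.99)
```

In a real detector the feature extractor and classifier would come from steps 2 and 3, and the scan would typically be repeated over several scales.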
When do we have a correct detection?
Is this correct?
Area intersection / Area union > 0.5
Slide credit: Antonio Torralba
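The overlap criterion can be sketched as follows. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions for this example; the 0.5 threshold is the one given on the slide.

```python
def intersection_over_union(box_a, box_b):
    """Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(detected, ground_truth, threshold=0.5):
    """A detection counts as correct when the overlap ratio exceeds 0.5."""
    return intersection_over_union(detected, ground_truth) > threshold
```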
Summarize results with an ROC curve: show how the number of correctly classified positive examples varies relative to the number of incorrectly classified negative examples.
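One way to trace such a curve is to sort the examples by classifier score and sweep the decision threshold from high to low. This is a minimal sketch that assumes labels of 1 (positive) and 0 (negative) and distinct scores; ties are not treated specially.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) arrays.

    Higher scores mean "more likely positive". Each step of the sweep
    accepts one more example, in order of decreasing score.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    true_pos = np.cumsum(sorted_labels == 1)
    false_pos = np.cumsum(sorted_labels == 0)
    tpr = np.concatenate([[0.0], true_pos / max(true_pos[-1], 1)])
    fpr = np.concatenate([[0.0], false_pos / max(false_pos[-1], 1)])
    return fpr, tpr

fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```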
Feature extraction
grayscale / color histogram
vector of pixel intensities
An early appearance-based approach to face recognition
Training images
Mean
Eigenvectors computed from covariance matrix
Generate low-dimensional representation of appearance with a linear subspace.
Project new images to “face space”.
Recognition via nearest neighbors in face space
Turk & Pentland, 1991
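The eigenface steps above can be sketched with a few lines of linear algebra. This is an illustration on synthetic vectors, not the original implementation: the data, the number of components, and all function names are assumptions, and the eigenvectors of the covariance matrix are obtained through an SVD of the centered data.

```python
import numpy as np

def fit_face_space(train, num_components):
    """train has shape (n_images, n_pixels). Returns mean and eigenvectors."""
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are eigenvectors of the covariance matrix of the training
    # images, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_components]

def project(images, mean, eigenvectors):
    """Coordinates of the images in the low-dimensional face space."""
    return (images - mean) @ eigenvectors.T

def nearest_neighbor_label(query, train_coords, train_labels):
    distances = np.linalg.norm(train_coords - query, axis=1)
    return train_labels[int(np.argmin(distances))]

rng = np.random.default_rng(0)
# Synthetic stand-in for face images: two identities, each a fixed random
# pattern plus a little noise.
prototypes = rng.normal(size=(2, 64))
labels = np.array([0] * 5 + [1] * 5)
train = prototypes[labels] + 0.05 * rng.normal(size=(10, 64))

mean, eigvecs = fit_face_space(train, num_components=2)
coords = project(train, mean, eigvecs)
query = prototypes[1] + 0.05 * rng.normal(size=64)
predicted = nearest_neighbor_label(project(query, mean, eigvecs), coords, labels)
```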
sensitive to illumination and intra-class appearance variation
Cartoon example: an albino koala
– Locally orderless: offers invariance to small shifts and rotations
– Contrast-normalization: try to correct for variable illumination
Dalal & Triggs, CVPR 2005
Map each grid cell in the input window to a histogram counting the gradients per orientation.
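A simplified sketch of the per-cell orientation histograms follows. It makes several assumptions that the slide does not state: 8x8 pixel cells, 9 unsigned orientation bins, votes weighted by gradient magnitude, and no block normalization.

```python
import numpy as np

def cell_orientation_histograms(window, cell_size=8, num_bins=9):
    """Return an array of shape (cells_y, cells_x, num_bins)."""
    window = np.asarray(window, dtype=float)
    grad_y, grad_x = np.gradient(window)
    magnitude = np.hypot(grad_x, grad_y)
    # Unsigned orientation in [0, 180) degrees.
    orientation = np.degrees(np.arctan2(grad_y, grad_x)) % 180.0
    bin_width = 180.0 / num_bins
    bin_index = np.minimum((orientation / bin_width).astype(int), num_bins - 1)

    cells_y = window.shape[0] // cell_size
    cells_x = window.shape[1] // cell_size
    hist = np.zeros((cells_y, cells_x, num_bins))
    for cy in range(cells_y):
        for cx in range(cells_x):
            rows = slice(cy * cell_size, (cy + 1) * cell_size)
            cols = slice(cx * cell_size, (cx + 1) * cell_size)
            # Each pixel votes for its orientation bin with its magnitude.
            hist[cy, cx] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=num_bins,
            )
    return hist

# A horizontal intensity ramp: every gradient points along the x axis.
ramp = np.tile(np.arange(16.0), (16, 1))
ramp_hist = cell_orientation_histograms(ramp)
```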
Lowe, ICCV 1999
Local patch descriptor
Rotate according to dominant gradient direction
Convolve with Gabor filters at multiple
Serre, Wolf, Poggio, CVPR 2005
Mutch & Lowe, CVPR 2006
Pool nearby units (max)
Intermediate layers compare input to prototype patches
Compute differences between sums of pixels in rectangles
Captures contrast in adjacent spatial regions
Similar to Haar wavelets, efficient to compute
Viola & Jones, CVPR 2001
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Log-polar binning: more precision for nearby points, more flexibility for farther points.
Belongie, Malik & Puzicha, ICCV 2001
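The log-polar counting described above might look like the sketch below. The bin counts, the radial range, and the function name are assumptions; points outside the radial range are simply ignored, and no scale normalization is applied.

```python
import numpy as np

def log_polar_histogram(points, index, num_radial=4, num_angular=4,
                        r_min=0.125, r_max=2.0):
    """Count the other points in log-polar bins centered on points[index]."""
    points = np.asarray(points, dtype=float)
    offsets = np.delete(points, index, axis=0) - points[index]
    radius = np.hypot(offsets[:, 0], offsets[:, 1])
    angle = np.arctan2(offsets[:, 1], offsets[:, 0]) % (2 * np.pi)
    # Logarithmically spaced radial edges: narrow bins close to the
    # reference point, wide bins farther away.
    radial_edges = np.logspace(np.log10(r_min), np.log10(r_max), num_radial + 1)
    angular_edges = np.linspace(0.0, 2 * np.pi, num_angular + 1)
    hist, _, _ = np.histogram2d(radius, angle, bins=[radial_edges, angular_edges])
    return hist

pts = [(0.0, 0.0), (1.0, 1.0), (-0.2, -0.2), (10.0, 10.0)]
hist = log_polar_histogram(pts, index=0)
```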
Local descriptor
Perceptual and Sens
Visual Object Recognition Tutorial
Image feature
Generative: separately model class-conditional and prior densities
[Plot: Pr(image, car) and Pr(image, ¬car) against image feature (x = data)]
Discriminative: directly model posterior
[Plot: Pr(car | image) and Pr(¬car | image) against image feature (x = data)]
Plots from Antonio Torralba 2007
Generative:
+ possibly interpretable
+ can draw samples
Discriminative:
+ appealing when infeasible to model data itself
+ excel in practice
Nearest neighbor: 10^6 examples. Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005...
Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998 …
Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio, 2001,…
Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003 …
Boosting: Viola, Jones 2001, Torralba et al. 2004, Opelt et al. 2006,…
Slide adapted from Antonio Torralba
classifiers”, which need only be better than chance
weak classifier
including fast simple classifiers that alone may be inaccurate
Easy to implement
Base learning algorithm for Viola-Jones face detector
Consider a 2-d feature space with positive and negative examples.
Each weak classifier splits the training examples with at least 50% accuracy.
Examples misclassified by a previous weak learner are given more emphasis at future rounds.
Figure adapted from Freund and Schapire
Final classifier is combination of the weak classifiers
Start with uniform weights on training examples {x1,…xn}
For T rounds:
  Evaluate weighted error for each feature, pick best.
  Re-weight the examples:
    Incorrectly classified ⇒ more weight
    Correctly classified ⇒ less weight
Final classifier is combination of the weak ones, weighted according to the error they had.
[Freund & Schapire 1995]
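The loop above can be sketched as discrete AdaBoost with threshold "stumps" on single features as the weak classifiers. The stump form, the exhaustive threshold search, and the weight formula alpha = 0.5 ln((1 - error) / error) are the standard textbook choices and are assumptions here; the slide only states that weak classifiers are weighted according to their error.

```python
import numpy as np

def stump_predict(features, j, threshold, polarity):
    """Predict +1 where polarity * (feature j - threshold) >= 0, else -1."""
    return np.where(polarity * (features[:, j] - threshold) >= 0, 1, -1)

def best_stump(features, labels, weights):
    """Pick the (feature, threshold, polarity) with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(features.shape[1]):
        for threshold in np.unique(features[:, j]):
            for polarity in (1, -1):
                predictions = stump_predict(features, j, threshold, polarity)
                error = weights[predictions != labels].sum()
                if error < best[0]:
                    best = (error, j, threshold, polarity)
    return best

def adaboost(features, labels, num_rounds):
    n = len(labels)
    weights = np.full(n, 1.0 / n)  # start with uniform weights
    ensemble = []
    for _ in range(num_rounds):
        error, j, threshold, polarity = best_stump(features, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - error) / error)
        predictions = stump_predict(features, j, threshold, polarity)
        # Misclassified examples gain weight, correct ones lose weight.
        weights = weights * np.exp(-alpha * labels * predictions)
        weights = weights / weights.sum()
        ensemble.append((alpha, j, threshold, polarity))
    return ensemble

def predict(ensemble, features):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    total = sum(a * stump_predict(features, j, t, p) for a, j, t, p in ensemble)
    return np.where(total >= 0, 1, -1)

# A 1-D toy problem that no single stump can solve.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
one_round = adaboost(X, y, num_rounds=1)
three_rounds = adaboost(X, y, num_rounds=3)
```

On this toy set a single stump still misclassifies some examples, while the combination of three stumps separates the training data.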
global appearance models + a sliding window detection approach fit well:
Regular 2D structure
Center of face almost shaped like a “patch”/window
“Rectangular” filters
Feature output is difference between adjacent regions
Efficiently computable with integral image: any sum can be computed in constant time
Avoid scaling images: scale features directly for same cost
Integral image: value at (x,y) is sum of pixels above and to the left of (x,y)
Viola & Jones, CVPR 2001
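A minimal sketch of the integral image and a two-rectangle feature follows. The zero-padded layout (one extra row and column, so that entry (y, x) holds the sum of all pixels strictly above and to the left) and the function names are assumptions chosen to keep the lookups free of boundary checks.

```python
import numpy as np

def integral_image(image):
    """Entry (y, x) holds the sum of image[:y, :x]."""
    ii = np.zeros((image.shape[0] + 1, image.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(image, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle from four lookups (constant time)."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rectangle_feature(ii, top, left, height, width):
    """Left half minus right half: responds to horizontal contrast."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

image = np.arange(16).reshape(4, 4)
ii = integral_image(image)
```

Because the cost of `rect_sum` does not depend on the rectangle size, a feature can be evaluated at a larger scale by enlarging its rectangles rather than rescaling the image.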
Considering all possible filter parameters: position, scale, and type:
180,000+ possible features associated with each 24 x 24 window
Use AdaBoost both to select the informative features and to form the classifier
Viola & Jones, CVPR 2001
Evaluate each rectangle filter on each example
Sort examples by filter values
Select best threshold for each filter (min error)
– Sorted list can be quickly scanned for the optimal threshold
Select best filter/threshold combination
Weight on this feature is a simple function of error rate
Reweight examples
(first version appeared at CVPR 2001)
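The threshold scan for a single filter can be sketched with cumulative sums over the sorted examples. Labels of +1 (face) and -1 (non-face), distinct filter values, and the function name are assumptions for this example.

```python
import numpy as np

def best_threshold(values, labels, weights):
    """Return (threshold, polarity, weighted_error) for one filter.

    polarity +1 means "predict face when value >= threshold";
    polarity -1 means "predict face when value < threshold".
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]

    total_pos = w[y == 1].sum()
    total_neg = w[y == -1].sum()
    # Weight of positives / negatives strictly before each sorted position.
    pos_below = np.concatenate([[0.0], np.cumsum(w * (y == 1))[:-1]])
    neg_below = np.concatenate([[0.0], np.cumsum(w * (y == -1))[:-1]])

    # Error when everything from this position upward is called a face.
    err_face_above = pos_below + (total_neg - neg_below)
    # Error when everything below this position is called a face.
    err_face_below = neg_below + (total_pos - pos_below)

    errors = np.minimum(err_face_above, err_face_below)
    k = int(np.argmin(errors))
    polarity = 1 if err_face_above[k] <= err_face_below[k] else -1
    return v[k], polarity, errors[k]

uniform = np.full(6, 1.0 / 6.0)
result_up = best_threshold([3, 1, 2, 6, 4, 5], [-1, -1, -1, 1, 1, 1], uniform)
result_down = best_threshold([3, 1, 2, 6, 4, 5], [1, 1, 1, -1, -1, -1], uniform)
```

After the sort, every candidate threshold is scored in a single pass, which is what makes searching over a very large feature pool practical.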
that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.
Outputs of a possible rectangle feature on faces and non-faces.
Resulting weak classifier:
… For next round, reweight the examples according to errors, choose another filter/threshold combo.
Viola & Jones, CVPR 2001
For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g.,
classifier
false negative rates early in the chain
Fleuret & Geman, IJCV 2001
Rowley et al., PAMI 1998
Viola & Jones, CVPR 2001
Figure from Viola & Jones CVPR 2001
[Plot: % Detection against % False Pos. Label fragment: vs false neg determined by]
Viola 2003
[Diagram: IMAGE SUB-WINDOW → Classifier 1 → (T) Classifier 2 → (T) Classifier 3 → (T) FACE; the F output of each classifier → NON-FACE]
Slide credit: Paul Viola
[Diagram: IMAGE SUB-WINDOW → 1 Feature → 5 Features → 20 Features → FACE; the F output of each stage → NON-FACE; stage labels 50%, 20%, 2%]
Viola 2003
– using data from previous stage.
Slide credit: Paul Viola
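Evaluating a window with a cascade amounts to an early-exit loop over the stages. In this sketch each stage is a (scoring function, threshold) pair; that representation and the toy stages are assumptions made for illustration.

```python
def cascade_classify(window_features, stages):
    """Return True (FACE) only if the window passes every stage.

    stages is ordered from cheapest to most expensive. A window is rejected
    as soon as one stage scores it below its threshold, so most windows
    never reach the later, costlier stages.
    """
    for score_fn, threshold in stages:
        if score_fn(window_features) < threshold:
            return False  # NON-FACE: stop early
    return True

# Toy stages that record which ones were actually evaluated.
calls = []

def make_stage(name, index):
    def score(features):
        calls.append(name)
        return features[index]
    return score

stages = [(make_stage("stage1", 0), 0.5),
          (make_stage("stage2", 1), 0.5),
          (make_stage("stage3", 2), 0.5)]

rejected = cascade_classify([0.1, 0.9, 0.9], stages)
calls_for_rejected = list(calls)
accepted = cascade_classify([0.9, 0.9, 0.9], stages)
```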
Train cascade of classifiers with AdaBoost
Faces
Non-faces
Selected features, thresholds, and weights
New image
http://www.intel.com/technology/computing/opencv/
First two features selected
Detecting profile faces requires training separate detector with profile examples.
Paul Viola, ICCV tutorial
Postprocess: suppress non-maxima
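One common way to suppress non-maxima is a greedy pass over the detections in order of decreasing score. This is a generic sketch rather than the specific postprocessing used in the tutorial; the box format, the overlap measure, and the 0.5 threshold are assumptions.

```python
def box_overlap(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    """Return the indices of the detections that survive suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a stronger kept box.
        if all(box_overlap(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```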
Frontal faces detected and then tracked, character names inferred with alignment
subtitles.
Everingham, M., Sivic, J. and Zisserman, A. "Hello! My name is... Buffy" - Automatic naming of characters in TV video, BMVC 2006.
http://www.robots.ox.ac.uk/~vgg/research/nface/index.html
$\mathbf{x}_i$ positive: $\mathbf{x}_i \cdot \mathbf{w} + b \ge 0$
$\mathbf{x}_i$ negative: $\mathbf{x}_i \cdot \mathbf{w} + b < 0$
$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$
For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$
Margin
Support vectors
and Knowledge Discovery, 1998
$\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$
$\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$
For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$
Distance between point and line: $\dfrac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$
For support vectors: $\dfrac{\mathbf{w}^T \mathbf{x} + b}{\|\mathbf{w}\|} = \dfrac{\pm 1}{\|\mathbf{w}\|}$, so $M = \left| \dfrac{1}{\|\mathbf{w}\|} - \dfrac{-1}{\|\mathbf{w}\|} \right| = \dfrac{2}{\|\mathbf{w}\|}$
Margin M
Support vectors
Therefore, the margin is 2 / ||w||
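Solution: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ (learned weight $\alpha_i$, support vector $\mathbf{x}_i$)
Classification function: $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$
If f(x) < 0, classify as negative, if f(x) > 0, classify as positive
Notice that it relies on an inner product between the test point x and the support vectors xi
(Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)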
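To make the role of the inner products concrete, the sketch below evaluates a linear SVM through its support vectors, f(x) = sum_i alpha_i y_i (x_i · x) + b. The two-point training set and its hand-derived solution (alpha = 0.5 for both points, b = 0) are assumptions chosen so that the result can be checked by hand.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b):
    """f(x) = sum_i alpha_i * y_i * (x_i . x) + b"""
    return float(np.sum(alphas * labels * (support_vectors @ x)) + b)

# Two training points on the x axis, one per class. Both are support vectors.
support_vectors = np.array([[1.0, 0.0], [-1.0, 0.0]])
labels = np.array([1.0, -1.0])
alphas = np.array([0.5, 0.5])
b = 0.0

# The weight vector is a weighted sum of the support vectors.
w = (alphas * labels) @ support_vectors
margin = 2.0 / np.linalg.norm(w)

score = svm_decision(np.array([3.0, 1.0]), support_vectors, alphas, labels, b)
prediction = "positive" if score > 0 else "negative"
```

Here w works out to (1, 0), the support vectors satisfy x_i · w + b = ±1, and the margin 2 / ||w|| equals the distance between the two points.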
Datasets that are linearly separable with some noise work out great:
[Figure: points along the x axis]
But what are we going to do if the dataset is just too hard?
How about… mapping data to a higher-dimensional space:
[Figure: points plotted with axes x and x^2]
Slide from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html
General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
Slide from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html
$K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$
and Knowledge Discovery, 1998
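Linear: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$
Polynomial of power p: $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T \mathbf{x}_j)^p$
Gaussian (radial-basis function network): $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2} \right)$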
Slide from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html
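The three kernels can be written directly as functions. The sketch also checks, for scalar inputs, that the polynomial kernel with p = 2 equals an inner product in a higher-dimensional space under the explicit map φ(x) = (1, √2·x, x²). The Gaussian form exp(-||xi - xj||² / (2σ²)), the default σ, and the function names are assumptions for this example.

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, p=2):
    return float((1.0 + np.dot(xi, xj)) ** p)

def gaussian_kernel(xi, xj, sigma=1.0):
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def phi_quadratic(x):
    """Explicit feature map matching the p = 2 polynomial kernel on scalars."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

x, z = 2.0, 3.0
kernel_value = polynomial_kernel([x], [z], p=2)
explicit_value = float(np.dot(phi_quadratic(x), phi_quadratic(z)))
```

Both routes give the same number, but the kernel never forms the higher-dimensional vectors, which is the point of working with K instead of φ.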
example. 2 Select a kernel function
between labeled examples
support vectors & weights.
kernel values between new input and support vectors, apply weights, check sign of output.
SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]
SVM with HoGs [Dalal & Triggs, CVPR 2005]
CVPR 2005
– Use motion and appearance to detect pedestrians
– Generalize rectangle features for sequence data
– Training examples = pairs of images.
Appearance, P. Viola, M. Jones, and D. Snow, ICCV 2003.
[Figure panels: Dynamic detector, Static detector]
Some classes well-captured by 2d appearance pattern
Simple detection protocol to implement
Good feature choices critical
Past successes for certain classes
For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations!
With so many windows, false positive rate better be low
If training binary detectors independently, means cost increases linearly with number of classes
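The arithmetic can be checked in a few lines. The per-window false positive rate of one in a million and the class count of 20 are hypothetical numbers chosen only to show the scaling.

```python
locations, orientations, scales = 250_000, 30, 4
windows_per_image = locations * orientations * scales

def expected_false_positives(num_windows, false_positive_rate):
    """Expected number of false detections in one image."""
    return num_windows * false_positive_rate

def total_evaluations(num_windows, num_classes):
    """Independent binary detectors: cost grows linearly with class count."""
    return num_windows * num_classes

# Hypothetical per-window false positive rate of one in a million.
false_alarms = expected_false_positives(windows_per_image, 1e-6)
evaluations_for_20_classes = total_evaluations(windows_per_image, 20)
```

Even at that rate the detector would raise about 30 false alarms per image, which is why the per-window false positive rate has to be extremely low.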
representations assuming a fixed 2d structure; or must assume fixed viewpoint
with holistic appearance-based descriptions
Sliding window
Detector’s view
Figure credit: Derek Hoiem
(expensive)
can lead to sensitivity to partial occlusions
Image credit: Adam, Rivlin, & Shimshoni
Download
Matlab code
Dataset with cars and computer monitors
http://people.csail.mit.edu/torralba/iccv2005/
From: Antonio Torralba