SLIDE 1 COS 429: COMPUTER VISION
Face Recognition
- Intro to recognition
- PCA and Eigenfaces
- LDA and Fisherfaces
- Face detection: Viola & Jones
- (Optional) generic object models for faces: the Constellation Model
Reading: Turk & Pentland, ???
SLIDES 3-7 Face Recognition (incremental build)
- Digital photography
- Surveillance
- Album organization
- Person tracking/id.
- Emotions and expressions
- Security/warfare
- Tele-conferencing
- Etc.
SLIDES 8-14 What's 'recognition'?
Two distinctions organize the recognition problem (built up over these slides):
- Categorization or Classification vs. Identification or Discrimination
- No localization vs. Detection or Localization
Examples: "Yes, there are faces" (categorization, no localization); "Yes, there is John Lennon" (identification, no localization); "John Lennon" marked at a specific image location (identification with detection/localization).
SLIDE 15 Today’s agenda
- 1. PCA & Eigenfaces
- 2. LDA & Fisherfaces
- 3. AdaBoost
- 4. Constellation model
SLIDE 16
SLIDE 17 Eigenfaces and Fisherfaces
– Principal Component Analysis (PCA)
– Linear Discriminant Analysis (LDA)
SLIDE 18 The Space of Faces
- An image is a point in a high dimensional space
– An N x M image is a point in R^{NM}
– We can define vectors in this space as we did in the 2D case
[Thanks to Chuck Dyer, Steve Seitz, Nishino]
SLIDE 19 Key Idea
- The set of possible face images, χ = {x̂} ⊂ R^L, occupies only a small region of image space.
- Images in the possible set are highly correlated.
- So, compress them to a low-dimensional subspace that captures key appearance characteristics of the visual DOFs.
- EIGENFACES [Turk and Pentland]: use PCA!
SLIDE 20 Two simple but useful techniques
For example, a generative graphical model: P(identity, image) = P(identity | image) P(image)
Preprocessing model (can be performed by PCA)
SLIDE 21 Principal Component Analysis (PCA)
- PCA is used to determine the most representative features among data points.
– It computes the p-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all p-dimensional subspaces.
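To make this concrete, here is a minimal numpy sketch (not from the slides; all names are illustrative) of computing the p-dimensional PCA subspace as the top eigenvectors of the data covariance:

```python
import numpy as np

def pca(X, p):
    """Return the top-p principal directions (columns) and the data mean.

    X: (N, d) array, one d-dimensional data point per row.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                            # center the data
    cov = Xc.T @ Xc / X.shape[0]           # d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:p]     # indices of the p largest
    return vecs[:, order], mu

# Projecting onto these p directions, Y = (X - mu) @ W, preserves the
# largest variance among all p-dimensional subspaces.
```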
SLIDE 22 Illustration of PCA
[Figure: data points plotted against x1 and x2; an arbitrary one-dimensional projection vs. the PCA projection, which preserves the most variance]
SLIDE 23 Illustration of PCA
[Figure: the same data with its 1st and 2nd principal components drawn in the (x1, x2) plane]
SLIDE 24 Eigenface for Face Recognition
- PCA has been used for face image representation/compression, face recognition, and many other tasks.
- Compare two faces by projecting the images
into the subspace and measuring the EUCLIDEAN distance between them.
SLIDE 25 Mathematical Formulation
Find an orthonormal transformation W from the n-dimensional image space to an m-dimensional feature space (m < n): y_k = W^T x_k.
Total scatter matrix: S_T = Σ_k (x_k − u)(x_k − u)^T
W_opt corresponds to the m leading eigenvectors of S_T.
SLIDE 26 Eigenfaces
- PCA extracts the eigenvectors of A
– Gives a set of vectors v1, v2, v3, ...
– Each one of these vectors is a direction in face space
SLIDE 27 Projecting onto the Eigenfaces
- The eigenfaces v1, ..., vK span the space of faces
  – A face x is converted to eigenface coordinates by projecting: ωi = vi^T (x − u), giving x → (ω1, ..., ωK)
SLIDE 28 Algorithm
Training
- 1. Align training images x1, x2, …, xN
- 2. Compute average face u = 1/N Σ xi
- 3. Compute the difference images φi = xi – u
Note that each image is flattened into a long vector!
SLIDE 29 Algorithm
Training (continued)
- 4. Compute the covariance matrix (total scatter matrix): S_T = 1/N Σ φi φi^T = 1/N B B^T, where B = [φ1, φ2, …, φN]
- 5. Compute the eigenvectors W of the covariance matrix
Testing
- 1. Projection onto the eigenfaces: ωi = W^T (x – u), W = {eigenfaces}
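Putting slides 28-29 together, a hedged end-to-end sketch in numpy, assuming the images are already aligned and flattened into row vectors (function and variable names are illustrative):

```python
import numpy as np

def train_eigenfaces(X, K):
    """Training steps 2-5. X: (N, d) aligned face images, one per row."""
    u = X.mean(axis=0)                       # 2. average face
    Phi = X - u                              # 3. difference images phi_i
    # 4.-5. Eigenvectors of S_T = (1/N) B B^T with B = Phi^T. Since d >> N,
    # eigendecompose the small N x N matrix Phi Phi^T instead and map its
    # eigenvectors back into image space (the Turk-Pentland trick).
    vals, vecs = np.linalg.eigh(Phi @ Phi.T / X.shape[0])
    order = np.argsort(vals)[::-1][:K]
    W = Phi.T @ vecs[:, order]               # d x K eigenfaces
    W /= np.linalg.norm(W, axis=0)           # normalize each eigenface
    return W, u

def project(W, u, x):
    """Testing step 1: eigenface coordinates omega = W^T (x - u)."""
    return W.T @ (x - u)

def identify(W, u, gallery, x):
    """Nearest gallery face by Euclidean distance in eigenface space."""
    omegas = np.stack([project(W, u, g) for g in gallery])
    return int(np.argmin(np.linalg.norm(omegas - project(W, u, x), axis=1)))
```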
SLIDE 30 Illustration of Eigenfaces
The visualization of eigenvectors: these are the first 4 eigenvectors from a training set of 400 images (ORL Face Database). They look like faces, hence the name eigenfaces.
SLIDE 31
Eigenfaces look somewhat like generic faces.
SLIDE 32 Eigenvalues
SLIDE 33 Reconstruction and Errors
Only selecting the top P eigenfaces reduces the dimensionality. Fewer eigenfaces result in more information loss, and hence less discrimination between faces.
[Figure: reconstructions with P = 4, P = 200, and P = 400]
SLIDE 34 Summary for PCA and Eigenface
- Non-iterative, globally optimal solution
- PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination…
SLIDE 35 Linear Discriminant Analysis (LDA)
- Using Linear Discriminant Analysis (LDA) or Fisher's Linear Discriminant (FLD)
- Eigenfaces attempt to maximise the total scatter of the training images in face space, while Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.
SLIDE 36 Illustration of the Projection
Using two classes as an example:
[Figure: the same two-class data in the (x1, x2) plane under a poor projection (classes overlap) and a good projection (classes separate)]
SLIDE 37 Comparing with PCA
SLIDE 38 Variables
- N sample images: {x_1, …, x_N}
- c classes: {χ_1, …, χ_c}
- Average of each class: μ_i = 1/N_i Σ_{x_k ∈ χ_i} x_k
- Total average: μ = 1/N Σ_{k=1}^{N} x_k
SLIDE 39 Scatters
- Within-class scatter: S_W = Σ_{i=1}^{c} S_i, where S_i = Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T
- Between-class scatter: S_B = Σ_{i=1}^{c} |χ_i| (μ_i − μ)(μ_i − μ)^T
- Total scatter: S_T = S_W + S_B
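In code, the three scatters follow directly from these definitions; a small numpy sketch with illustrative names:

```python
import numpy as np

def scatter_matrices(X, y):
    """X: (N, d) samples; y: (N,) integer class labels 0..c-1."""
    mu = X.mean(axis=0)                          # total average
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for i in np.unique(y):
        Xi = X[y == i]
        mu_i = Xi.mean(axis=0)                   # class average
        D = Xi - mu_i
        S_W += D.T @ D                           # within-class scatter S_i
        diff = (mu_i - mu)[:, None]
        S_B += Xi.shape[0] * (diff @ diff.T)     # between-class scatter
    return S_W, S_B                              # S_T = S_W + S_B
```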
SLIDE 40 Illustration
[Figure: two classes in the (x1, x2) plane with per-class scatters S_1 and S_2, between-class scatter S_B, and S_W = S_1 + S_2]
SLIDE 41 Mathematical Formulation (1)
After projection y_k = W^T x_k:
- Between-class scatter (of y's): S̃_B = W^T S_B W
- Within-class scatter (of y's): S̃_W = W^T S_W W
SLIDE 42 Mathematical Formulation (2)
W_opt = argmax_W |S̃_B| / |S̃_W| = argmax_W |W^T S_B W| / |W^T S_W W|
- How is it found? Generalized eigenvectors: S_B w_i = λ_i S_W w_i, i = 1, …, m
- The data dimension is much larger than the number of samples (n >> N), so the matrix S_W is singular: Rank(S_W) ≤ N − c
SLIDE 43 Fisherface (PCA+FLD)
- Project with PCA to an (N − c)-dimensional space: z_k = W_pca^T x_k, where W_pca = argmax_W |W^T S_T W|
- Project with FLD to a (c − 1)-dimensional space: y_k = W_fld^T z_k, where W_fld = argmax_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|
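A minimal sketch of the two-stage Fisherface projection, reusing the pca and scatter_matrices sketches above and letting scipy's generalized symmetric eigensolver play the role of the argmax (assumed helper names, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def fisherfaces(X, y):
    """Two-stage Fisherface projection. X: (N, d); y: (N,) class labels."""
    N, c = X.shape[0], len(np.unique(y))
    # Stage 1: PCA down to N - c dimensions, so S_W becomes nonsingular.
    W_pca, u = pca(X, N - c)
    Z = (X - u) @ W_pca
    # Stage 2: FLD in the reduced space, i.e. the generalized eigenproblem
    # S_B w = lambda S_W w; keep the top c - 1 eigenvectors.
    S_W, S_B = scatter_matrices(Z, y)
    vals, vecs = eigh(S_B, S_W)              # ascending generalized eigvals
    W_fld = vecs[:, np.argsort(vals)[::-1][:c - 1]]
    return W_pca @ W_fld, u                  # overall d x (c - 1) map

# y_k = W_fld^T W_pca^T (x_k - u)
```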
SLIDE 44 Illustration of Fisherface
SLIDE 45 Results: Eigenface vs. Fisherface (1)
- Variation in facial expression, eyewear, and lighting: with glasses / without glasses, 3 lighting conditions, 5 expressions
- Input: 160 images of 16 people
- Train on 159 images, test on the 1 remaining image (leave-one-out)
SLIDE 46
Eigenface vs. Fisherface (2)
SLIDE 47 Discussion
- Removing the first three principal components results in better performance under variable lighting conditions
- The Fisherface methods had error rates lower than the Eigenface method for the small datasets tested.
SLIDE 48 Today’s agenda
- 1. PCA & Eigenfaces
- 2. LDA & Fisherfaces
- 3. AdaBoost
- 4. Constellation model
SLIDE 49 Robust Face Detection Using AdaBoost
- Brief intro on (Ada)Boosting
- Viola & Jones, 2001
  – Weak detectors: Haar wavelets
  – Integral image
  – Cascade
  – Experiments & results
Reference:
- P. Viola and M. Jones (2001) Robust Real-time Object Detection, IJCV.
SLIDE 50 Discriminative methods
Object detection and recognition is formulated as a classification problem: the image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not.
[Figure: "Where are the screens?" A bag of image patches mapped into some feature space, with a decision boundary separating computer screens from background]
SLIDE 51 A simple object detector with Boosting
Download (http://people.csail.mit.edu/torralba/iccv2005/):
- Toolbox for manipulating dataset
- Code and dataset
Matlab code:
- Gentle boosting
- Object detector using a part-based model
- Dataset with cars and computer monitors
SLIDE 52 Why boosting?
- A simple algorithm for learning robust classifiers
  – Freund & Schapire, 1995
  – Friedman, Hastie, Tibshirani, 1998
- Provides an efficient algorithm for sparse visual feature selection
  – Tieu & Viola, 2000
  – Viola & Jones, 2003
- Easy to implement; does not require external optimization tools.
SLIDE 53 Boosting
- Defines a classifier using an additive model: H(x) = α1 h1(x) + α2 h2(x) + α3 h3(x) + … (strong classifier H; weak classifiers h_t; weights α_t; feature vector x)
SLIDE 54 Boosting
- Defines a classifier using an additive model: H(x) = α1 h1(x) + α2 h2(x) + α3 h3(x) + …
- We need to define a family of weak classifiers from which each h_t(x) is drawn
SLIDE 55 Boosting
- It is a sequential procedure:
[Figure: data points x_t, each with a class label y_t = ±1 and a weight w_t = 1]
SLIDE 56 Toy example
Weak learners from the family of lines: a line h with p(error) = 0.5 is at chance. Each data point has a class label y_t = ±1 and a weight w_t = 1.
SLIDE 57 Toy example
This one seems to be the best. This is a 'weak classifier': it performs slightly better than chance. Each data point has a class label y_t = ±1 and a weight w_t = 1.
SLIDES 58-61 Toy example
Each round we update the weights, w_t ← w_t exp{−y_t H_t}, setting a new problem for which the previous weak classifier again performs at chance.
SLIDE 62 Toy example
The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
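A compact sketch of the whole procedure, using axis-aligned decision stumps as the family of weak classifiers (thresholds on single coordinates, playing the role of the lines in the toy example); illustrative code, not from the slides:

```python
import numpy as np

def adaboost(X, y, T):
    """Discrete AdaBoost. X: (N, d); y in {-1, +1}; returns [(alpha, stump)]."""
    N = X.shape[0]
    w = np.ones(N) / N                        # uniform initial weights
    model = []
    for _ in range(T):
        best = None
        for dim in range(X.shape[1]):         # exhaustive stump search
            for thr in np.unique(X[:, dim]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, dim] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, dim, thr, sign)
        err, dim, thr, sign = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = sign * np.where(X[:, dim] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)        # w_t <- w_t exp{-y_t H_t}
        w /= w.sum()                          # renormalize: the new problem is
        model.append((alpha, dim, thr, sign)) # at chance for this stump
    return model

def strong_classify(model, x):
    """Sign of the weighted sum of the weak classifiers."""
    H = sum(a * s * (1 if x[d] > t else -1) for a, d, t, s in model)
    return np.sign(H)
```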
SLIDE 63 Real-time Face Detection
– A new image representation to compute the features very quickly
– Selecting a small number of important features
– A method for combining classifiers
– Focusing attention on promising regions of the image
- Implemented on a 700 MHz Intel Pentium III, face detection proceeds at 15 frames/sec
– Working only with a single grey-scale image
SLIDE 64 Features
- Three kinds of rectangle features
- The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the gray rectangles
SLIDE 65 Integral Image
ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
The sum within rectangle D is computed from the four corner values of the integral image: D = 4 − (2 + 3) + 1.
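A short sketch of the integral image and the O(1) four-corner rectangle sum it enables (illustrative numpy, not Viola & Jones's implementation):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] in O(1): D = 4 - (2 + 3) + 1."""
    s = ii[r1, c1]                                  # corner 4
    if r0 > 0: s -= ii[r0 - 1, c1]                  # corner 2
    if c0 > 0: s -= ii[r1, c0 - 1]                  # corner 3
    if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]   # corner 1
    return s

# A two-rectangle Haar-like feature is then just the difference of two
# rect_sum calls (gray region minus white region).
```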
SLIDE 66 Learning Classification Function (1)
- Selecting a small number of important features
① Given example images (x_1, y_1), …, (x_n, y_n), where y_i = 1 for faces and y_i = 0 for non-faces
② Initialize weights w_{1,i} = 1/(2l) for y_i = 1 and 1/(2m) for y_i = 0, where l is the # of faces and m is the # of non-faces
SLIDE 67 Learning Classification Function (2)
③ For t = 1, …, T:
- a. Normalize the weights: w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}
- b. For each feature j, train a classifier h_j(x) = 1 if f_j(x) > θ_j (0 otherwise) and compute its weighted error ε_j = Σ_i w_i |h_j(x_i) − y_i|
- c. Choose the classifier h_t with the lowest error ε_t
- d. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1 − e_i}, where e_i = |h_t(x_i) − y_i| (0 if correct, 1 if not) and β_t = ε_t / (1 − ε_t)
SLIDE 68 Learning Classification Function (3)
④ The final strong classifier is
h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise, where α_t = log(1/β_t)
☞ The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.
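Transcribed into code, steps ①-④ look roughly like this: a hedged sketch over a precomputed feature matrix, with labels in {0, 1}, the w_{t+1,i} = w_{t,i} β_t^{1−e_i} update, and the half-sum-of-α threshold, all as on the slides (names are illustrative):

```python
import numpy as np

def viola_jones_boost(F, y, T):
    """F: (N, J) feature values f_j(x_i); y in {0, 1} (face / non-face)."""
    N, J = F.shape
    l, m = (y == 1).sum(), (y == 0).sum()
    w = np.where(y == 1, 1.0 / (2 * l), 1.0 / (2 * m))   # step 2
    chosen = []
    for _ in range(T):                                   # step 3
        w = w / w.sum()                                  # a. normalize weights
        best = None
        for j in range(J):                               # b. per-feature stump
            for theta in np.unique(F[:, j]):
                h = (F[:, j] > theta).astype(float)      # h_j = 1 if f_j > theta
                eps = np.sum(w * np.abs(h - y))          # weighted error
                if best is None or eps < best[0]:
                    best = (eps, j, theta)
        eps, j, theta = best                             # c. lowest error
        beta = max(eps, 1e-12) / (1 - eps)               # beta_t, guarded
        e = np.abs((F[:, j] > theta).astype(float) - y)  # 0 correct, 1 wrong
        w = w * beta ** (1 - e)                          # d. weight update
        chosen.append((np.log(1 / beta), j, theta))      # alpha_t = log(1/beta_t)
    return chosen

def final_strong_classifier(chosen, f):
    """Step 4: h(x) = 1 iff sum_t alpha_t h_t(x) >= (1/2) sum_t alpha_t."""
    total = sum(a for a, j, th in chosen if f[j] > th)
    return int(total >= 0.5 * sum(a for a, _, _ in chosen))
```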
SLIDE 69 Learning Results
- 200 features
- Requiring 0.7 seconds to scan a 384x288 pixel image
- Detection rate: 95%; false positive rate: 1/14,804
- [Figure: the first and second features selected by AdaBoost]
SLIDE 70 The Attentional Cascade
- Reject many of the negative sub-windows early
- A two-feature strong classifier can detect 100% of faces with a 40% false positive rate
SLIDE 71 A Cascaded Detector
[Figure: classifier stages 1, 2, 3, …; a window passing a stage (T) goes on to the next, a window failing (F) is rejected; each stage has its own false-positive rate f and detection rate d, chosen to meet an overall F_target]
SLIDE 72 Detector Cascade Discussion
☞ The cascaded classifier is almost 10 times faster
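The speedup comes from early rejection; a minimal sketch of how one sub-window flows through the cascade (stage classifiers and thresholds are placeholders):

```python
def cascade_classify(stages, window):
    """stages: list of (strong_classifier, threshold) pairs, cheapest first.

    A sub-window is reported as a face only if every stage accepts it;
    most negative windows are rejected (F) by the first cheap stages and
    never pay for the expensive later ones.
    """
    for classify, threshold in stages:
        if classify(window) < threshold:
            return False          # F: rejected immediately
    return True                   # T through all stages: detection
```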
SLIDE 73 Experimental Results (1)
- Face training set: 4916 hand-labeled faces, scaled and aligned to a base resolution of 24x24
- Structure of the detector cascade: 32 layers, 4297 features
- Training time for the entire 32-layer detector: on the order of weeks on a single 466 MHz AlphaStation XP900
[Figure: cascade whose early stages use few features (2, 5, 20, …) and later stages use more (50, 100, 200); the first stage rejects about 60% of non-faces while correctly detecting close to 100% of faces]
SLIDE 74 Face Image Databases
- Databases for face recognition can be
best utilized as training sets
– Each image consists of an individual on a uniform and uncluttered background
- Test Sets for face detection
– MIT, CMU (frontal, profile), Kodak
SLIDE 75
Training dataset: 4916 images
SLIDE 76 Experimental Results
– MIT+CMU frontal face test set: 130 images with 507 labeled frontal faces
– MIT test set: 23 images with 149 faces
– Sung & Poggio: detection rate 79.9% with 5 false positives; AdaBoost: detection rate 77.8% with 5 false positives
Detection rate (%) vs. number of false detections on MIT+CMU:
  False detections: 10 / 31 / 50 / 65 / 78 / 95 / 110 / 167 / 422
  AdaBoost:         78.3 / 85.2 / 88.8 / 89.8 / 90.1 / 90.8 / 91.1 / 91.8 / 93.7
  Neural-net:       83.2 at 10 false detections; 89.9 and 90.1 at two of the higher operating points
→ Accuracy is not significantly different, but the cascade classifier is almost 10 times faster.
SLIDE 77
SLIDE 78
SLIDE 79
SLIDE 80
SLIDE 81
SLIDE 82
SLIDE 83
SLIDE 84
SLIDE 85 Today’s agenda
- 1. PCA & Eigenfaces
- 2. LDA & Fisherfaces
- 3. AdaBoost
- 4. Constellation model
SLIDE 86 Parts and Structure Literature
- Fischler & Elschlager 1973
- Yuille '91
- Brunelli & Poggio '93
- Lades, v.d. Malsburg et al. '93
- Cootes, Lanitis, Taylor et al. '95
- Amit & Geman '95, '99
- Perona et al. '95, '96, '98, '00, '03
- Huttenlocher et al. '00
- Agarwal & Roth '02
- etc…
SLIDE 87 Deformations
[Figure: a part-based face model with parts A, B, C, D under deformation]
SLIDE 88
Background clutter
SLIDE 89
Frontal faces
SLIDE 90
Face images
SLIDE 91
3D Object recognition – Multiple mixture components
SLIDE 92 3D Orientation Tuning
[Figure: orientation tuning curve, % correct vs. angle in degrees, from frontal to profile]