Face Recognition: Intro to recognition, PCA and Eigenfaces, LDA (PowerPoint PPT presentation)

SLIDE 1

COS 429: COMPUTER VISION

Face Recognition

  • Intro to recognition
  • PCA and Eigenfaces
  • LDA and Fisherfaces
  • Face detection: Viola & Jones
  • (Optional) Generic object models for faces: the Constellation Model

Reading: Turk & Pentland, ???

SLIDES 2–7

Face Recognition

  • Digital photography
  • Surveillance
  • Album organization
  • Person tracking/id.
  • Emotions and expressions
  • Security/warfare
  • Tele-conferencing
  • Etc.
SLIDES 8–14

What’s ‘recognition’?

  • Categorization or Classification (no localization): "Yes, there are faces"
  • Identification or Discrimination (no localization): "Yes, there is John Lennon"
  • Detection or Localization: "John Lennon", located in the image
SLIDE 15

Today’s agenda

  • 1. PCA & Eigenfaces
  • 2. LDA & Fisherfaces
  • 3. AdaBoost
  • 4. Constellation model

SLIDE 16

SLIDE 17

Eigenfaces and Fisherfaces

  • Introduction
  • Techniques
    – Principal Component Analysis (PCA)
    – Linear Discriminant Analysis (LDA)
  • Experiments
SLIDE 18

The Space of Faces

  • An image is a point in a high-dimensional space
    – An N × M image is a point in R^(NM)
    – We can define vectors in this space as we did in the 2D case

[Thanks to Chuck Dyer, Steve Seitz, Nishino]

SLIDE 19

Key Idea

The set of possible face images {x̂} occupies only a small region of the image space R^L.

  • Images in the possible set are highly correlated.
  • So, compress them to a low-dimensional subspace that captures key appearance characteristics of the visual DOFs.
  • EIGENFACES [Turk and Pentland]: USE PCA!

SLIDE 20

Two simple but useful techniques

For example, a generative graphical model:

  P(identity, image) = P(identity | image) P(image)

Preprocessing model (can be performed by PCA)

SLIDE 21

Principal Component Analysis (PCA)

  • PCA is used to determine the most representative features among data points.
    – It computes the p-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all p-dimensional subspaces.
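The variance-maximizing subspace described above can be checked numerically. A minimal NumPy sketch (the toy data and the `pca` helper are illustrative, not the lecture's code):

```python
import numpy as np

def pca(X, p):
    """Return the mean of the rows of X and the top-p principal directions."""
    mu = X.mean(axis=0)
    C = np.cov(X - mu, rowvar=False)      # covariance of the data points
    vals, vecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:p]    # keep the p largest
    return mu, vecs[:, order]

# Toy data: points scattered mostly along the direction (1, 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[1.0, 1.0]]) \
    + 0.1 * rng.normal(size=(200, 2))
mu, W = pca(X, p=1)
d = W[:, 0]                               # the direction of largest variance
```

Projecting the points onto `d` gives the one-dimensional view with the largest variance, which is exactly the criterion stated above.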

SLIDE 22

Illustration of PCA

(One possible projection of data points 1–6 onto x1, versus the PCA projection onto the new axis x1′)

SLIDE 23

Illustration of PCA

(Axes x1, x2, with the 1st and 2nd principal components)

Eigenface for Face Recognition

  • PCA has been used for face image representation/compression, face recognition, and many other tasks.
  • Compare two faces by projecting the images into the subspace and measuring the Euclidean distance between them.

SLIDE 25

Mathematical Formulation

Find an orthonormal transformation W from the n-dimensional image space to an m-dimensional feature space:

  y_k = W^T x_k

Total scatter matrix:

  S_T = Σ_{k=1}^N (x_k − μ)(x_k − μ)^T

W_opt = argmax_W |W^T S_T W| corresponds to the m leading eigenvectors of S_T.

SLIDE 26

Eigenfaces

  • PCA extracts the eigenvectors of the covariance matrix
    – This gives a set of vectors v1, v2, v3, ...
    – Each of these vectors is a direction in face space
  • What do these look like?
SLIDE 27

Projecting onto the Eigenfaces

  • The eigenfaces v1, ..., vK span the space of faces
    – A face x is converted to eigenface coordinates by projecting onto each eigenface: ω_i = v_i^T (x − u)

SLIDE 28

Algorithm

Training

  • 1. Align training images x_1, x_2, …, x_N
  • 2. Compute the average face u = (1/N) Σ x_i
  • 3. Compute the difference images φ_i = x_i − u
  • 4. Compute the covariance matrix (total scatter matrix): S_T = (1/N) Σ φ_i φ_i^T = B B^T, B = [φ_1, φ_2, …, φ_N]
  • 5. Compute the eigenvectors of the covariance matrix, W

Note that each image is formulated into a long vector!

SLIDE 29

Algorithm

Testing

  • 1. Project into eigenface space: ω = W^T (x − u), W = {eigenfaces}
  • 2. Compare projections
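The training and testing steps above can be sketched end-to-end in NumPy. Random vectors stand in for the aligned face images here, and the small N × N eigenproblem trick (eigenvectors of B^T B mapped through B) is the one commonly used with eigenfaces; this is an illustrative sketch, not the lecture's code:

```python
import numpy as np

def train_eigenfaces(X, K):
    """X: N aligned, flattened images as rows. Returns (mean face u, eigenfaces W)."""
    N = X.shape[0]
    u = X.mean(axis=0)                      # step 2: average face
    B = (X - u).T                           # step 3: difference images, as columns
    # Steps 4-5: eigenvectors of S_T = (1/N) B B^T, via the small N x N problem:
    # an eigenvector v of B^T B maps to the eigenvector B v of B B^T.
    vals, vecs = np.linalg.eigh(B.T @ B / N)
    order = np.argsort(vals)[::-1][:K]      # keep the K largest eigenvalues
    W = B @ vecs[:, order]
    W /= np.linalg.norm(W, axis=0)          # orthonormal eigenfaces, one per column
    return u, W

def project(x, u, W):
    """Testing step 1: eigenface coordinates of image x."""
    return W.T @ (x - u)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 64))               # 20 fake 8x8 "faces", flattened
u, W = train_eigenfaces(X, K=5)
w0 = project(X[0], u, W)

# Testing step 2: compare projections by Euclidean distance
dists = [np.linalg.norm(w0 - project(x, u, W)) for x in X]
```

With a real gallery, the identity of a probe face would be the training image whose projection is nearest.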

SLIDE 30

Illustration of Eigenfaces

The visualization of eigenvectors: these are the first 4 eigenvectors from a training set of 400 images (ORL Face Database). They look like faces, hence the name Eigenfaces.

SLIDE 31

Eigenfaces look somewhat like generic faces.

SLIDE 32

Eigenvalues

SLIDE 33

Reconstruction and Errors

Only selecting the top P eigenfaces reduces the dimensionality. Fewer eigenfaces result in more information loss, and hence less discrimination between faces.

(Reconstructions with P = 4, P = 200, P = 400)

SLIDE 34

Summary for PCA and Eigenfaces

  • Non-iterative, globally optimal solution
  • PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination…
SLIDE 35

Linear Discriminant Analysis (LDA)

  • Also called Fisher’s Linear Discriminant (FLD)
  • Eigenfaces attempt to maximise the scatter of the training images in face space, while Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.

SLIDE 36

Illustration of the Projection

Using two classes as an example: a poor projection vs. a good projection (data on axes x1, x2)

SLIDE 37

Comparing with PCA

SLIDE 38

Variables

  • N sample images: {x_1, …, x_N}
  • c classes: {χ_1, …, χ_c}
  • Average of each class: μ_i = (1/N_i) Σ_{x_k ∈ χ_i} x_k
  • Total average: μ = (1/N) Σ_{k=1}^N x_k

SLIDE 39

Scatters

  • Scatter of class i: S_i = Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T
  • Within-class scatter: S_W = Σ_{i=1}^c S_i
  • Between-class scatter: S_B = Σ_{i=1}^c N_i (μ_i − μ)(μ_i − μ)^T
  • Total scatter: S_T = S_W + S_B

SLIDE 40

Illustration

(Two classes on axes x1, x2, with class scatters S_1 and S_2: S_W = S_1 + S_2, and S_B lies between the class means)
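The scatter definitions above translate directly into NumPy, and the identity S_T = S_W + S_B gives a handy correctness check. A minimal sketch on toy two-class data (the `scatters` helper is illustrative, not from the lecture):

```python
import numpy as np

def scatters(X, labels):
    """Within- and between-class scatter of the rows of X, grouped by labels."""
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        Sw += D.T @ D                        # S_i, accumulated into S_W
        d = (mu_c - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)            # N_i (mu_i - mu)(mu_i - mu)^T
    return Sw, Sb

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
Sw, Sb = scatters(X, labels)

D0 = X - X.mean(axis=0)
St = D0.T @ D0                               # total scatter S_T
```

The decomposition S_T = S_W + S_B holds exactly, which is a useful sanity check on any implementation.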

SLIDE 41

Mathematical Formulation (1)

After projection y_k = W^T x_k:

  • Between-class scatter (of y’s): S̃_B = W^T S_B W
  • Within-class scatter (of y’s): S̃_W = W^T S_W W

SLIDE 42

Mathematical Formulation (2)

  • The desired projection:

  W_opt = argmax_W |S̃_B| / |S̃_W| = argmax_W |W^T S_B W| / |W^T S_W W|

  • How is it found? As generalized eigenvectors:

  S_B w_i = λ_i S_W w_i,  i = 1, …, m

  • Problem: the data dimension is much larger than the number of samples (n >> N), so the matrix S_W is singular: Rank(S_W) ≤ N − c.

SLIDE 43

Fisherface (PCA+FLD)

  • First project with PCA to an (N − c)-dimensional space:

  z_k = W_pca^T x_k,  W_pca = argmax_W |W^T S_T W|

  • Then project with FLD to a (c − 1)-dimensional space:

  y_k = W_fld^T z_k,  W_fld = argmax_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|

SLIDE 44

Illustration of Fisherface

  • Fisherface
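The two-stage recipe above (PCA to N − c dimensions, then FLD to c − 1) can be sketched with SciPy's generalized symmetric eigensolver. Random stand-in data replaces real face images, and the small ridge added to S_W is a numerical-safety assumption, not part of the method:

```python
import numpy as np
from scipy.linalg import eigh

def fisherface(X, labels):
    """Combined PCA+FLD projection (n -> c-1), following the two-stage recipe."""
    N, n = X.shape
    classes = np.unique(labels)
    c = len(classes)
    mu = X.mean(axis=0)
    # Stage 1: PCA down to N - c dimensions, so the reduced S_W is nonsingular
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W_pca = Vt[:N - c].T
    Z = (X - mu) @ W_pca
    # Scatter matrices in the reduced space
    Sw = np.zeros((N - c, N - c))
    Sb = np.zeros_like(Sw)
    for k in classes:
        Zc = Z[labels == k]
        m = Zc.mean(axis=0)
        Sw += (Zc - m).T @ (Zc - m)
        d = (m - Z.mean(axis=0))[:, None]
        Sb += len(Zc) * (d @ d.T)
    Sw += 1e-8 * np.eye(N - c)               # tiny ridge for numerical safety
    # Stage 2: FLD via the generalized eigenproblem S_B w = lambda S_W w
    vals, vecs = eigh(Sb, Sw)
    W_fld = vecs[:, np.argsort(vals)[::-1][:c - 1]]
    return W_pca @ W_fld

rng = np.random.default_rng(4)
means = rng.normal(0, 4, (3, 30))            # 3 classes with separated means
X = np.vstack([rng.normal(m, 1.0, (10, 30)) for m in means])
labels = np.repeat(np.arange(3), 10)
W = fisherface(X, labels)
Y = (X - X.mean(axis=0)) @ W                 # (c-1)-dimensional features
```

In the projected space Y, nearest-mean or nearest-neighbor classification would then separate the classes.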
SLIDE 45

Results: Eigenface vs. Fisherface (1)

  • Variation in facial expression, eyewear, and lighting: with glasses / without glasses, 3 lighting conditions, 5 expressions
  • Input: 160 images of 16 people
  • Train: 159 images
  • Test: 1 image

SLIDE 46

Eigenface vs. Fisherface (2)

SLIDE 47

Discussion

  • Removing the first three principal components results in better performance under variable lighting conditions.
  • The Fisherface methods had error rates lower than the Eigenface method for the small datasets tested.

SLIDE 48

Today’s agenda

  • 1. PCA & Eigenfaces
  • 2. LDA & Fisherfaces
  • 3. AdaBoost
  • 4. Constellation model
SLIDE 49

Robust Face Detection Using AdaBoost

  • Brief intro on (Ada)Boosting
  • Viola & Jones, 2001
    – Weak detectors: Haar wavelets
    – Integral image
    – Cascade
    – Experiments & results

Reference:

  • P. Viola and M. Jones (2001) Robust Real-time Object Detection, IJCV.
SLIDE 50

Discriminative methods

Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether or not it contains a target object.

(Figure: "Where are the screens?" A bag of image patches; in some feature space, a decision boundary separates "Computer screen" from "Background".)

SLIDE 51

A simple object detector with Boosting

Download:

  • Toolbox for manipulating dataset
  • Code and dataset

Matlab code:

  • Gentle boosting
  • Object detector using a part-based model

Dataset with cars and computer monitors

http://people.csail.mit.edu/torralba/iccv2005/

SLIDE 52

Why boosting?

  • A simple algorithm for learning robust classifiers
    – Freund & Schapire, 1995
    – Friedman, Hastie, Tibshirani, 1998
  • Provides an efficient algorithm for sparse visual feature selection
    – Tieu & Viola, 2000
    – Viola & Jones, 2003
  • Easy to implement; does not require external optimization tools.

SLIDE 53

Boosting

  • Defines a classifier using an additive model:

  F(x) = α_1 f_1(x) + α_2 f_2(x) + α_3 f_3(x) + …

  (F: strong classifier; f_t: weak classifiers from a family of weak classifiers; α_t: weights; x: features vector)

SLIDE 54

Boosting

  • We need to define a family of weak classifiers f_t(x)

SLIDE 55

Boosting

  • It is a sequential procedure. Each data point x_t has a class label y_t ∈ {+1, −1} and an initial weight w_t = 1.

SLIDE 56

Toy example

Weak learners come from the family of lines h; p(error) = 0.5 means a learner is at chance. Each data point has a class label y_t ∈ {+1, −1} and a weight w_t = 1.

SLIDE 57

Toy example

This one seems to be the best. This is a ‘weak classifier’: it performs slightly better than chance.

SLIDES 58–61

Toy example

At each round we update the weights, w_t ← w_t exp{−y_t H_t}, and so set a new problem for which the previous weak classifier performs at chance again.

SLIDE 62

Toy example

The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
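The rounds in the toy example can be sketched with decision stumps standing in for the lines. This is a hypothetical minimal AdaBoost, using the w ← w·exp{−y·h} reweighting from the slides, on a 1-D "band" dataset that no single weak classifier can separate:

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Axis-aligned weak classifier: sign * (+1 if x[feat] > thresh else -1)."""
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def adaboost(X, y, T=10):
    """AdaBoost with exhaustive decision-stump search; y must be in {-1, +1}."""
    n = len(y)
    w = np.ones(n) / n
    H = []
    for _ in range(T):
        best = None
        for feat in range(X.shape[1]):
            for thresh in X[:, feat]:
                for sign in (+1, -1):
                    err = w[stump_predict(X, feat, thresh, sign) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, (feat, thresh, sign))
        err, stump = best
        err = min(max(err, 1e-10), 1 - 1e-10)        # guard the log
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, *stump))
        w /= w.sum()                                 # misclassified points now count more
        H.append((alpha, stump))
    return H

def strong_classify(H, X):
    """The strong classifier: sign of the weighted sum of the weak classifiers."""
    s = sum(alpha * stump_predict(X, *stump) for alpha, stump in H)
    return np.where(s >= 0, 1, -1)

# A 1-D "band": no single threshold separates it, but a sum of stumps does
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, 1, 1, -1, -1])
H = adaboost(X, y, T=20)
acc = (strong_classify(H, X) == y).mean()
```

Each round's reweighting forces the next stump to focus on the points the current strong classifier still gets wrong, exactly as in the slides.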

SLIDE 63

Real-time Face Detection

  • Integral Image
    – New image representation to compute the features very quickly
  • AdaBoost
    – Selecting a small number of important features
  • Cascade
    – A method for combining classifiers
    – Focusing attention on promising regions of the image
  • Implemented on a 700 MHz Intel Pentium III, face detection proceeds at 15 frames/sec.
    – Working only with a single grey-scale image

SLIDE 64

Features

  • Three kinds of rectangle features
  • The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the gray rectangles

SLIDE 65

Integral Image

  ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

The sum within rectangle D needs only four array references: D = 4 − (2 + 3) + 1, where 1, 2, 3, 4 are the integral-image values at the rectangle’s corners.
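The integral image is two cumulative sums, and the four-reference rectangle sum follows directly. A sketch (the corner labels in the comments follow the slide's D = 4 − (2 + 3) + 1 numbering):

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of img over all pixels up to and including (x, y)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] using four array references: 4 - 2 - 3 + 1."""
    total = ii[r1, c1]                                 # corner "4"
    if r0 > 0:
        total -= ii[r0 - 1, c1]                        # corner "2"
    if c0 > 0:
        total -= ii[r1, c0 - 1]                        # corner "3"
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]                    # corner "1"
    return total

rng = np.random.default_rng(5)
img = rng.integers(0, 256, size=(24, 24))              # a 24x24 "sub-window"
ii = integral_image(img)
s = rect_sum(ii, 5, 3, 10, 8)                          # a 6x6 rectangle, O(1) time
```

Any of the rectangle features on the previous slide then costs a handful of lookups, independent of the rectangle size.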

SLIDE 66

Learning Classification Function (1)

  • Selecting a small number of important features

① Given example images (x_1, y_1), …, (x_n, y_n), where y_i = 1 for faces and y_i = 0 for non-faces.

② Initialize weights w_{1,i} = 1/(2l) for y_i = 1 and w_{1,i} = 1/(2m) for y_i = 0, where l is the # of faces and m is the # of non-faces.

SLIDE 67

Learning Classification Function (2)

③ For t = 1, …, T:

  • a. Normalize the weights: w_{t,i} ← w_{t,i} / Σ_{j=1}^n w_{t,j}
  • b. For each feature j, train a classifier h_j,

  h_j(x) = 1 if f_j(x) > θ_j, 0 otherwise,

  and evaluate its weighted error ε_j = Σ_i w_i |h_j(x_i) − y_i|

  • c. Choose the classifier h_t with the lowest error ε_t
  • d. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1 − |h_t(x_i) − y_i|}, where β_t = ε_t / (1 − ε_t)

SLIDE 68

Learning Classification Function (3)

④ The final strong classifier is

  h(x) = 1 if Σ_{t=1}^T α_t h_t(x) ≥ (1/2) Σ_{t=1}^T α_t, and 0 otherwise,

  where α_t = log(1/β_t).

☞ The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.

SLIDE 69

Learning Results

  • 200 features
  • Detection rate: 95%, with a false positive rate of 1/14,804
  • Requires 0.7 seconds to scan a 384×288 pixel image

(The first and second features selected by AdaBoost)

SLIDE 70

The Attentional Cascade

  • Reject many of the negative sub-windows
  • A two-feature strong classifier: detection rate 100%, false positive rate 40%

SLIDE 71

A Cascaded Detector

(Figure: a chain of classifiers 1 → 2 → 3 → …, each with rates f, d; a sub-window passes to the next stage on T or is rejected on F, until the overall false-positive target F_target is met.)
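The cascade's early-reject behavior can be sketched as follows. The stages here are hypothetical threshold tests on window statistics, standing in for the trained Viola-Jones strong classifiers:

```python
import numpy as np

def cascade_classify(x, stages):
    """Run x through the stages in order; reject at the first failing stage."""
    for stage in stages:
        if not stage(x):
            return 0              # F branch: rejected immediately, no more work
    return 1                      # passed every stage: report a detection

# Hypothetical stand-in stages: a cheap filter first, stricter tests later
stages = [
    lambda x: x.mean() > 0.2,     # quick two-feature-style rejection
    lambda x: x.std() > 0.1,      # reached only by windows surviving stage 1
    lambda x: x.max() > 0.9,
]

window_pass = np.linspace(0.0, 1.0, 24 * 24)   # survives all three tests
window_fail = np.zeros(24 * 24)                # rejected by the first stage
```

Because most sub-windows in an image are easy negatives, nearly all of them exit at the cheap first stage, which is where the cascade's speedup comes from.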

SLIDE 72

Detector Cascade Discussion

☞ The cascaded classifier is almost 10 times faster.

SLIDE 73

Experimental Results (1)

  • Training dataset
    – Face training set: 4916 hand-labeled faces
    – Scaled and aligned to a base resolution of 24×24
  • Structure of the detector cascade
    – 32 layers, 4297 features
    – The first classifier uses 2 features, rejecting about 60% of non-faces while correctly detecting close to 100% of faces; later layers use roughly 20, 50, 100, and 200 features
  • Training time for the entire 32-layer detector
    – On the order of weeks on a single 466 MHz AlphaStation XP900

SLIDE 74

Face Image Databases

  • Databases for face recognition can be best utilized as training sets
    – Each image consists of an individual on a uniform and uncluttered background
  • Test sets for face detection
    – MIT, CMU (frontal, profile), Kodak

SLIDE 75

Training dataset: 4916 images

Experimental Results

  • Test dataset

– MIT+CMU frontal face test set – 130 images with 507 labeled frontal faces

MIT test set: 23 images with 149 faces Sung & poggio: detection rate 79.9% with 5 false positive AdaBoost: detection rate 77.8% with 5 false positives 89.9 90.1

  • 89.2
  • 86.0

83.2 Neural-net 93.7 91.8 91.1 90.8 90.1 89.8 88.8 85.2 78.3 AdaBoost 422 167 110 95 78 65 50 31 10 False detection

  • > not significantly different

accuracy

  • > but the cascade class. almost

10 times faster

SLIDES 77–85

Today’s agenda

  • 1. PCA & Eigenfaces
  • 2. LDA & Fisherfaces
  • 3. AdaBoost
  • 4. Constellation model

SLIDE 86
Parts and Structure Literature

  • Fischler & Elschlager 1973
  • Yuille ‘91
  • Brunelli & Poggio ‘93
  • Lades, v.d. Malsburg et al. ‘93
  • Cootes, Lanitis, Taylor et al. ‘95
  • Amit & Geman ‘95, ‘99
  • Perona et al. ‘95, ‘96, ’98, ’00, ‘03
  • Huttenlocher et al. ’00
  • Agarwal & Roth ’02

etc…

SLIDE 87

Deformations (figure: model parts A, B, C, D)

SLIDE 88

Background clutter

SLIDE 89

Frontal faces

SLIDE 90

Face images

SLIDE 91

3D Object recognition – Multiple mixture components

SLIDE 92

3D Orientation Tuning

(Plot: % correct vs. angle in degrees, from frontal to profile)