Machine Learning for Biometrics
Dong Xu
School of Electrical and Information Engineering, University of Sydney
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
What is Dimensionality Reduction?
Examples: 2D space to 1D space (PCA, LDA)
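To make the 2D-to-1D case concrete, here is a minimal PCA sketch in NumPy; the toy data and all parameter choices are mine, for illustration only:

```python
import numpy as np

# Toy 2D data that varies mostly along one direction
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0]]) + 0.1 * rng.normal(size=(200, 2))

Xc = X - X.mean(axis=0)               # center the data
C = Xc.T @ Xc / len(Xc)               # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
w = eigvecs[:, -1]                    # leading principal direction
y = Xc @ w                            # 1D embedding of the 2D points
```

Almost all of the variance survives the projection because the data is nearly one-dimensional to begin with.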
What is Dimensionality Reduction?
Example: 3D space to 2D space ISOMAP: Geodesic Distance Preserving
- J. Tenenbaum et al., 2000
Why Conduct Dimensionality Reduction?
Uncover intrinsic structure (e.g., pose variation, expression variation)
- LPP, He et al., 2003
Visualization Feature Extraction Computation Efficiency Broad Applications
Face Recognition Human Gait Recognition CBIR
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
What is a Tensor?
Tensors are arrays of numbers which transform in certain ways under coordinate transformations.
Vector: x ∈ R^{m1}
Matrix: X ∈ R^{m1 × m2}
3rd-order Tensor: X ∈ R^{m1 × m2 × m3}
Definition of Mode-k Product

Notation (product of two matrices): the projection Y = XU maps the original matrix to a new matrix, with entries Y_ij = Σ_k X_ik U_kj. For example, a 100 × 100 matrix times a 100 × 10 projection matrix yields a 100 × 10 matrix.

Mode-k product (tensors): Y = X ×_k U applies the projection matrix U to the k-th mode of the original tensor X, yielding the new tensor Y. For example, projecting mode 2 of X ∈ R^{m1(100) × m2(100) × m3(40)} with U ∈ R^{100 × 10} gives Y ∈ R^{m1(100) × m2'(10) × m3(40)}.
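The mode-k product can be sketched in NumPy via the mode-k unfolding; the function name and demo shapes below are my own (scaled down from the 100 × 100 × 40 example):

```python
import numpy as np

def mode_k_product(X, U, k):
    """Project mode k of tensor X with projection matrix U (shape m_k x m_k')."""
    Xk = np.moveaxis(X, k, 0).reshape(X.shape[k], -1)  # mode-k unfolding
    Yk = U.T @ Xk                                      # (m_k', product of other modes)
    new_shape = [U.shape[1]] + [d for i, d in enumerate(X.shape) if i != k]
    return np.moveaxis(Yk.reshape(new_shape), 0, k)    # fold back into a tensor

# Small stand-in for the 100x100x40 -> 100x10x40 example
X = np.random.rand(10, 10, 4)
U2 = np.random.rand(10, 3)
Y = mode_k_product(X, U2, 1)   # mode-2 projected from 10 down to 3
```

The result agrees with a direct contraction over the projected mode.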
Projection: high-dimensional space -> low-dimensional space
Reconstruction: low-dimensional space -> high-dimensional space
Data Representation in Dimensionality Reduction
Representation (high- and low-dimensional) | Examples
Vector | PCA, LDA
Matrix | Rank-1 Decomposition (A. Shashua and A. Levin, 2001); low-rank approximation of a matrix (J. Ye); our work (Xu et al., 2005; Yan et al., 2005)
3rd-order Tensor | Tensorface (M. Vasilescu and D. Terzopoulos, 2002)

Typical data: gray-level image (matrix); filtered image and video sequence (3rd-order tensors)
What are Gabor Features?

Gabor features can improve recognition performance compared to grayscale features (C. Liu and H. Wechsler, T-IP, 2002).
Gabor Wavelet Kernels
Eight Orientations Five Scales
Input: Grayscale Image Output: 40 Gabor-filtered Images
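A rough sketch of how such a bank of 40 kernels (five scales, eight orientations) can be built; the kernel size, wavelength, and sigma choices below are my assumptions, not values from the slides:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a 2D Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# 5 scales x 8 orientations -> 40 kernels, matching the slide
kernels = [gabor_kernel(31, wavelength=4 * 2**(s / 2), theta=o * np.pi / 8,
                        sigma=2 * 2**(s / 2))
           for s in range(5) for o in range(8)]
```

Convolving a grayscale image with each kernel yields the 40 filtered images that form the 3rd-order Gabor tensor.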
Why Represent Objects as Tensors instead of Vectors?
- Natural Representation
Gray-level Images (2D structure) Videos (3D structure) Gabor-filtered Images (3D structure)
- Enhance Learnability in Real Application
Curse of Dimensionality (Gabor-filtered image: 100*100*40 -> Vector: 400,000) Small sample size problem
- Reduce Computation Cost
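The savings can be seen by counting projection parameters for the 100 × 100 × 40 Gabor example mapped down to 10 × 10 × 10:

```python
# One flat projection matrix (vectorized input) vs. three mode-wise matrices
# (tensor input), both mapping a 100x100x40 Gabor volume down to 10x10x10.
vector_params = (100 * 100 * 40) * (10 * 10 * 10)  # flat matrix: 400,000 x 1,000
tensor_params = 100 * 10 + 100 * 10 + 40 * 10      # three small matrices U1, U2, U3
```

The tensor representation needs 2,400 parameters instead of 400 million, which is what makes learning feasible from few samples.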
Concurrent Subspace Analysis as an Example (Criterion: Optimal Reconstruction)
Input sample (100 × 100 × 40) -> [dimensionality reduction with projection matrices U_1, U_2, U_3] -> sample in the low-dimensional space (10 × 10 × 10) -> [reconstruction] -> the reconstructed sample (100 × 100 × 40)
- D. Xu, S. Yan, H. Zhang et al., CVPR, 2005
Objective Function:

(U_k |_{k=1}^{3})* = arg min_{U_k |_{k=1}^{3}} Σ_i || X_i ×_1 (U_1 U_1^T) ×_2 (U_2 U_2^T) ×_3 (U_3 U_3^T) - X_i ||^2
Connection to Previous Work –Tensorface
(M. Vasilescu and D. Terzopoulos, 2002)
(Figure: (a) Tensorface arranges the image ensemble along external factors such as person, illumination, pose, and expression; (b) CSA operates on the internal dimensions of each image object.)

From an algorithmic or mathematical view, CSA and Tensorface are both variants of the Rank-(R1, R2, …, Rn) decomposition. They differ as follows:

Motivation: Tensorface characterizes external factors; CSA characterizes internal factors
Input: Tensorface takes gray-level images (vector or matrix); CSA takes Gabor-filtered images or video sequences (3rd-order tensors, which Tensorface does not address)
When equal to PCA: Tensorface, when the number of images per person is one or a prime number; CSA, never
Number of images per person for training: Tensorface needs many images per person; CSA needs only one image per person
Experiments: Database Description
Database | Number of Persons (Images per Person) | Image Size (Pixels)
Simulated video sequence | 60 (1) | 64 × 64 × 13
ORL database | 40 (10) | 56 × 46
CMU PIE-1 sub-database | 60 (10) | 64 × 64
CMU PIE-2 sub-database | 60 (10) | 64 × 64
Experiments: Object Reconstruction (1)
Input: Gabor-filtered images ORL database CMU PIE-1 database Objective Evaluation Criterion:
Root Mean Squared Error (RMSE) and Compression Ratio (CR)
ORL database CMU PIE-1 database
Experiments: Object Reconstruction (2)
Original Images Reconstructed Images from PCA Reconstructed Images from CSA
Input: Simulated video sequence
Experiments: Face Recognition
Input: Gray-level images and Gabor-filtered images ORL database CMU PIE database
Algorithm | CMU PIE-1 | CMU PIE-2 | ORL
PCA (gray-level feature) | 70.1% | 28.3% | 76.9%
PCA (Gabor feature) | 80.1% | 42.0% | 86.6%
CSA (ours) | 90.5% | 59.4% | 94.4%
Summary
- This is the first work to address dimensionality
reduction with a tensor representation of arbitrary order.
- Opens a new research direction.
Bilinear and Tensor Subspace Learning (New Research Direction)
- Concurrent Subspace Analysis (CSA), CVPR 2005 and T-CSVT 2008
- Discriminant Analysis with Tensor Representation (DATER): CVPR 2005
and T-IP 2007
- Rank-one Projections with Adaptive Margins (RPAM): CVPR 2006 and T-
SMC-B 2007
- Enhancing Tensor Subspace Learning by Element Rearrangement: CVPR
2007 and T-PAMI 2009
- Discriminant Locally Linear Embedding with High Order Tensor Data
(DLLE/T): T-SMC-B 2008
- Convergent 2D Subspace Learning with Null Space Analysis (NS2DLDA): T-
CSVT 2008
- Semi-supervised Bilinear Subspace Learning: T-IP 2009
- Applications in Human Gait Recognition
– CSA+DATER: T-CSVT 2006
– Tensor Marginal Fisher Analysis (TMFA): T-IP 2007
Other researchers have also published several papers along this direction.
Human Gait Recognition: Basic Modules
Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction -> Pattern Matching -> Classification, applied to both the gallery videos stored in the database and the probe video. Classification outputs: Yes or No (verification), or the IDs of the top-N candidates (identification).

(a), (d): the extracted silhouettes from one probe and gallery video; (b), (c): the gray-level Gait Energy Images (GEI).
Human Gait Recognition with Matrix Representation
- D. Xu, S. Yan, H. Zhang et al., T-CSVT, 2006
USF HumanID
Experiment (Probe) | # of Probe Sets | Difference between Gallery and Probe Set
A (G, A, L, NB, M/N) | 122 | View
B (G, B, R, NB, M/N) | 54 | Shoe
C (G, B, L, NB, M/N) | 54 | View and Shoe
D (C, A, R, NB, M/N) | 121 | Surface
E (C, B, R, NB, M/N) | 60 | Surface and Shoe
F (C, A, L, NB, M/N) | 121 | Surface and View
G (C, B, L, NB, M/N) | 60 | Surface, Shoe, and View
H (G, A, R, BF, M/N) | 120 | Briefcase
I (G, B, R, BF, M/N) | 60 | Briefcase and Shoe
J (G, A, L, BF, M/N) | 120 | Briefcase and View
K (G, A/B, R, NB, N) | 33 | Time, Shoe, and Clothing
L (C, A/B, R, NB, N) | 33 | Time, Shoe, Clothing, and Surface

Legend: 1. Shoe type: A or B; 2. Carrying: with (BF) or without (NB) a briefcase; 3. Time: May (M) or November (N); 4. Surface: grass (G) or concrete (C); 5. Viewpoint: left (L) or right (R)
Human Gait Recognition: Our Contributions
Top-ranked results on the benchmark USF HumanID dataset
*The DNGR method additionally uses manually annotated silhouettes, which are not publicly available.

Method | Average Rank-1 Result (%)
Our recent work (ours, TIP 2012) | 70.07
DNGR* (Sarkar's group, TPAMI 2006) | 62.81
Image-to-Class distance (ours, TCSVT 2010) | 61.19
GTDA (Maybank's group, TPAMI 2007) | 60.58
Bilinear subspace learning method 2: MMFA (ours, TIP 2007) | 59.90
Bilinear subspace learning method 1: CSA + DATER (ours, TCSVT 2006) | 58.50
PCA+LDA (Bhanu's group, TPAMI 2006) | 57.70
How to Utilize More Correlations?
Pixel Rearrangement
(Figure: sets of highly correlated pixels are rearranged into columns of highly correlated pixels.)
Pixel Rearrangement
Underlying assumption in previous tensor-based subspace learning: intra-tensor correlations, i.e., correlations among the features within certain tensor dimensions, such as rows, columns, and Gabor features.
- D. Xu, S. Yan et al., T-PAMI 2009
Problem Definition

- The task of enhancing correlation/redundancy among 2nd-order tensors is to search for a pixel-rearrangement operator R such that

R* = arg min_R Σ_{i=1}^{N} min_{U,V} || X_i^R - U U^T X_i^R V V^T ||^2

- 1. X_i^R is the rearranged matrix from sample X_i
- 2. The column numbers of U and V are predefined
After pixel rearrangement, we can use the rearranged tensors as input for concurrent subspace analysis
Solution to the Pixel Rearrangement Problem

Initialize U_0, V_0; then repeat for n = 1, 2, …:

1. Compute the reconstructed matrices:
X_i^{Rec} = U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T

2. Optimize the operator R:
R_n = arg min_R Σ_{i=1}^{N} || X_i^R - X_i^{Rec} ||^2

3. Optimize U and V:
(U_n, V_n) = arg min_{U,V} Σ_{i=1}^{N} || X_i^{R_n} - U U^T X_i^{R_n} V V^T ||^2

Note: Σ_{i=1}^{N} || X_i^{R_n} - X_i^{Rec} ||^2 = Σ_{i=1}^{N} || X_i^{R_n} - U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T ||^2, so each step does not increase the overall objective.
How to Optimize R

It is an integer programming problem:

R* = arg min_R Σ_{i=1}^{N} || X_i^R - X_i^{Rec} ||^2

which can be relaxed to a linear program over R:

min_R Σ_{p,q} c_pq R_pq   s.t.  1) 0 ≤ R_pq ≤ 1;  2) Σ_p R_pq = 1;  3) Σ_q R_pq = 1

where c_pq = Σ_{i=1}^{N} | X_i^{Rec}(p) - X_i(q) |^2

- 1. This linear programming problem has an integer solution.
- 2. We constrain the rearrangement within a local neighborhood for speedup.

(Figure: a transportation view in which each pixel q of the original matrix is a sender, each position p of the reconstructed matrix is a receiver, and c_pq is the cost of the move.)
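Because the relaxed linear program attains its optimum at a permutation matrix, the rearrangement can be checked against brute-force search over permutations on a tiny cost matrix (the cost values below are made up for illustration):

```python
import itertools
import numpy as np

def best_rearrangement(cost):
    """Minimize sum_p cost[p, perm(p)] over all permutations; feasible only for tiny n."""
    n = cost.shape[0]
    best_perm, best_val = None, np.inf
    for perm in itertools.permutations(range(n)):
        val = sum(cost[p, q] for p, q in enumerate(perm))
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])
perm, val = best_rearrangement(cost)   # assigns each position p a source pixel perm[p]
```

In practice the LP (or an assignment solver) replaces this enumeration; the brute force merely demonstrates that the optimum is a permutation.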
Convergence Speed
Rearrangement Results
Reconstruction Visualization
Reconstruction Visualization
Classification Accuracy
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
Representative Previous Work
ISOMAP: Geodesic Distance Preserving
- J. Tenenbaum et al., 2000
LLE: Local Neighborhood Relationship Preserving
- S. Roweis & L. Saul, 2000
LE/LPP: Local Similarity Preserving
- M. Belkin, P. Niyogi et al., 2001, 2003
PCA, LDA
Dimensionality Reduction Algorithms
- Any common perspective to understand and explain these
dimensionality reduction algorithms? Or any unified formulation that is shared by them?
- Any general tool to guide developing new algorithms for
dimensionality reduction?
Hundreds of dimensionality reduction algorithms exist, spanning statistics-based methods (PCA/KPCA, LDA/KDA, …) and geometry-based methods (ISOMAP, LLE, LE/LPP, …), in both matrix and tensor forms.
Our Answers

Type | Formulation | Example
Direct graph embedding | min_{y^T B y = 1} y^T L y | original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap
Linearization | y = X^T w | PCA, LDA, LPP
Kernelization | w = Σ_i α_i φ(x_i) | KPCA, KDA
Tensorization | y_i = X_i ×_1 w_1 ×_2 w_2 ⋯ ×_n w_n | CSA, DATER
- S. Yan, D. Xu et al., CVPR 2005, T-PAMI 2007
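The constrained minimum min y^T L y subject to y^T B y = 1 can be computed as a generalized eigenvalue problem. A NumPy sketch, assuming B is positive definite; the function name and toy two-cluster graph are my own:

```python
import numpy as np

def direct_graph_embedding(S, B, dim):
    """Solve min y^T L y s.t. y^T B y = 1 via a B-whitened symmetric eigenproblem."""
    D = np.diag(S.sum(axis=1))
    L = D - S                                       # graph Laplacian
    w, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    ew, EV = np.linalg.eigh(B_inv_sqrt @ L @ B_inv_sqrt)
    return B_inv_sqrt @ EV[:, :dim]                 # smallest eigenvalues first

# Two tight clusters joined by one weak edge
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
S[2, 3] = S[3, 2] = 0.1
np.fill_diagonal(S, 0.0)

Y = direct_graph_embedding(S, np.eye(6), 2)
y = Y[:, 1]   # column 0 is the trivial constant eigenvector when B = I
```

The second eigenvector separates the two clusters by sign, which is exactly the similarity-preserving behavior the criterion asks for.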
Direct Graph Embedding

Data in the high-dimensional space and the low-dimensional space (assumed to be 1D here):
X = [x_1, x_2, …, x_N],  y = [y_1, y_2, …, y_N]^T

Intrinsic graph: G = {X, S};  Penalty graph: G^P = {X, S^P}
S, S^P: similarity matrices (graph edge weights), measuring similarity in the high-dimensional space
L, B: Laplacian matrices from S, S^P:  L = D - S,  D_ii = Σ_{j≠i} S_ij
Direct Graph Embedding -- Continued

Criterion to preserve graph similarity:

y* = arg min_{y^T B y = 1} Σ_{i≠j} || y_i - y_j ||^2 S_ij = arg min_{y^T B y = 1} y^T L y

Special case: B is the identity matrix (scale normalization).

Problem: it cannot handle new test data.
Linearization

Linear mapping function: y = X^T w

Objective function in linearization:

w* = arg min_{w^T X B X^T w = 1} w^T X L X^T w

(L comes from the intrinsic graph, B from the penalty graph)

Problem: a linear mapping function may not be enough to preserve the real nonlinear structure.
Kernelization

Nonlinear mapping φ: x -> F, from the original input space to another, higher-dimensional Hilbert space, with kernel k(x, y) = φ(x) · φ(y)

Kernel matrix: K_ij = k(x_i, x_j).  Constraint: w = Σ_i α_i φ(x_i)

Objective function in kernelization:

α* = arg min_{α^T K B K α = 1} α^T K L K α

(L comes from the intrinsic graph, B from the penalty graph)
Tensorization

The low-dimensional representation is obtained as: y_i = X_i ×_1 w_1 ×_2 w_2 ⋯ ×_n w_n

Objective function in tensorization:

(w_1, …, w_n)* = arg min_{f(w_1,…,w_n) = 1} Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S_ij

where f(w_1, …, w_n) = Σ_i || X_i ×_1 w_1 ⋯ ×_n w_n ||^2 B_ii (intrinsic graph, scale normalization), or
f(w_1, …, w_n) = Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S^P_ij (penalty graph)
Common Formulation

L, B: Laplacian matrices from S, S^P;  S, S^P: similarity matrices (intrinsic graph and penalty graph)

Direct graph embedding:  y* = arg min_{y^T B y = 1} y^T L y
Linearization:  w* = arg min_{w^T X B X^T w = 1} w^T X L X^T w
Kernelization:  α* = arg min_{α^T K B K α = 1} α^T K L K α
Tensorization:  (w_1, …, w_n)* = arg min_{f(w_1,…,w_n) = 1} Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S_ij
A General Framework for Dimensionality Reduction

Algorithm | S & B Definition | Embedding Type
PCA/KPCA/CSA | S_ij = 1/N for i ≠ j; B = I | L/K/T
LDA/KDA/DATER | S_ij = δ(l_i, l_j) / n_{l_i}; B = I - (1/N) e e^T | L/K/T
ISOMAP | S_ij = τ(D_G)_ij for i ≠ j; B = I | D
LLE | S = M + M^T - M^T M; B = I | D
LE/LPP | S_ij = exp{-||x_i - x_j||^2 / t} if ||x_i - x_j|| < ε, 0 otherwise; B = D | D/L

D: Direct graph embedding; L: Linearization; K: Kernelization; T: Tensorization
General Framework: PCA and LDA

Principal Component Analysis:
W_ij = 1/N for i ≠ j;  B = I

w* = arg max_{w^T w = 1} w^T C w,
where C = (1/N) Σ_i (x_i - x̄)(x_i - x̄)^T = (1/N) X (I - (1/N) e e^T) X^T,  e = [1, 1, …, 1]^T

Linear Discriminant Analysis:
W_ij = δ(l_i, l_j) / n_{l_i};  B = I - (1/N) e e^T

w* = arg min_w (w^T S_W w) / (w^T S_B w),
where S_W = Σ_c Σ_{i: l_i = c} (x_i - x̄^c)(x_i - x̄^c)^T,  S_B = Σ_c n_c (x̄^c - x̄)(x̄^c - x̄)^T,  and N·C = S_W + S_B
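The centering identity C = (1/N) X (I - ee^T/N) X^T used above can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 20))           # 5 features, N = 20 samples as columns
N = X.shape[1]
e = np.ones((N, 1))

mean = X.mean(axis=1, keepdims=True)
C_direct = (X - mean) @ (X - mean).T / N                  # explicit centering
C_centered = X @ (np.eye(N) - e @ e.T / N) @ X.T / N      # centering-matrix form
```

Both forms produce the same covariance matrix, since I - ee^T/N is the idempotent centering projector.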
General Framework: ISOMAP

W_ij = τ(D_G)_ij for i ≠ j;  B = I

Geodesic distance preserving: D_G is the geodesic distance matrix, and the τ operator converts distances into an inner-product matrix:

τ(D_G) = -(H S H) / 2,  where S_ij = (D_G)_ij^2 and H = I - (1/N) e e^T

Multidimensional Scaling: y* = arg max_y y^T τ(D_G) y
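The identity τ(D) = -HSH/2 recovers the inner products of the centered points, which a quick numerical check confirms:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(10, 3))                      # 10 points in R^3
Pc = P - P.mean(axis=0)                           # centered points
S = ((P[:, None] - P[None, :]) ** 2).sum(-1)      # squared pairwise distances
H = np.eye(10) - np.ones((10, 10)) / 10           # centering matrix
G = -H @ S @ H / 2                                # tau operator: double centering
```

G equals the Gram matrix of the centered points, which is why MDS on τ(D_G) recovers geodesic-distance-preserving coordinates.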
General Framework: LLE

W = M + M^T - M^T M;  B = I

Local neighborhood relationship preserving: the reconstruction weights M satisfy X_i ≈ Σ_j M_ij X_j, and

y* = arg min_y Σ_i | y_i - Σ_j M_ij y_j |^2 = arg min_y y^T (I - M)^T (I - M) y,

with entries [(I - M)^T (I - M)]_ij = δ_ij - M_ij - M_ji + Σ_k M_ki M_kj
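The entrywise expansion of (I - M)^T (I - M) can likewise be checked numerically (the random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((6, 6))
np.fill_diagonal(M, 0.0)
I = np.eye(6)

M_matrix = (I - M).T @ (I - M)
# entrywise: delta_ij - M_ij - M_ji + sum_k M_ki M_kj
M_entry = I - M - M.T + M.T @ M
```

Both expressions agree, confirming the quadratic form used in the LLE row of the framework table.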
General Framework: Laplacian Eigenmap / LPP

Local similarity preserving:
W_ij = exp{-||x_i - x_j||^2 / t} if ||x_i - x_j|| < ε; 0 otherwise.  B = D.

Laplacian Eigenmap:
min_{y^T D y = 1} Σ_{ij} (y_i - y_j)^2 W_ij = min_{y^T D y = 1} y^T L y,  where L = D - W and D_ii = Σ_j W_ij

LPP (linearization, y = X^T w): solve X L X^T w = λ X D X^T w
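A compact NumPy sketch of LPP as described above, using a dense heat-kernel affinity; the regularizer eps and the demo sizes are my assumptions:

```python
import numpy as np

def lpp(X, t=1.0, dim=1, eps=1e-6):
    """X: (features, N). Returns the projection matrix w of shape (features, dim)."""
    N = X.shape[1]
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared dists
    W = np.exp(-sq / t)                                      # heat-kernel affinity
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                                # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + eps * np.eye(X.shape[0])               # ridge keeps B invertible
    bw, bv = np.linalg.eigh(B)
    B_inv_sqrt = bv @ np.diag(1.0 / np.sqrt(bw)) @ bv.T
    ew, EV = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
    return B_inv_sqrt @ EV[:, :dim]                          # smallest gen. eigvecs

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 12))
w = lpp(X, t=2.0, dim=2)
y = X.T @ w        # 12 samples embedded in 2D
```

Because the embedding is given by a linear map w, new test samples can be projected directly, which is exactly what the direct graph embedding could not do.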
New Dimensionality Reduction Algorithm: Marginal Fisher Analysis

Important information for face recognition:
1) Label information; 2) Local manifold structure (neighborhood or margin)

Intrinsic graph: S_ij = 1 if x_i is among the k1-nearest neighbors of x_j in the same class; 0 otherwise
Penalty graph: S^P_ij = 1 if the pair (i, j) is among the k2 shortest between-class pairs in the data set; 0 otherwise
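The two graph definitions can be sketched directly; the helper name and toy data below are mine:

```python
import numpy as np

def mfa_graphs(X, labels, k1=2, k2=3):
    """Intrinsic graph S: k1-NN within class. Penalty graph S_P: k2 shortest between-class pairs."""
    N = len(X)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((N, N))
    S_P = np.zeros((N, N))
    for i in range(N):
        same = [j for j in range(N) if j != i and labels[j] == labels[i]]
        for j in sorted(same, key=lambda j: dist[i, j])[:k1]:
            S[i, j] = S[j, i] = 1.0                  # same-class nearest neighbors
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N) if labels[i] != labels[j]]
    for i, j in sorted(pairs, key=lambda p: dist[p])[:k2]:
        S_P[i, j] = S_P[j, i] = 1.0                  # shortest between-class pairs
    return S, S_P

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [1.0, 1.0], [1.1, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
S, S_P = mfa_graphs(X, labels)
```

Plugging these S and S^P into the common formulation yields the linear, kernel, and tensor variants of MFA.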
Marginal Fisher Analysis: Advantage
No Gaussian distribution assumption
Experiments: Face Recognition
PIE-1 | G3/P7 | G4/P6
PCA+LDA (linearization) | 65.8% | 80.2%
PCA+MFA (ours) | 71.0% | 84.9%
KDA (kernelization) | 70.0% | 81.0%
KMFA (ours) | 72.3% | 85.2%
DATER-2 (tensorization) | 80.0% | 82.3%
TMFA-2 (ours) | 82.1% | 85.2%

ORL | G3/P7 | G4/P6
PCA+LDA (linearization) | 87.9% | 88.3%
PCA+MFA (ours) | 89.3% | 91.3%
KDA (kernelization) | 87.5% | 91.7%
KMFA (ours) | 88.6% | 93.8%
DATER-2 (tensorization) | 89.3% | 92.0%
TMFA-2 (ours) | 95.0% | 96.3%
Summary
- An optimization framework that unifies previous
dimensionality reduction algorithms as special cases.
- A new dimensionality reduction algorithm:
Marginal Fisher Analysis.
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
- V. Vapnik and A. Vashist, A new learning paradigm: Learning using privileged information.
Neural Networks, 544–557, 2009.
Privileged information: Information available only in the training process (not available in the testing process). Training: attending classes in the classroom Testing: taking an exam Privileged Information: teacher's instruction
Learning using Privileged Information
- Oracle function
SVM+ (primal form): extends the primal form of SVM, where the main feature defines the decision function and the privileged information is used to model the slack variables (the correcting function).
Applications in Image and Video Recognition
Testing Data
Azimut 95 Luxury Yacht at the Miami International Boat Show 2012 Azimut- Benetti Yachts sees 20 per cent gain in new luxury yacht sales
caption, tags, keywords, .... Training Data
1) Web images/videos are usually associated with additional surrounding textual descriptions (tags, captions, etc.) 2) RGB-D images from Kinect cameras contain additional depth images
Distance Metric Learning using Privileged Information
- Distance Metric Learning for Face Verification
– Mahalanobis distance: d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j), with M a symmetric positive semidefinite matrix
- Distance Metric Learning using Privileged Information
– Additional depth information is used as privileged information
(Examples: similar and dissimilar training pairs; at test time, each pair is judged similar or dissimilar.)
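The Mahalanobis distance at the core of these methods can be sketched in a few lines; the metric M below is made up for illustration, and M = I reduces the distance to the plain squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (x_i - x_j)^T M (x_i - x_j) under a symmetric PSD M."""
    d = xi - xj
    return float(d @ M @ d)

A = np.array([[1.0, 0.5], [0.0, 1.0]])
M = A.T @ A                                    # PSD by construction (illustrative metric)
xi, xj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d_learned = mahalanobis_sq(xi, xj, M)          # 1.25
d_euclid = mahalanobis_sq(xi, xj, np.eye(2))   # 2.0
```

Metric learning methods such as ITML search for an M under which similar pairs are close and dissimilar pairs are far.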
Information Theoretic Distance Metric Learning (ITML)
- Distance Metric Learning
– Given a training dataset of sample pairs, where x_i denotes the feature vector, with pair labels: similar (the two samples belong to the same subject) or dissimilar (the two samples are from two different subjects)
- Information Theoretic Distance Metric Learning: ITML learns the metric by minimizing a LogDet divergence to a prior metric, subject to the pairwise similarity/dissimilarity constraints.
Information Theoretic Distance Metric Learning using Privileged Information (ITML+)
- Objective Function
– We learn the distance metric on the main features and the distance metric on the privileged information simultaneously.
– Discussions:
- For similar pairs, the learned distance should be smaller (more similar)
- For dissimilar pairs, the learned distance should be larger (more dissimilar)
- X. Xu, W. Li and D. Xu, T-NNLS 2015
Experiments: Face Verification
- Setting:
– Main features (i.e. gradient-LBP features) from RGB images – Privileged information (i.e. gradient-LBP features) from depth images
Results (AP% and AUC%) on the EUROCOM face dataset
Experiments: Person Re-identification
- Setting:
– Main features (i.e. kernel descriptor features) from RGB images – Privileged information (i.e. kernel descriptor features) from depth images
Results (mean Rank-1 recognition rates%) on the BIWI RGBD-ID dataset
Summary
- A new distance metric learning method (ITML+) that exploits privileged information, applied to face verification and person re-identification.