Machine Learning for Biometrics
Dong Xu
School of Electrical and Information Engineering, University of Sydney
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
What is Dimensionality Reduction?
Examples: 2D space to 1D space (PCA, LDA)
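To make the 2D-to-1D case concrete, here is a minimal PCA sketch in NumPy; the toy data and all parameter choices are mine, for illustration only:

```python
import numpy as np

# Toy 2D data that varies mostly along one direction
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0]]) + 0.1 * rng.normal(size=(200, 2))

Xc = X - X.mean(axis=0)               # center the data
C = Xc.T @ Xc / len(Xc)               # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
w = eigvecs[:, -1]                    # leading principal direction
y = Xc @ w                            # 1D embedding of the 2D points
```

Almost all of the variance survives the projection because the data is nearly one-dimensional to begin with.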
What is Dimensionality Reduction?
Example: 3D space to 2D space ISOMAP: Geodesic Distance Preserving
- J. Tenenbaum et al., 2000
Why Conduct Dimensionality Reduction?
Uncover intrinsic structure (e.g., pose variation, expression variation)
- LPP, He et al., 2003
Visualization Feature Extraction Computation Efficiency Broad Applications
Face Recognition Human Gait Recognition CBIR
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
What is a Tensor?
Tensors are arrays of numbers which transform in certain ways under coordinate transformations.
Vector: x ∈ R^{m1}
Matrix: X ∈ R^{m1 × m2}
3rd-order Tensor: X ∈ R^{m1 × m2 × m3}
Definition of Mode-k Product

Notation (product of two matrices): the projection Y = XU maps the original matrix to a new matrix, with entries Y_ij = Σ_k X_ik U_kj. For example, a 100 × 100 matrix times a 100 × 10 projection matrix yields a 100 × 10 matrix.

Mode-k product (tensors): Y = X ×_k U applies the projection matrix U to the k-th mode of the original tensor X, yielding the new tensor Y. For example, projecting mode 2 of X ∈ R^{m1(100) × m2(100) × m3(40)} with U ∈ R^{100 × 10} gives Y ∈ R^{m1(100) × m2'(10) × m3(40)}.
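The mode-k product can be sketched in NumPy via the mode-k unfolding; the function name and demo shapes below are my own (scaled down from the 100 × 100 × 40 example):

```python
import numpy as np

def mode_k_product(X, U, k):
    """Project mode k of tensor X with projection matrix U (shape m_k x m_k')."""
    Xk = np.moveaxis(X, k, 0).reshape(X.shape[k], -1)  # mode-k unfolding
    Yk = U.T @ Xk                                      # (m_k', product of other modes)
    new_shape = [U.shape[1]] + [d for i, d in enumerate(X.shape) if i != k]
    return np.moveaxis(Yk.reshape(new_shape), 0, k)    # fold back into a tensor

# Small stand-in for the 100x100x40 -> 100x10x40 example
X = np.random.rand(10, 10, 4)
U2 = np.random.rand(10, 3)
Y = mode_k_product(X, U2, 1)   # mode-2 projected from 10 down to 3
```

The result agrees with a direct contraction over the projected mode.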
Projection: high-dimensional space -> low-dimensional space
Reconstruction: low-dimensional space -> high-dimensional space
Data Representation in Dimensionality Reduction
Representation (high- and low-dimensional) | Examples
Vector | PCA, LDA
Matrix | Rank-1 Decomposition (A. Shashua and A. Levin, 2001); low-rank approximation of a matrix (J. Ye); our work (Xu et al., 2005; Yan et al., 2005)
3rd-order Tensor | Tensorface (M. Vasilescu and D. Terzopoulos, 2002)

Typical data: gray-level image (matrix); filtered image and video sequence (3rd-order tensors)
What are Gabor Features?

Gabor features can improve recognition performance compared to grayscale features (C. Liu and H. Wechsler, T-IP, 2002).
Gabor Wavelet Kernels
Eight Orientations Five Scales
Input: Grayscale Image Output: 40 Gabor-filtered Images
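A rough sketch of how such a bank of 40 kernels (five scales, eight orientations) can be built; the kernel size, wavelength, and sigma choices below are my assumptions, not values from the slides:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a 2D Gabor kernel: Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# 5 scales x 8 orientations -> 40 kernels, matching the slide
kernels = [gabor_kernel(31, wavelength=4 * 2**(s / 2), theta=o * np.pi / 8,
                        sigma=2 * 2**(s / 2))
           for s in range(5) for o in range(8)]
```

Convolving a grayscale image with each kernel yields the 40 filtered images that form the 3rd-order Gabor tensor.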
Why Represent Objects as Tensors instead of Vectors?
- Natural Representation
Gray-level Images (2D structure) Videos (3D structure) Gabor-filtered Images (3D structure)
- Enhance Learnability in Real Application
Curse of Dimensionality (Gabor-filtered image: 100*100*40 -> Vector: 400,000) Small sample size problem
- Reduce Computation Cost
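The savings can be seen by counting projection parameters for the 100 × 100 × 40 Gabor example mapped down to 10 × 10 × 10:

```python
# One flat projection matrix (vectorized input) vs. three mode-wise matrices
# (tensor input), both mapping a 100x100x40 Gabor volume down to 10x10x10.
vector_params = (100 * 100 * 40) * (10 * 10 * 10)  # flat matrix: 400,000 x 1,000
tensor_params = 100 * 10 + 100 * 10 + 40 * 10      # three small matrices U1, U2, U3
```

The tensor representation needs 2,400 parameters instead of 400 million, which is what makes learning feasible from few samples.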
Concurrent Subspace Analysis as an Example (Criterion: Optimal Reconstruction)
Input sample (100 × 100 × 40) -> [dimensionality reduction with projection matrices U_1, U_2, U_3] -> sample in the low-dimensional space (10 × 10 × 10) -> [reconstruction] -> the reconstructed sample (100 × 100 × 40)
- D. Xu, S. Yan, H. Zhang et al., CVPR, 2005
Objective Function:

(U_k |_{k=1}^{3})* = arg min_{U_k |_{k=1}^{3}} Σ_i || X_i ×_1 (U_1 U_1^T) ×_2 (U_2 U_2^T) ×_3 (U_3 U_3^T) - X_i ||^2
Connection to Previous Work –Tensorface
(M. Vasilescu and D. Terzopoulos, 2002)
(Figure: (a) Tensorface arranges the image ensemble along external factors such as person, illumination, pose, and expression; (b) CSA operates on the internal dimensions of each image object.)

From an algorithmic or mathematical view, CSA and Tensorface are both variants of the Rank-(R1, R2, …, Rn) decomposition. They differ as follows:

Motivation: Tensorface characterizes external factors; CSA characterizes internal factors
Input: Tensorface takes gray-level images (vector or matrix); CSA takes Gabor-filtered images or video sequences (3rd-order tensors, which Tensorface does not address)
When equal to PCA: Tensorface, when the number of images per person is one or a prime number; CSA, never
Number of images per person for training: Tensorface needs many images per person; CSA needs only one image per person
Experiments: Database Description
Database | Number of Persons (Images per Person) | Image Size (Pixels)
Simulated video sequence | 60 (1) | 64 × 64 × 13
ORL database | 40 (10) | 56 × 46
CMU PIE-1 sub-database | 60 (10) | 64 × 64
CMU PIE-2 sub-database | 60 (10) | 64 × 64
Experiments: Object Reconstruction (1)
Input: Gabor-filtered images ORL database CMU PIE-1 database Objective Evaluation Criterion:
Root Mean Squared Error (RMSE) and Compression Ratio (CR)
ORL database CMU PIE-1 database
Experiments: Object Reconstruction (2)
Original Images Reconstructed Images from PCA Reconstructed Images from CSA
Input: Simulated video sequence
Experiments: Face Recognition
Input: Gray-level images and Gabor-filtered images ORL database CMU PIE database
Algorithm | CMU PIE-1 | CMU PIE-2 | ORL
PCA (gray-level feature) | 70.1% | 28.3% | 76.9%
PCA (Gabor feature) | 80.1% | 42.0% | 86.6%
CSA (ours) | 90.5% | 59.4% | 94.4%
Summary
- This is the first work to address dimensionality
reduction with a tensor representation of arbitrary order.
- Opens a new research direction.
Bilinear and Tensor Subspace Learning (New Research Direction)
- Concurrent Subspace Analysis (CSA), CVPR 2005 and T-CSVT 2008
- Discriminant Analysis with Tensor Representation (DATER): CVPR 2005
and T-IP 2007
- Rank-one Projections with Adaptive Margins (RPAM): CVPR 2006 and T-
SMC-B 2007
- Enhancing Tensor Subspace Learning by Element Rearrangement: CVPR
2007 and T-PAMI 2009
- Discriminant Locally Linear Embedding with High Order Tensor Data
(DLLE/T): T-SMC-B 2008
- Convergent 2D Subspace Learning with Null Space Analysis (NS2DLDA): T-
CSVT 2008
- Semi-supervised Bilinear Subspace Learning: T-IP 2009
- Applications in Human Gait Recognition
– CSA+DATER: T-CSVT 2006
– Tensor Marginal Fisher Analysis (TMFA): T-IP 2007
Other researchers have also published several papers along this direction.
Human Gait Recognition: Basic Modules
Human Detection and Tracking -> Silhouette Extraction -> Feature Extraction -> Pattern Matching -> Classification, applied to both the gallery videos stored in the database and the probe video. Classification outputs: Yes or No (verification), or the IDs of the top-N candidates (identification).

(a), (d): the extracted silhouettes from one probe and gallery video; (b), (c): the gray-level Gait Energy Images (GEI).
Human Gait Recognition with Matrix Representation
- D. Xu, S. Yan, H. Zhang et al., T-CSVT, 2006
USF HumanID
Experiment (Probe) | # of Probe Sets | Difference between Gallery and Probe Set
A (G, A, L, NB, M/N) | 122 | View
B (G, B, R, NB, M/N) | 54 | Shoe
C (G, B, L, NB, M/N) | 54 | View and Shoe
D (C, A, R, NB, M/N) | 121 | Surface
E (C, B, R, NB, M/N) | 60 | Surface and Shoe
F (C, A, L, NB, M/N) | 121 | Surface and View
G (C, B, L, NB, M/N) | 60 | Surface, Shoe, and View
H (G, A, R, BF, M/N) | 120 | Briefcase
I (G, B, R, BF, M/N) | 60 | Briefcase and Shoe
J (G, A, L, BF, M/N) | 120 | Briefcase and View
K (G, A/B, R, NB, N) | 33 | Time, Shoe, and Clothing
L (C, A/B, R, NB, N) | 33 | Time, Shoe, Clothing, and Surface

Legend: 1. Shoe type: A or B; 2. Carrying: with (BF) or without (NB) a briefcase; 3. Time: May (M) or November (N); 4. Surface: grass (G) or concrete (C); 5. Viewpoint: left (L) or right (R)
Human Gait Recognition: Our Contributions
Top-ranked results on the benchmark USF HumanID dataset
*The DNGR method additionally uses manually annotated silhouettes, which are not publicly available.

Method | Average Rank-1 Result (%)
Our recent work (ours, TIP 2012) | 70.07
DNGR* (Sarkar's group, TPAMI 2006) | 62.81
Image-to-Class distance (ours, TCSVT 2010) | 61.19
GTDA (Maybank's group, TPAMI 2007) | 60.58
Bilinear subspace learning method 2: MMFA (ours, TIP 2007) | 59.90
Bilinear subspace learning method 1: CSA + DATER (ours, TCSVT 2006) | 58.50
PCA+LDA (Bhanu's group, TPAMI 2006) | 57.70
How to Utilize More Correlations?
Pixel Rearrangement
(Figure: sets of highly correlated pixels are rearranged into columns of highly correlated pixels.)
Pixel Rearrangement
Underlying assumption in previous tensor-based subspace learning: intra-tensor correlations, i.e., correlations among the features within certain tensor dimensions, such as rows, columns, and Gabor features.
- D. Xu, S. Yan et al., T-PAMI 2009
Problem Definition

- The task of enhancing correlation/redundancy among 2nd-order tensors is to search for a pixel-rearrangement operator R such that

R* = arg min_R Σ_{i=1}^{N} min_{U,V} || X_i^R - U U^T X_i^R V V^T ||^2

- 1. X_i^R is the rearranged matrix from sample X_i
- 2. The column numbers of U and V are predefined
After pixel rearrangement, we can use the rearranged tensors as input for concurrent subspace analysis
Solution to the Pixel Rearrangement Problem

Initialize U_0, V_0; then repeat for n = 1, 2, …:

1. Compute the reconstructed matrices:
X_i^{Rec} = U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T

2. Optimize the operator R:
R_n = arg min_R Σ_{i=1}^{N} || X_i^R - X_i^{Rec} ||^2

3. Optimize U and V:
(U_n, V_n) = arg min_{U,V} Σ_{i=1}^{N} || X_i^{R_n} - U U^T X_i^{R_n} V V^T ||^2

Note: Σ_{i=1}^{N} || X_i^{R_n} - X_i^{Rec} ||^2 = Σ_{i=1}^{N} || X_i^{R_n} - U_{n-1} U_{n-1}^T X_i^{R_{n-1}} V_{n-1} V_{n-1}^T ||^2, so each step does not increase the overall objective.
How to Optimize R

It is an integer programming problem:

R* = arg min_R Σ_{i=1}^{N} || X_i^R - X_i^{Rec} ||^2

which can be relaxed to a linear program over R:

min_R Σ_{p,q} c_pq R_pq   s.t.  1) 0 ≤ R_pq ≤ 1;  2) Σ_p R_pq = 1;  3) Σ_q R_pq = 1

where c_pq = Σ_{i=1}^{N} | X_i^{Rec}(p) - X_i(q) |^2

- 1. This linear programming problem has an integer solution.
- 2. We constrain the rearrangement within a local neighborhood for speedup.

(Figure: a transportation view in which each pixel q of the original matrix is a sender, each position p of the reconstructed matrix is a receiver, and c_pq is the cost of the move.)
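Because the relaxed linear program attains its optimum at a permutation matrix, the rearrangement can be checked against brute-force search over permutations on a tiny cost matrix (the cost values below are made up for illustration):

```python
import itertools
import numpy as np

def best_rearrangement(cost):
    """Minimize sum_p cost[p, perm(p)] over all permutations; feasible only for tiny n."""
    n = cost.shape[0]
    best_perm, best_val = None, np.inf
    for perm in itertools.permutations(range(n)):
        val = sum(cost[p, q] for p, q in enumerate(perm))
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])
perm, val = best_rearrangement(cost)   # assigns each position p a source pixel perm[p]
```

In practice the LP (or an assignment solver) replaces this enumeration; the brute force merely demonstrates that the optimum is a permutation.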
Convergence Speed
Rearrangement Results
Reconstruction Visualization
Reconstruction Visualization
Classification Accuracy
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
Representative Previous Work
ISOMAP: Geodesic Distance Preserving
- J. Tenenbaum et al., 2000
LLE: Local Neighborhood Relationship Preserving
- S. Roweis & L. Saul, 2000
LE/LPP: Local Similarity Preserving
- M. Belkin, P. Niyogi et al., 2001, 2003
PCA, LDA
Dimensionality Reduction Algorithms
- Any common perspective to understand and explain these
dimensionality reduction algorithms? Or any unified formulation that is shared by them?
- Any general tool to guide developing new algorithms for
dimensionality reduction?
Hundreds of dimensionality reduction algorithms exist, spanning statistics-based methods (PCA/KPCA, LDA/KDA, …) and geometry-based methods (ISOMAP, LLE, LE/LPP, …), in both matrix and tensor forms.
Our Answers

Type | Formulation | Example
Direct graph embedding | min_{y^T B y = 1} y^T L y | original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap
Linearization | y = X^T w | PCA, LDA, LPP
Kernelization | w = Σ_i α_i φ(x_i) | KPCA, KDA
Tensorization | y_i = X_i ×_1 w_1 ×_2 w_2 ⋯ ×_n w_n | CSA, DATER
- S. Yan, D. Xu et al., CVPR 2005, T-PAMI 2007
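The constrained minimum min y^T L y subject to y^T B y = 1 can be computed as a generalized eigenvalue problem. A NumPy sketch, assuming B is positive definite; the function name and toy two-cluster graph are my own:

```python
import numpy as np

def direct_graph_embedding(S, B, dim):
    """Solve min y^T L y s.t. y^T B y = 1 via a B-whitened symmetric eigenproblem."""
    D = np.diag(S.sum(axis=1))
    L = D - S                                       # graph Laplacian
    w, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    ew, EV = np.linalg.eigh(B_inv_sqrt @ L @ B_inv_sqrt)
    return B_inv_sqrt @ EV[:, :dim]                 # smallest eigenvalues first

# Two tight clusters joined by one weak edge
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
S[2, 3] = S[3, 2] = 0.1
np.fill_diagonal(S, 0.0)

Y = direct_graph_embedding(S, np.eye(6), 2)
y = Y[:, 1]   # column 0 is the trivial constant eigenvector when B = I
```

The second eigenvector separates the two clusters by sign, which is exactly the similarity-preserving behavior the criterion asks for.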
Direct Graph Embedding

Data in the high-dimensional space and the low-dimensional space (assumed to be 1D here):
X = [x_1, x_2, …, x_N],  y = [y_1, y_2, …, y_N]^T

Intrinsic graph: G = {X, S};  Penalty graph: G^P = {X, S^P}
S, S^P: similarity matrices (graph edge weights), measuring similarity in the high-dimensional space
L, B: Laplacian matrices from S, S^P:  L = D - S,  D_ii = Σ_{j≠i} S_ij
Direct Graph Embedding -- Continued

Criterion to preserve graph similarity:

y* = arg min_{y^T B y = 1} Σ_{i≠j} || y_i - y_j ||^2 S_ij = arg min_{y^T B y = 1} y^T L y

Special case: B is the identity matrix (scale normalization).

Problem: it cannot handle new test data.
Linearization

Linear mapping function: y = X^T w

Objective function in linearization:

w* = arg min_{w^T X B X^T w = 1} w^T X L X^T w

(L comes from the intrinsic graph, B from the penalty graph)

Problem: a linear mapping function may not be enough to preserve the real nonlinear structure.
Kernelization

Nonlinear mapping φ: x -> F, from the original input space to another, higher-dimensional Hilbert space, with kernel k(x, y) = φ(x) · φ(y)

Kernel matrix: K_ij = k(x_i, x_j).  Constraint: w = Σ_i α_i φ(x_i)

Objective function in kernelization:

α* = arg min_{α^T K B K α = 1} α^T K L K α

(L comes from the intrinsic graph, B from the penalty graph)
Tensorization

The low-dimensional representation is obtained as: y_i = X_i ×_1 w_1 ×_2 w_2 ⋯ ×_n w_n

Objective function in tensorization:

(w_1, …, w_n)* = arg min_{f(w_1,…,w_n) = 1} Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S_ij

where f(w_1, …, w_n) = Σ_i || X_i ×_1 w_1 ⋯ ×_n w_n ||^2 B_ii (intrinsic graph, scale normalization), or
f(w_1, …, w_n) = Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S^P_ij (penalty graph)
Common Formulation

L, B: Laplacian matrices from S, S^P;  S, S^P: similarity matrices (intrinsic graph and penalty graph)

Direct graph embedding:  y* = arg min_{y^T B y = 1} y^T L y
Linearization:  w* = arg min_{w^T X B X^T w = 1} w^T X L X^T w
Kernelization:  α* = arg min_{α^T K B K α = 1} α^T K L K α
Tensorization:  (w_1, …, w_n)* = arg min_{f(w_1,…,w_n) = 1} Σ_{i≠j} || X_i ×_1 w_1 ⋯ ×_n w_n - X_j ×_1 w_1 ⋯ ×_n w_n ||^2 S_ij
A General Framework for Dimensionality Reduction

Algorithm | S & B Definition | Embedding Type
PCA/KPCA/CSA | S_ij = 1/N for i ≠ j; B = I | L/K/T
LDA/KDA/DATER | S_ij = δ(l_i, l_j) / n_{l_i}; B = I - (1/N) e e^T | L/K/T
ISOMAP | S_ij = τ(D_G)_ij for i ≠ j; B = I | D
LLE | S = M + M^T - M^T M; B = I | D
LE/LPP | S_ij = exp{-||x_i - x_j||^2 / t} if ||x_i - x_j|| < ε, 0 otherwise; B = D | D/L

D: Direct graph embedding; L: Linearization; K: Kernelization; T: Tensorization
General Framework: PCA and LDA

Principal Component Analysis:
W_ij = 1/N for i ≠ j;  B = I

w* = arg max_{w^T w = 1} w^T C w,
where C = (1/N) Σ_i (x_i - x̄)(x_i - x̄)^T = (1/N) X (I - (1/N) e e^T) X^T,  e = [1, 1, …, 1]^T

Linear Discriminant Analysis:
W_ij = δ(l_i, l_j) / n_{l_i};  B = I - (1/N) e e^T

w* = arg min_w (w^T S_W w) / (w^T S_B w),
where S_W = Σ_c Σ_{i: l_i = c} (x_i - x̄^c)(x_i - x̄^c)^T,  S_B = Σ_c n_c (x̄^c - x̄)(x̄^c - x̄)^T,  and N·C = S_W + S_B
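The centering identity C = (1/N) X (I - ee^T/N) X^T used above can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 20))           # 5 features, N = 20 samples as columns
N = X.shape[1]
e = np.ones((N, 1))

mean = X.mean(axis=1, keepdims=True)
C_direct = (X - mean) @ (X - mean).T / N                  # explicit centering
C_centered = X @ (np.eye(N) - e @ e.T / N) @ X.T / N      # centering-matrix form
```

Both forms produce the same covariance matrix, since I - ee^T/N is the idempotent centering projector.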
General Framework: ISOMAP

W_ij = τ(D_G)_ij for i ≠ j;  B = I

Geodesic distance preserving: D_G is the geodesic distance matrix, and the τ operator converts distances into an inner-product matrix:

τ(D_G) = -(H S H) / 2,  where S_ij = (D_G)_ij^2 and H = I - (1/N) e e^T

Multidimensional Scaling: y* = arg max_y y^T τ(D_G) y
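The identity τ(D) = -HSH/2 recovers the inner products of the centered points, which a quick numerical check confirms:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(10, 3))                      # 10 points in R^3
Pc = P - P.mean(axis=0)                           # centered points
S = ((P[:, None] - P[None, :]) ** 2).sum(-1)      # squared pairwise distances
H = np.eye(10) - np.ones((10, 10)) / 10           # centering matrix
G = -H @ S @ H / 2                                # tau operator: double centering
```

G equals the Gram matrix of the centered points, which is why MDS on τ(D_G) recovers geodesic-distance-preserving coordinates.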
General Framework: LLE

W = M + M^T - M^T M;  B = I

Local neighborhood relationship preserving: the reconstruction weights M satisfy X_i ≈ Σ_j M_ij X_j, and

y* = arg min_y Σ_i | y_i - Σ_j M_ij y_j |^2 = arg min_y y^T (I - M)^T (I - M) y,

with entries [(I - M)^T (I - M)]_ij = δ_ij - M_ij - M_ji + Σ_k M_ki M_kj
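The entrywise expansion of (I - M)^T (I - M) can likewise be checked numerically (the random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((6, 6))
np.fill_diagonal(M, 0.0)
I = np.eye(6)

M_matrix = (I - M).T @ (I - M)
# entrywise: delta_ij - M_ij - M_ji + sum_k M_ki M_kj
M_entry = I - M - M.T + M.T @ M
```

Both expressions agree, confirming the quadratic form used in the LLE row of the framework table.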
General Framework: Laplacian Eigenmap / LPP

Local similarity preserving:
W_ij = exp{-||x_i - x_j||^2 / t} if ||x_i - x_j|| < ε; 0 otherwise.  B = D.

Laplacian Eigenmap:
min_{y^T D y = 1} Σ_{ij} (y_i - y_j)^2 W_ij = min_{y^T D y = 1} y^T L y,  where L = D - W and D_ii = Σ_j W_ij

LPP (linearization, y = X^T w): solve X L X^T w = λ X D X^T w
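A compact NumPy sketch of LPP as described above, using a dense heat-kernel affinity; the regularizer eps and the demo sizes are my assumptions:

```python
import numpy as np

def lpp(X, t=1.0, dim=1, eps=1e-6):
    """X: (features, N). Returns the projection matrix w of shape (features, dim)."""
    N = X.shape[1]
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared dists
    W = np.exp(-sq / t)                                      # heat-kernel affinity
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                                # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + eps * np.eye(X.shape[0])               # ridge keeps B invertible
    bw, bv = np.linalg.eigh(B)
    B_inv_sqrt = bv @ np.diag(1.0 / np.sqrt(bw)) @ bv.T
    ew, EV = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
    return B_inv_sqrt @ EV[:, :dim]                          # smallest gen. eigvecs

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 12))
w = lpp(X, t=2.0, dim=2)
y = X.T @ w        # 12 samples embedded in 2D
```

Because the embedding is given by a linear map w, new test samples can be projected directly, which is exactly what the direct graph embedding could not do.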
New Dimensionality Reduction Algorithm: Marginal Fisher Analysis

Important information for face recognition:
1) Label information; 2) Local manifold structure (neighborhood or margin)

Intrinsic graph: S_ij = 1 if x_i is among the k1-nearest neighbors of x_j in the same class; 0 otherwise
Penalty graph: S^P_ij = 1 if the pair (i, j) is among the k2 shortest between-class pairs in the data set; 0 otherwise
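The two graph definitions can be sketched directly; the helper name and toy data below are mine:

```python
import numpy as np

def mfa_graphs(X, labels, k1=2, k2=3):
    """Intrinsic graph S: k1-NN within class. Penalty graph S_P: k2 shortest between-class pairs."""
    N = len(X)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((N, N))
    S_P = np.zeros((N, N))
    for i in range(N):
        same = [j for j in range(N) if j != i and labels[j] == labels[i]]
        for j in sorted(same, key=lambda j: dist[i, j])[:k1]:
            S[i, j] = S[j, i] = 1.0                  # same-class nearest neighbors
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N) if labels[i] != labels[j]]
    for i, j in sorted(pairs, key=lambda p: dist[p])[:k2]:
        S_P[i, j] = S_P[j, i] = 1.0                  # shortest between-class pairs
    return S, S_P

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [1.0, 1.0], [1.1, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
S, S_P = mfa_graphs(X, labels)
```

Plugging these S and S^P into the common formulation yields the linear, kernel, and tensor variants of MFA.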
Marginal Fisher Analysis: Advantage
No Gaussian distribution assumption
Experiments: Face Recognition
PIE-1 | G3/P7 | G4/P6
PCA+LDA (linearization) | 65.8% | 80.2%
PCA+MFA (ours) | 71.0% | 84.9%
KDA (kernelization) | 70.0% | 81.0%
KMFA (ours) | 72.3% | 85.2%
DATER-2 (tensorization) | 80.0% | 82.3%
TMFA-2 (ours) | 82.1% | 85.2%

ORL | G3/P7 | G4/P6
PCA+LDA (linearization) | 87.9% | 88.3%
PCA+MFA (ours) | 89.3% | 91.3%
KDA (kernelization) | 87.5% | 91.7%
KMFA (ours) | 88.6% | 93.8%
DATER-2 (tensorization) | 89.3% | 92.0%
TMFA-2 (ours) | 95.0% | 96.3%
Summary
- An optimization framework that unifies previous
dimensionality reduction algorithms as special cases.
- A new dimensionality reduction algorithm:
Marginal Fisher Analysis.
Outline
Dimensionality Reduction for Tensor-based Objects Graph Embedding: A General Framework for Dimensionality Reduction Learning using Privileged Information for Face Verification and Person Re-identification
- V. Vapnik and A. Vashist, A new learning paradigm: Learning using privileged information.
Neural Networks, 544–557, 2009.
Privileged information: Information available only in the training process (not available in the testing process). Training: attending classes in the classroom Testing: taking an exam Privileged Information: teacher's instruction
Learning using Privileged Information
- Oracle function
SVM+ (primal form): extends the primal form of SVM, where the main feature defines the decision function and the privileged information is used to model the slack variables (the correcting function).
Applications in Image and Video Recognition
Testing Data
Azimut 95 Luxury Yacht at the Miami International Boat Show 2012 Azimut- Benetti Yachts sees 20 per cent gain in new luxury yacht sales
caption, tags, keywords, .... Training Data
1) Web images/videos are usually associated with additional surrounding textual descriptions (tags, captions, etc.) 2) RGB-D images from Kinect cameras contain additional depth images
Distance Metric Learning using Privileged Information
- Distance Metric Learning for Face Verification
– Mahalanobis distance: d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j), with M a symmetric positive semidefinite matrix
- Distance Metric Learning using Privileged Information
– Additional depth information is used as privileged information
(Examples: similar and dissimilar training pairs; at test time, each pair is judged similar or dissimilar.)
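The Mahalanobis distance at the core of these methods can be sketched in a few lines; the metric M below is made up for illustration, and M = I reduces the distance to the plain squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (x_i - x_j)^T M (x_i - x_j) under a symmetric PSD M."""
    d = xi - xj
    return float(d @ M @ d)

A = np.array([[1.0, 0.5], [0.0, 1.0]])
M = A.T @ A                                    # PSD by construction (illustrative metric)
xi, xj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d_learned = mahalanobis_sq(xi, xj, M)          # 1.25
d_euclid = mahalanobis_sq(xi, xj, np.eye(2))   # 2.0
```

Metric learning methods such as ITML search for an M under which similar pairs are close and dissimilar pairs are far.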
Information Theoretic Distance Metric Learning (ITML)
- Distance Metric Learning
– Given a training dataset of sample pairs, where x_i denotes the feature vector, with pair labels: similar (the two samples belong to the same subject) or dissimilar (the two samples are from two different subjects)
- Information Theoretic Distance Metric Learning: ITML learns the metric by minimizing a LogDet divergence to a prior metric, subject to the pairwise similarity/dissimilarity constraints.
Information Theoretic Distance Metric Learning using Privileged Information (ITML+)
- Objective Function
– We learn the distance metric on the main features and the distance metric on the privileged information simultaneously.
– Discussions:
- For similar pairs, the learned distance should be smaller (more similar)
- For dissimilar pairs, the learned distance should be larger (more dissimilar)
- X. Xu, W. Li and D. Xu, T-NNLS 2015
Experiments: Face Verification
- Setting:
– Main features (i.e. gradient-LBP features) from RGB images – Privileged information (i.e. gradient-LBP features) from depth images
Results (AP% and AUC%) on the EUROCOM face dataset
Experiments: Person Re-identification
- Setting:
– Main features (i.e. kernel descriptor features) from RGB images – Privileged information (i.e. kernel descriptor features) from depth images
Results (mean Rank-1 recognition rates%) on the BIWI RGBD-ID dataset
Summary
- A new distance metric learning method (ITML+) that exploits privileged information, applied to face verification and person re-identification.