Deep Learning for Face Analysis
Chen-Change LOY MMLAB The Chinese University of Hong Kong
Homepage: http://personal.ie.cuhk.edu.hk/~ccloy/
https://www.youtube.com/watch?v=k3T2WbRkgvg&index=4&list=PLkNuzPSJx0mO0_mLUjDQFXFgngTV7QwHZ
Vivo X20 Face Wake: unlock your mobile phone in 0.1 seconds
DeepID3 99.55% DeepID2 99.15% GaussianFace 98.52%
GaussianFace", Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), January
Human accuracy 97.45%
Papers
Training set
DeepID2: 200K images
Now: 2 billion images in total, 200M individuals' faces
1:1 result
DeepID2 (2014): 99.5% accuracy @ 0.5% FAR
6 digit password (2015): >90% accuracy @ 10^-6 FAR
8 digit password (2017): >97% accuracy @ 10^-8 FAR
1:N result
DeepID2: top 30 < 40% for N = 100M
Now: top 30 > 90% for N = 100M
Industry Breakthrough
2015
Yang et al., From Facial Part Responses to Face Detection: A Deep Learning Approach, ICCV 2015
Zhang et al., S3FD: Single Shot Scale-invariant Face Detector, ICCV 2017
2017
Pose-Robust Face Recognition via Deep Residual Equivariant Mapping
A submission to CVPR 2018
in face recognition
Profile faces of different persons are easily mismatched (false positives), and profile and frontal faces of the same identity may fail to match (false negatives)
data size
learned features tend to be biased toward distinguishing frontal faces rather than profile faces.
Zhu et al. High-Fidelity Pose and Expression Normalization for Face Recognition in the Wild, CVPR 2015
Model Input Generated Real
We can map a profile face feature to the frontal space through a mapping function that adds a residual.
input image
an input image to achieve the desired transformation
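maps an image $x \in \mathcal{X}$ to a vector $\phi(x) \in \mathbb{R}^d$
image if the transformation can be transferred to the representation output: $\forall x \in \mathcal{X}: \phi(gx) \approx M_g \phi(x)$
image $x_p$
a mapping function $M_g$, so that $M_g \phi(x_p) \approx \phi(x_f)$:
$M_g \phi(x_p) = \phi(x_p) + \mathcal{Y}(x_p)\,\mathcal{R}(\phi(x_p)) \approx \phi(x_f)$
where $\mathcal{R}$ is the residual function and $\mathcal{Y}(x_p) \in [0, 1]$ is the yaw coefficient, a soft gate of the residuals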
from the frontal pose
from frontal to a complete profile
information (the yaw in our case) to influence the feed-forward process
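The yaw-gated residual mapping described on this slide can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear yaw-to-gate function `yaw_coefficient`, the two-layer `residual_branch`, and the random weights `w1` and `w2` are all assumptions made for illustration.

```python
import numpy as np

def yaw_coefficient(yaw_degrees):
    """Soft gate in [0, 1]: 0 for a frontal face, 1 for a complete profile.
    The linear form in |yaw| is an assumption for this sketch."""
    return float(np.clip(abs(yaw_degrees) / 90.0, 0.0, 1.0))

def residual_branch(feature, w1, w2):
    """Hypothetical residual function: linear -> ReLU -> linear."""
    hidden = np.maximum(feature @ w1, 0.0)
    return hidden @ w2

def dream_map(feature, yaw_degrees, w1, w2):
    """Mapped feature = feature + yaw_coefficient * residual(feature)."""
    gate = yaw_coefficient(yaw_degrees)
    return feature + gate * residual_branch(feature, w1, w2)

rng = np.random.default_rng(0)
dim = 8
w1 = rng.normal(scale=0.1, size=(dim, dim))  # stand-in weights, not trained
w2 = rng.normal(scale=0.1, size=(dim, dim))
profile_feature = rng.normal(size=dim)

# A frontal face (yaw 0) passes through unchanged; a full profile gets the whole residual.
frontal_like = dream_map(profile_feature, yaw_degrees=0.0, w1=w1, w2=w2)
mapped = dream_map(profile_feature, yaw_degrees=90.0, w1=w1, w2=w2)
```

Because the gate multiplies only the residual, a frontal input keeps its original feature, which is the behavior the slide attributes to the soft gate.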
The Deep Residual EquivAriant Mapping (DREAM) block
stem CNN
respectively to the common feature space.
network
7.26 7.82
space
computational overhead.
Number of labeled faces
AFW: 468
MIT+CMU: 507
PASCAL FACE: 1335
FDDB: 5171
MALF: 11931
IJB-A: 49759
WIDER FACE: 393703

Number of annotations
MIT+CMU: 507
PASCAL FACE: 1335
AFW: 2808
FDDB: 5171
IJB-A: 49759
MALF: 95448
WIDER FACE: 393703 × 6 = 2362218
[Figure: detection rate (0 to 1) for rich events; example event classes: Traffic, Students Schoolkids, Handshaking]
Occlusion Illumination Expression Pose Blur Normal Extreme Intermediate
[Figure: detection rate (0.1 to 1) versus proposals per image (2000 to 10000) on AFW, PASCAL FACE, FDDB, IJB-A, and WIDER FACE (Easy, Medium, Hard)]
Webpage: http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
Average precision: FAN 0.946, Face R-FCN 0.943, SFD 0.935, ..., 2015 method 0.711
Average precision: FAN 0.936, Face R-FCN 0.931, SFD 0.921, ..., 2015 method 0.636
Average precision: FAN 0.885, Face R-FCN 0.876, SFD 0.858, ..., 2015 method 0.400
Face Detection through Scale-Friendly Deep Convolutional Networks
https://arxiv.org/pdf/1706.02863.pdf, 2017
different than those for recognizing a 10-pixels tall face
that can distinguish faces with large appearance variations
convolution operations
inherent visual cues and thus lead to disparate detection difficulties
[Figure: two-stage pipeline from input image to final results]
Stage 1, multiscale proposal networks: Proposal networks 1 to 4 cover faces of 10-30, 30-120, 120-240, and 240-480 pixels and produce response maps and proposals
Stage 2, multiscale detection networks: Detection networks 1 to 4 operate at 30×30, 120×120, 240×240, and 480×480 pixels and produce detection results
depth and spatial pooling to optimize the receptive field for the particular range
Ren et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015
Fixed size feature maps for each ROI
Contains three scale-variant detectors with different spatial pooling strides and depths
Scale-variant detectors are integrated into a single backbone network (ResNet-50) by sharing representation
Single-scale inference: uses a single input image without an image pyramid
Given a test image, a forward pass is performed and each scale-variant face detector will generate detection windows independently
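Since each scale-variant detector emits its windows independently, the outputs have to be combined somehow. The slide does not say how; the sketch below assumes plain greedy non-maximum suppression over the pooled windows. The function names and the IoU threshold of 0.3 are hypothetical.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def merge_detections(per_detector_boxes, per_detector_scores, iou_threshold=0.3):
    """Pool windows from all detectors, then apply greedy NMS (an assumed merge rule)."""
    boxes = np.concatenate(per_detector_boxes)
    scores = np.concatenate(per_detector_scores)
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Drop every remaining window that overlaps the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return boxes[keep], scores[keep]

# Three hypothetical detectors: small-, medium-, and large-face.
small_boxes, small_scores = np.array([[10.0, 10.0, 30.0, 30.0]]), np.array([0.9])
medium_boxes = np.array([[12.0, 12.0, 32.0, 32.0], [100.0, 100.0, 200.0, 200.0]])
medium_scores = np.array([0.6, 0.8])
large_boxes, large_scores = np.zeros((0, 4)), np.zeros(0)  # no detections

merged_boxes, merged_scores = merge_detections(
    [small_boxes, medium_boxes, large_boxes],
    [small_scores, medium_scores, large_scores],
)
```

Here the medium detector's duplicate of the small face is suppressed, while its separate detection of the larger face is kept.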
spatial pooling structure Experiment
structure.
feature map is close to the ROI template
tend to have smaller projected ROI size
scale consistently decreases when the ROI on the target layer is smaller than ROI pooling size
network which will generally improve the discriminative power of the feature representation, the detection performance still drops
The green box represents the ROI template
detection performance.
than the ROI template, discriminative information is lost during the pooling procedure.
region is much smaller than the ROI template, the insufficient information and overlapping between features will cause a performance drop.
poses looking at different directions.
small number of hard examples.
and efficient
distribution that depends on the current loss of each example under consideration
Shrivastava et al., Training Region-based Object Detectors with Online Hard Example Mining, CVPR 2016
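The core selection step of online hard example mining can be sketched in a few lines: compute the current loss of every candidate, then keep only the highest-loss ones for the backward pass. This is a simplified illustration, not the cited paper's code; `select_hard_examples` and the sample losses are hypothetical.

```python
import numpy as np

def select_hard_examples(losses, batch_size):
    """Return indices of the batch_size examples with the highest current loss."""
    losses = np.asarray(losses, dtype=float)
    batch_size = min(batch_size, losses.size)
    # Stable sort on negated losses: highest loss first, ties keep input order.
    return np.argsort(-losses, kind="stable")[:batch_size]

# Hypothetical per-ROI losses from one forward pass.
roi_losses = np.array([0.05, 2.30, 0.10, 1.70, 0.02, 0.90])
hard = select_hard_examples(roi_losses, batch_size=3)
```

Only the selected indices would contribute gradients, so easy examples with near-zero loss do not dominate the update.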
Tested using an NVIDIA Titan X GPU by averaging the runtime over 1,000 images randomly sampled from the WIDER FACE dataset
Evaluation of different range partitioning schemes across three difficulty settings of WIDER FACE (Easy, Medium, Hard)
Learning Deep Representation for Imbalanced Classification
in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016 Code available: http://mmlab.ie.cuhk.edu.hk/projects/LMLE.html
200K celebrity images, each with 40 attributes
Liu et al., "Deep Learning Face Attributes in the Wild", ICCV 2015
http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
[Figure: attribute prediction pipeline: (a) LNeto, (b) LNets, (c) ANet, (d) extracting features to predict attributes with linear SVMs (e.g. Smiling, Wavy Hair, No Beard, High Cheekbones)]
Liu et al., "Deep Learning Face Attributes in the Wild", ICCV 2015
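Classification accuracy: $\frac{t_p + t_n}{N_p + N_n}$
Balanced accuracy: $\frac{1}{2}\left(\frac{t_p}{N_p} + \frac{t_n}{N_n}\right)$
$N_p$ and $N_n$ are the numbers of positive and negative samples, while $t_p$ and $t_n$ are the numbers of true positives and true negatives.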
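A small worked example shows why the two measures diverge under class imbalance. The sketch assumes binary labels coded as 1 (positive) and 0 (negative), with both classes present; the function name and the toy data are illustrative only.

```python
def accuracy_metrics(labels, predictions):
    """Return (classification accuracy, balanced accuracy) for binary 0/1 labels.
    Assumes at least one positive and one negative sample."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    accuracy = (tp + tn) / (n_pos + n_neg)
    balanced = 0.5 * (tp / n_pos + tn / n_neg)
    return accuracy, balanced

# 5 positives, 95 negatives; a classifier that always predicts the majority class.
labels = [1] * 5 + [0] * 95
predictions = [0] * 100
acc, bal = accuracy_metrics(labels, predictions)
```

Always predicting the majority class gives 95% classification accuracy here but only 50% balanced accuracy, since every positive sample is missed.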
imbalanced class issue
toward the majority class
the minority class
CelebA positive/negative distribution
Remove valuable information
Introduce artificial noise
How to design costs?
Minority class: very few instances with a high degree of visual variability
The genuine neighborhood can be invaded by other imposter nearest neighbors
Can we introduce tighter constraints to ameliorate such invasion?
$x_i$: the anchor
$x_i^{p}$: a positive instance (of the same class)
$x_i^{n}$: a negative instance (of a different class)
[Figure: a triplet $(x_i, x_i^{p}, x_i^{n})$ with Class 1 as the minority and Class 2 as the majority]
[Figure: 2D feature embeddings for "Wearing hat" versus "Not wearing hat". Cluster labels: PC1, PC2, NC1 to NC5. Legend: Class 1: clusters 1 and 2; Class 2: clusters 1 to 5]
Features extracted from DeepID2 model Triplet embedding
2D feature embedding of one imbalanced binary face attribute
between classes
Our solution
Learn a Euclidean embedding $f(x)$ from an image $x$ into a feature space $\mathbb{R}^d$, such that the embedded features are discriminative with minimal possible local class imbalance.
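[Figure: training pipeline. Training samples are grouped into mini-batches of quintuplets $(x_i, x_i^{p+}, x_i^{p-}, x_i^{p--}, x_i^{n})$; five CNNs with shared parameters produce the embeddings $f(x_i), f(x_i^{p+}), f(x_i^{p-}), f(x_i^{p--}), f(x_i^{n})$, which feed a triple-header hinge loss]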
Quintuplet members:
$x_i$: the anchor
$x_i^{p+}$: the anchor's most distant within-cluster neighbor
$x_i^{p-}$: the nearest within-class neighbor of the anchor, but from a different cluster
$x_i^{p--}$: the most distant within-class neighbor of the anchor
$x_i^{n}$: the nearest between-class neighbor
[Figure: a quintuplet illustrated on Class 1 (minority) and Class 2 (majority), each split into clusters]
Desired ordering of distances:
$D(f(x_i), f(x_i^{p+})) < D(f(x_i), f(x_i^{p-})) < D(f(x_i), f(x_i^{p--})) < D(f(x_i), f(x_i^{n}))$
where $D(f(x_i), f(x_j)) = \|f(x_i) - f(x_j)\|_2^2$ is the (squared) Euclidean distance
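The quintuplet definition can be made concrete with a short mining sketch. It assumes precomputed features, class labels, and per-class cluster labels, and that every required neighbor exists for the chosen anchor; `sample_quintuplet` and the toy data are hypothetical, not the authors' code.

```python
import numpy as np

def sample_quintuplet(features, classes, clusters, anchor):
    """Return indices (anchor, p+, p-, p--, n) for one anchor.
    clusters holds cluster ids that are local to each class."""
    d = np.sum((features - features[anchor]) ** 2, axis=1)  # squared Euclidean
    idx = np.arange(len(features))
    same_class = (classes == classes[anchor]) & (idx != anchor)
    same_cluster = same_class & (clusters == clusters[anchor])
    other_cluster = same_class & (clusters != clusters[anchor])
    other_class = classes != classes[anchor]

    def pick(mask, chooser):
        candidates = idx[mask]
        return int(candidates[chooser(d[candidates])])

    p_plus = pick(same_cluster, np.argmax)    # most distant within-cluster neighbor
    p_minus = pick(other_cluster, np.argmin)  # nearest within-class, different cluster
    p_minus2 = pick(same_class, np.argmax)    # most distant within-class neighbor
    negative = pick(other_class, np.argmin)   # nearest between-class neighbor
    return anchor, p_plus, p_minus, p_minus2, negative

# Toy 2D features laid out along the x-axis.
features = np.array([[0, 0], [1, 0], [2, 0], [5, 0], [8, 0], [10, 0], [12, 0]], dtype=float)
classes = np.array([0, 0, 0, 0, 0, 1, 1])
clusters = np.array([0, 0, 0, 1, 1, 0, 0])
quintuplet = sample_quintuplet(features, classes, clusters, anchor=0)
```

In this toy layout the selected samples already satisfy the distance ordering; during training the loss penalizes quintuplets that do not.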
image similarity
features
iterations
$\min \sum_i (\varepsilon_i + \tau_i + \sigma_i) + \lambda \|W\|_2^2$
s.t.:
$\max(0,\ g_1 + D(f(x_i), f(x_i^{p+})) - D(f(x_i), f(x_i^{p-}))) \le \varepsilon_i$
$\max(0,\ g_2 + D(f(x_i), f(x_i^{p-})) - D(f(x_i), f(x_i^{p--}))) \le \tau_i$
$\max(0,\ g_3 + D(f(x_i), f(x_i^{p--})) - D(f(x_i), f(x_i^{n}))) \le \sigma_i$
$\forall i,\ \varepsilon_i \ge 0,\ \tau_i \ge 0,\ \sigma_i \ge 0$
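The triple-header hinge loss for a single quintuplet can be sketched as the sum of three hinge terms, one per adjacent pair in the distance ordering. The sketch omits the weight-decay term, and the margins `g1`, `g2`, `g3` are set to arbitrary illustrative values.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def triple_header_hinge(f_anchor, f_p_plus, f_p_minus, f_p_minus2, f_neg, g1, g2, g3):
    """Sum of the three slack terms for one quintuplet (no regularizer)."""
    d_pp = sq_dist(f_anchor, f_p_plus)
    d_pm = sq_dist(f_anchor, f_p_minus)
    d_pmm = sq_dist(f_anchor, f_p_minus2)
    d_n = sq_dist(f_anchor, f_neg)
    eps = max(0.0, g1 + d_pp - d_pm)     # within-cluster vs. other cluster
    tau = max(0.0, g2 + d_pm - d_pmm)    # other cluster vs. farthest same class
    sigma = max(0.0, g3 + d_pmm - d_n)   # farthest same class vs. nearest other class
    return eps + tau + sigma

# 1D toy features: squared distances from the anchor are 1, 4, 9, 16.
well_ordered = triple_header_hinge([0.0], [1.0], [2.0], [3.0], [4.0], g1=1.0, g2=1.0, g3=1.0)
# Move the negative closer (squared distance 6.25) so the last margin is violated.
violated = triple_header_hinge([0.0], [1.0], [2.0], [3.0], [2.5], g1=1.0, g2=1.0, g3=1.0)
```

The loss is zero when all three margins hold and grows with the size of each violation, so only badly ordered quintuplets drive the update.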
[Figure: margins $g_1$, $g_2$, $g_3$ illustrated in $\mathbb{R}^2$ space for Class 1, Class 2, ..., Class c]
clusters
cluster & class membership
from each class
CNN to compute loss
Feature-based clustering Feature learning/updating
Every 5000 iterations
exemplar, and perform a fast cluster-wise kNN search.
Let $\phi(q)$ be query $q$'s local neighborhood defined by its kNN cluster centroids $\{m_i\}_{i=1}^{k}$
$y_q = \arg\max_{c=1,\dots,C} \left( \min_{m_j \in \phi(q),\, y_j \ne c} D(f(q), f(m_j)) - \max_{m_i \in \phi(q),\, y_i = c} D(f(q), f(m_i)) \right)$
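The decision rule can be sketched as follows. The query is compared against its k nearest cluster centroids, and each class is scored by the gap between its farthest centroid and the nearest centroid of any other class. As an assumption of this sketch, only classes with at least one centroid in the neighborhood are scored; the function name and toy centroids are hypothetical.

```python
import numpy as np

def classify_query(query_feature, centroids, centroid_classes, k):
    """Cluster-wise kNN rule: pick the class with the largest local margin."""
    d = np.sum((centroids - query_feature) ** 2, axis=1)  # squared Euclidean
    nearest = np.argsort(d)[:k]
    d_k, y_k = d[nearest], centroid_classes[nearest]
    best_class, best_margin = None, -np.inf
    for c in np.unique(y_k):
        same, other = d_k[y_k == c], d_k[y_k != c]
        if other.size == 0:
            return int(c)  # every neighboring centroid belongs to class c
        margin = other.min() - same.max()
        if margin > best_margin:
            best_class, best_margin = c, margin
    return int(best_class)

# Toy 1D centroids: class 0 near the origin, class 1 further away.
centroids = np.array([[0.0], [1.0], [5.0], [6.0]])
centroid_classes = np.array([0, 0, 1, 1])
label_a = classify_query(np.array([0.4]), centroids, centroid_classes, k=3)
label_b = classify_query(np.array([5.2]), centroids, centroid_classes, k=3)
```

Searching over cluster centroids rather than individual samples keeps the neighborhood small, which is what makes the cluster-wise search fast.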
Class imbalance level (= |positive class rate - 50|%)
Anet: classification accuracy = 87.24%, balanced accuracy = 80.02%
Ours: classification accuracy = 90.35%, balanced accuracy = 84.25%
[Figure: relative accuracy gain (%) versus class imbalance level (%) for each face attribute, with an axis annotation pointing toward more imbalanced attributes. First panel: gains over PANDA [32] and Triplet-kNN [22]. Second panel: gains over Anet [28], PANDA [46], and Triplet-kNN [33]]