DeepFace for Unconstrained Face Recognition
Yaniv Taigman (1), Ming Yang (1), Marc’Aurelio Ranzato (1), Lior Wolf (2)
(1) Facebook AI Research, (2) Tel Aviv University
11/26/2014
Era of big visual data
– 1.6M daily uploads, 6B photos (12/2013)
– 60M daily uploads, 20B photos (3/2014)
– 400M daily uploads, 350B photos (3/2014)
– 350M daily uploads, 0B photos (11/2013)
– 215M daily uploads, ?B photos (11/2013)
– 100 hours video per min (4/2014)
Sources: www.expandedramblings.com, www.emarketer.com
No automatic face recognition service in EU countries
1964 Bledsoe Face Recognition
1973 Kanade’s Thesis
1991 Turk & Pentland Eigenfaces
1997 Belhumeur Fisherfaces
1999 Blanz & Vetter Morphable faces
1999 Wiskott EBGM
2001 Viola & Jones Boosting
2006 Ahonen LBP
Slightly modified version of Anil Jain’s timeline
Unconstrained face recognition: Identifying a person of interest from a media collection, Best-Rowden, Han, Otto, Klare, Jain, IEEE Trans. Information Forensics and Security, 2014.
property        constrained        unconstrained
resolution      about 2000x2000    50x50
viewpoint       fully frontal      rotated, loose
illumination    controlled         arbitrary
…               disallowed         allowed
FRVT
CONSTRAINED
UNCONSTRAINED
Labeled Faces in the Wild
Probes for example Gallery
– w/o or with demographic filtering
A case study of automated face recognition: the Boston Marathon bombing suspects, J.
Probe faces:
Labeled faces in the wild: A database for studying face recognition in unconstrained environments, Huang, Jain, Learned-Miller, ECCVW, 2008
– 10 different workers per face pair
– Average human performance
– Original images, tight crops, inverse crops
Attribute and simile classifiers for face verification, Kumar, et al., ICCV 2009
“These results suggest that automatic face verification algorithms should not use regions outside of the face, as they could artificially boost accuracy in a manner not applicable on real data.”
Human accuracy: 99.20% (original images), 97.53% (tight crops), 94.27% (inverse crops)
environments, ECCVW, 2008.
verification and retrieval, NEC Labs TR, 2012. Learning hierarchical representations for face verification with convolutional deep belief networks, CVPR, 2012.
verification, CVPR 2013.
wild, CVPR 2013.
Hybrid deep learning for computing face similarities, ICCV 2013. Employed deep learning models for face verification on LFW. Please check http://vis-www.cs.umass.edu/lfw/ for the latest updates.
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments (results page), Gary B. Huang, Manu Ramesh, Tamara Berg and Erik Learned-Miller.
Accuracy / year: 60.02%, 73.93%, 78.47%, 85.54%, 88.00%, 92.58%, 95.17%, 96.33%; human: 97.53%
Reduction of error wrt human / year: 37.08%, 19.24%, 37.09%, 20.52%, 48.06%, 52.32%, 49.15%
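The second series in the chart above is consistent with a simple calculation: take the gap between each accuracy and the human reference, then report how much of that gap each new result closes relative to the previous one. The sketch below reproduces that arithmetic. Treating 97.53% as the human reference is an interpretation of the slide, and the function name `error_reductions` is only illustrative.

```python
# Accuracy values (%) in the order they appear on the slide.
ACCURACIES = [60.02, 73.93, 78.47, 85.54, 88.00, 92.58, 95.17, 96.33]
# Assumption: the last value of the series, 97.53, is the human reference.
HUMAN = 97.53


def error_reductions(accuracies, human):
    """Relative shrinkage (%) of the gap to human accuracy between consecutive results."""
    gaps = [human - acc for acc in accuracies]
    return [round(100.0 * (prev - cur) / prev, 2) for prev, cur in zip(gaps, gaps[1:])]


reductions = error_reductions(ACCURACIES, HUMAN)
print(reductions)
```

Under that assumption, the computed values agree with the seven percentages on the slide to two decimals.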
– ~100K-dim LBP, SIFT, Gabor, etc.
– 99,773 images of 2,995 individuals
– 95.17% => 96.33% on LFW (unrestricted protocol)
Face alignment by explicit shape regression, Cao, et al., CVPR 2012
Bayesian face revisited: A joint formulation, Chen, et al., ECCV 2012
Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, Chen, et al., CVPR 2013
A practical transfer learning algorithm for face verification, Cao, et al., ICCV 2013
Likelihood ratio test: EM update of the between/within class covariance
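As a rough illustration of the likelihood ratio test, the sketch below scores a pair of scalar features under a joint Gaussian model in which each feature is an identity part plus a within-identity part. This follows my reading of the joint formulation cited above. The scalar setting, the fixed variances `s_mu` (between-class) and `s_eps` (within-class), and the function names are simplifying assumptions, and the EM estimation of the covariances is not shown.

```python
import math


def log_gauss2(x1, x2, a, b):
    """Log density of a zero-mean 2-D Gaussian with covariance [[a, b], [b, a]]."""
    det = a * a - b * b
    quad = (a * x1 * x1 - 2.0 * b * x1 * x2 + a * x2 * x2) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def log_likelihood_ratio(x1, x2, s_mu, s_eps):
    """log P(x1, x2 | same identity) - log P(x1, x2 | different identities)."""
    a = s_mu + s_eps                      # variance of each feature on its own
    same = log_gauss2(x1, x2, a, s_mu)    # shared identity part couples the pair
    diff = log_gauss2(x1, x2, a, 0.0)     # independent identities: no coupling
    return same - diff


# Hypothetical variances: identities differ a lot, one identity varies little.
similar = log_likelihood_ratio(1.0, 1.0, s_mu=1.0, s_eps=0.1)
dissimilar = log_likelihood_ratio(1.0, -1.0, s_mu=1.0, s_eps=0.1)
print(similar, dissimilar)
```

A positive score favors "same identity" and a negative score favors "different identities"; in practice the decision threshold would be tuned on held-out pairs.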
Hybrid deep learning for computing face similarities, Sun, Wang, Tang, ICCV 2013.
– 87,628 images of 5,436 individuals
– 12 face regions
– 8 pairs of inputs
Detect Align Represent Classify
Yaniv Lubomir Marc’Aurelio
Bornstein et al. 2007
Localize
2D Align
+67 x2d Pnts
Piece-wise affine
Calista_Flockhart_0002.jpg
Detection & Localization, Frontalization
Localization
Front-End ConvNet
– C1: 32 filters 11x11
– M2: 3x3
– C3: 16 filters 9x9
Local (Untied) Convolutions
– L4: 16 x 9 x 9 x 16
– L5: 16 x 7 x 7 x 16
– L6: 16 x 5 x 5 x 16
Globally Connected
– F7: 4096d (REPRESENTATION)
– F8: 4030d (SFC labels)
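To make the cost of untied (locally connected) filters concrete, the sketch below walks the layer list above and tracks the spatial size of each feature map, then compares the weight count of L4 with and without weight sharing. The slide does not give the input size, strides, or padding: the 152x152 input, stride 2 for M2 and L5, and padding 1 for M2 are assumptions chosen to match my reading of the published architecture figure, and all function names are illustrative.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a square convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


INPUT_SIZE = 152  # assumption: side of the aligned crop fed to C1
# (name, kernel, stride, pad); kernels are from the slide, the rest is assumed.
LAYERS = [("C1", 11, 1, 0), ("M2", 3, 2, 1), ("C3", 9, 1, 0),
          ("L4", 9, 1, 0), ("L5", 7, 2, 0), ("L6", 5, 1, 0)]


def feature_map_sizes(input_size=INPUT_SIZE, layers=LAYERS):
    """Map each layer name to the side length of its output feature map."""
    sizes, size = {}, input_size
    for name, kernel, stride, pad in layers:
        size = out_size(size, kernel, stride, pad)
        sizes[name] = size
    return sizes


def weight_count(filters, kernel, in_channels, map_size=None):
    """Weights of a shared-filter layer, or of an untied layer if map_size is given."""
    per_location = filters * kernel * kernel * in_channels
    return per_location if map_size is None else per_location * map_size ** 2


sizes = feature_map_sizes()
tied_l4 = weight_count(16, 9, 16)                  # if L4 shared its filters
untied_l4 = weight_count(16, 9, 16, sizes["L4"])   # one filter bank per location
print(sizes)
print(tied_l4, untied_l4)
```

Under these assumptions the untied L4 alone holds several thousand times more weights than a shared-filter layer of the same shape, which is the trade-off the "Local (Untied) Convolutions" stage accepts in exchange for position-specific filters on aligned faces.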
DeepFace Replica DeepFace Replica
Face recognition in unconstrained videos with matched background similarity, Wolf, Hassner, Maoz, ICCV 2011
– 3,425 YouTube videos of 1,595 celebrities (a subset of LFW subjects)
– 5,000 video pairs in 10 splits
– Detected and roughly aligned face frames available
– Restricted protocol: only same/not-same labels
– Unrestricted protocol: face identities, additional training pairs
not “astonishing”
LFW Acc. (%)
– No Alignment: 87.9
– 3D Perturbation: 93.7
– 2D Alignment: 94.3
– 3D Alignment: 97.35
– 3D Alignment + LBP: 91.3
– 4096: 97 (float), 96.07 (4096 bits)
– 1024: 96.72 (float), 95.53 (1024 bits)
– 256: 97.17 (float), 95.87 (256 bits)
DB Size / DNN Test Error (%)
– 100% of the data: 8.74
– 50% of the data: 10.9
– 20% of the data: 15.1
– 10% of the data: 20.7
8.74 11.2 12.6 13.5
C1+M2+C3+L4+L5+L6+F7
age
sunglasses
hats
profile
errata
Probe
Gallery
1 Each identity with a single example
2 Unconstrained Face Recognition: Identifying a Person of Interest from a Media Collection, Best-Rowden, Han, Otto, Klare and Jain (IEEE Trans. Information Forensics and Security, 2014)
3 Training is not permitted on LFW (‘unsupervised’)
Gallery
Probe
…
UNKNOWN
Impostor Probe
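The probe/gallery protocols can be sketched as a nearest-match search: each probe is compared with the single gallery example of every identity, and in the open-set case a probe whose best score is too low is reported as UNKNOWN. The code below is a minimal sketch under assumptions: dot-product similarity, toy 2-D features, and a hypothetical `threshold` parameter; none of the names come from the slides.

```python
def dot(u, v):
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def identify(probe, gallery, threshold=None):
    """Best-matching gallery identity, or 'UNKNOWN' if the best score is below threshold."""
    best_id, best_score = max(
        ((gid, dot(probe, feat)) for gid, feat in gallery.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and best_score < threshold:
        return "UNKNOWN"
    return best_id


def rank1_accuracy(probes, gallery, threshold=None):
    """Fraction of probes whose top match (or rejection) equals the true label."""
    hits = sum(identify(feat, gallery, threshold) == label for label, feat in probes)
    return hits / len(probes)


# Toy data: one gallery example per identity, plus one impostor probe.
gallery = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
probes = [("A", [0.9, 0.1]), ("B", [0.2, 0.8]), ("UNKNOWN", [0.3, 0.3])]

closed_set = rank1_accuracy(probes, gallery)                 # impostor is forced onto someone
open_set = rank1_accuracy(probes, gallery, threshold=0.5)    # impostor can be rejected
print(closed_set, open_set)
```

Without a threshold the impostor probe is always assigned to some gallery identity, which is why the open-set protocol needs a rejection rule.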
SOFTMAX
Web-Scale Training for Face Identification; Taigman, Yang, Ranzato, Wolf
Labels
Verification accuracy (%) on LFW (restricted protocol)
dim     float   binary
4096    97      96.07
1024    96.72   95.53
512     96.78   95.5
256     97.17   95.87
128     96.42   93.38
64      96.1    91.45
32      94.5    87.15
16      92.75   -
8       89.4    -
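One way to obtain a binary code from a float representation is to threshold each dimension and compare codes by counting agreeing bits. The sketch below assumes thresholding at zero (plausible for non-negative, sparse features, but not confirmed by the slide) and uses toy 4-dimensional vectors; the function names are illustrative.

```python
def binarize(features, threshold=0.0):
    """One bit per dimension: 1 if the feature exceeds the threshold, else 0."""
    return [1 if x > threshold else 0 for x in features]


def pack_bits(bits):
    """Pack a list of 0/1 values into a single Python integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hamming_similarity(code_a, code_b, n_bits):
    """Number of bit positions on which two packed codes agree."""
    return n_bits - bin(code_a ^ code_b).count("1")


# Toy float features (hypothetical values).
feat_a = [0.0, 1.2, 0.0, 0.7]
feat_b = [0.1, 0.9, 0.0, 0.0]

code_a = pack_bits(binarize(feat_a))
code_b = pack_bits(binarize(feat_b))
similarity = hamming_similarity(code_a, code_b, n_bits=len(feat_a))
print(code_a, code_b, similarity)
```

Assuming 32-bit floats, a 4096-dimensional float vector takes 16,384 bytes while its 4096-bit code takes 512 bytes, so the accuracy drop in the binary column buys a 32x smaller descriptor and comparisons based on XOR and bit counting.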
– >350 billion photos
– >400M photos uploaded/day
– 3500 photos every sec
– One ImageNet every 1:20h
– One Flickr every 4 weeks
Same system that achieved 92% Rank-1 accuracy on a table
(NIST’s SOTA, constrained) Second-round DeepFace SOTA single network, 2nd best 95.43%[DeepID2]
(A small part of) a t-SNE visualization of LFW, constructed from the dot products of all pairs (~88M), i.e. unsupervised.
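The pair count quoted above is consistent with counting unordered image pairs. The sketch below checks that arithmetic and shows the all-pairs dot-product matrix on toy vectors; the figure of 13,233 LFW images is an assumption not stated on the slide, and the function names are illustrative.

```python
LFW_IMAGES = 13233  # assumption: number of images in LFW, not given on the slide


def unordered_pairs(n):
    """Number of distinct pairs among n items."""
    return n * (n - 1) // 2


def gram_matrix(features):
    """All pairwise dot products of a list of feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]


pairs = unordered_pairs(LFW_IMAGES)
toy_gram = gram_matrix([[1, 0], [0, 2]])
print(pairs)
print(toy_gram)
```

Because only dot products between representations are used, no identity labels enter the visualization, which is the sense in which it is unsupervised.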
Deep learning face representation by joint identification-verification, Sun, Wang, Tang, technical report, arxiv, 6/2014
Learning deep face representation, Fan, Cao, Jiang, Yin, technical report, arxiv, 3/2014