1 Yaniv Taigman 1 Ming Yang 1 MarcAurelio Ranzato 2 Lior Wolf 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DeepFace for Unconstrained Face Recognition

Yaniv Taigman (1), Ming Yang (1), Marc’Aurelio Ranzato (1), Lior Wolf (2)

11/26/2014

(1) Facebook AI Research, (2) Tel Aviv University

slide-2
SLIDE 2

  • 1.6M daily uploads, 6B photos (12/2013)
  • 60M daily uploads, 20B photos (3/2014)
  • 400M daily uploads, 350B photos (3/2014)
  • 350M daily uploads, 0B photos (11/2013)
  • 215M daily uploads, ?B photos (11/2013)
  • 100 hours video per min (4/2014)

Sources: www.expandedramblings.com, www.emarketer.com

  • 1.75B smartphone users in 2014
  • 880B digital photos will be taken in 2014

Era of big visual data

slide-3
SLIDE 3

Tag suggestions

No automatic face recognition service in EU countries

slide-4
SLIDE 4

Facerec main objective

Find a representation & similarity measure such that:

  • Intra-subject similarity is high
  • Inter-subject similarity is low
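
One common way to make this objective concrete is to compare feature vectors by cosine similarity and to declare "same subject" when the score passes a threshold. The sketch below is only an illustration: the toy vectors, the threshold of 0.5 and the function names are assumptions, not values from the talk.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_subject(a, b, threshold=0.5):
    """Accept the pair when the similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold

# Hypothetical 3-d "representations": two photos of one subject, one of another.
alice_1 = [0.9, 0.1, 0.3]
alice_2 = [0.8, 0.2, 0.4]
bob_1 = [-0.2, 0.9, -0.1]

print(cosine_similarity(alice_1, alice_2))  # intra-subject: should be high
print(cosine_similarity(alice_1, bob_1))    # inter-subject: should be low
```

A good representation pushes the first score towards 1 and the second towards 0 or below, so that a single threshold separates the two cases.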
slide-5
SLIDE 5

  • 1964 Bledsoe: Face Recognition
  • 1973 Kanade’s Thesis
  • 1991 Turk & Pentland: Eigenfaces
  • 1997 Belhumeur: Fisherfaces
  • 1999 Blanz & Vetter: Morphable faces
  • 1999 Wiskott: EBGM
  • 2001 Viola & Jones: Boosting
  • 2006 Ahonen: LBP

Milestones in face recognition

Slightly modified version of Anil Jain’s timeline

slide-6
SLIDE 6

NIST FRVT’s best performer on:

  • 1. Verification: FRR=0.3% at FAR=0.1%
  • 2. Identification: with 1.6 million identities: 95.9%
  • 3. Identification: on LFW with 4,249 identities: 56.7%

Answer: No.

  • L. Best-Rowden, H. Han, C. Otto, B. Klare, and A. K. Jain. Unconstrained face recognition: Identifying a person of interest from a media collection. IEEE Trans. Information Forensics and Security, 2014.
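
A verification figure such as "FRR at a given FAR" can be read as follows: pick the score threshold at which at most the target fraction of impostor pairs is accepted, then count the genuine pairs rejected at that threshold. The sketch below assumes that higher scores mean "more similar"; the function name and the toy scores are hypothetical.

```python
def frr_at_far(genuine_scores, impostor_scores, target_far):
    """False reject rate at the threshold where FAR <= target_far.

    A pair is accepted when its score is strictly above the threshold.
    """
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we are allowed to accept.
    allowed = int(target_far * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        threshold = float("-inf")
    else:
        # The first impostor score that must be rejected sets the threshold.
        threshold = ranked[allowed]
    rejected = sum(1 for s in genuine_scores if s <= threshold)
    return rejected / len(genuine_scores)

# Toy scores, not data from any benchmark.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1, 0.05, 0.3, 0.15, 0.25, 0.35]

print(frr_at_far(genuine, impostor, target_far=0.1))
```

Tightening the target FAR raises the threshold, which in turn can only increase the FRR.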

Problem solved?

slide-7
SLIDE 7

property      constrained        unconstrained
resolution    about 2000x2000    50x50
viewpoint     fully frontal      rotated, loose
illumination  controlled         arbitrary
occlusion     disallowed         allowed

CONSTRAINED: FRVT

UNCONSTRAINED: Labeled Faces in the Wild

Constrained vs. unconstrained

slide-8
SLIDE 8

Challenges in unconstrained face recognition

Probes for example Gallery

  • 1. Pose
  • 2. Illumination
  • 3. Expression
  • 4. Aging
  • 5. Occlusion

slide-9
SLIDE 9

A case study

  • Gallery images: 1 million mug-shot + 6 web images
  • Probe images: 5 faces
  • Ranking results

– with or without demographic filtering

A case study of automated face recognition: the Boston Marathon bombing suspects, J. C. Klontz and A. K. Jain, IEEE Computer, 2013

Probe faces:

slide-10
SLIDE 10

Unconstrained Face Recognition Era: The Labeled Faces in the Wild (LFW)

13,233 photos of 5,749 celebrities

Labeled faces in the wild: A database for studying face recognition in unconstrained environments, Huang, Jain, Learned-Miller, ECCVW, 2008

slide-11
SLIDE 11

Face verification (1:1)

=

!=

slide-12
SLIDE 12

Human-level performance

  • User study on Mechanical Turk

– 10 different workers per face pair
– Average human performance
– Original images, tight crops, inverse crops: 99.20%, 97.53%, 94.27%

Attribute and simile classifiers for face verification, Kumar, et al., ICCV 2009

“These results suggest that automatic face verification algorithms should not use regions outside of the face, as they could artificially boost accuracy in a manner not applicable on real data.”

slide-13
SLIDE 13
  • Labeled faces in the wild: A database for studying face recognition in unconstrained environments, ECCVW, 2008.
  • Attribute and simile classifiers for face verification, ICCV 2009.
  • Multiple one-shots for utilizing class label information, BMVC 2009.
  • Large scale strongly supervised ensemble metric learning, with applications to face verification and retrieval, NEC Labs TR, 2012.
  • (*) Learning hierarchical representations for face verification with convolutional deep belief networks, CVPR, 2012.
  • Bayesian face revisited: A joint formulation, ECCV 2012.
  • Tom-vs-pete classifiers and identity preserving alignment for face verification, BMVC 2012.
  • Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, CVPR 2013.
  • Probabilistic elastic matching for pose variant face verification, CVPR 2013.
  • Fusing robust face region descriptors via multiple metric learning for face recognition in the wild, CVPR 2013.
  • Fisher vector faces in the wild, BMVC 2013.
  • A practical transfer learning algorithm for face verification, ICCV 2013.
  • (*) Hybrid deep learning for computing face similarities, ICCV 2013.

(*) Employed deep learning models for face verification on LFW.

Please check http://vis-www.cs.umass.edu/lfw/ for the latest updates.

LFW: Progress over the recent 7 years

slide-14
SLIDE 14

LFW: Progress over the recent 7 years

Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments (results page), Gary B. Huang, Manu Ramesh, Tamara Berg and Erik Learned-Miller.

Accuracy / year: 60.02%, 73.93%, 78.47%, 85.54%, 88.00%, 92.58%, 95.17%, 96.33% (human: 97.53%)

Reduction of error wrt human / year: 37.08%, 19.24%, 37.09%, 20.52%, 48.06%, 52.32%, 49.15%
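
The "reduction of error wrt human" figures appear to follow from the accuracy series if 97.53% (the tight-crop human accuracy from the Mechanical Turk study) is taken as the reference: each value is the share of the remaining gap to human accuracy that was closed in that step. The sketch below works this out under that assumption; the variable and function names are illustrative.

```python
# Yearly best LFW accuracies (%) as listed on the slide, oldest first.
accuracies = [60.02, 73.93, 78.47, 85.54, 88.00, 92.58, 95.17, 96.33]
HUMAN_ACCURACY = 97.53  # assumed reference: human accuracy on tight crops

def error_reduction_wrt_human(prev_acc, new_acc, human_acc=HUMAN_ACCURACY):
    """Percentage of the gap to human accuracy closed between two results."""
    gap_before = human_acc - prev_acc
    gap_after = human_acc - new_acc
    return 100.0 * (gap_before - gap_after) / gap_before

reductions = [
    round(error_reduction_wrt_human(prev, new), 2)
    for prev, new in zip(accuracies, accuracies[1:])
]
print(reductions)
```

For example, going from 60.02% to 73.93% closes 13.91 of the 37.51 points that separated the earlier result from human accuracy, i.e. about 37.08%.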

slide-15
SLIDE 15

High-dim LBP

  • Accurate (27) dense facial landmarks
  • Concatenate multi-scale descriptors

– ~100K-dim LBP, SIFT, Gabor, etc.

  • Transfer learning: Joint Bayesian
  • WDRef dataset

– 99,773 images of 2,995 individuals
– 95.17% => 96.33% on LFW (unrestricted protocol)

Face alignment by explicit shape regression, Cao, et al., CVPR 2012

Bayesian face revisited: A joint formulation, Chen, et al., ECCV 2012

Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, Chen, et al., CVPR 2013

A practical transfer learning algorithm for face verification, Cao, et al., ICCV 2013

Likelihood ratio test: EM update of the between/within class covariance
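
The likelihood ratio test mentioned here can be written out as follows. This is a sketch of the Joint Bayesian formulation as it is usually stated for the cited ECCV 2012 paper; the slide itself gives no formulas, so the symbols (identity component \mu, within-person variation \varepsilon, covariances S_\mu and S_\varepsilon, hypotheses H_I for "same person" and H_E for "different persons") are assumptions about notation.

```latex
% Face feature = identity + within-person variation (both zero-mean Gaussian)
x = \mu + \varepsilon, \qquad
\mu \sim \mathcal{N}(0, S_\mu), \qquad
\varepsilon \sim \mathcal{N}(0, S_\varepsilon)

% Verification score: log-likelihood ratio of the two hypotheses
r(x_1, x_2) = \log \frac{P(x_1, x_2 \mid H_I)}{P(x_1, x_2 \mid H_E)}

% Joint covariance of (x_1, x_2) under each hypothesis
\Sigma_I =
\begin{bmatrix}
S_\mu + S_\varepsilon & S_\mu \\
S_\mu & S_\mu + S_\varepsilon
\end{bmatrix},
\qquad
\Sigma_E =
\begin{bmatrix}
S_\mu + S_\varepsilon & 0 \\
0 & S_\mu + S_\varepsilon
\end{bmatrix}
```

Under this reading, the EM update referred to on the slide estimates the between-class covariance S_\mu and the within-class covariance S_\varepsilon from labeled training data.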

slide-16
SLIDE 16

Hybrid deep learning

  • 12X5 Siamese ConvNets X8 + RBM classification

Hybrid deep learning for computing face similarities, Sun, Wang, Tang, ICCV 2013.

CelebFaces dataset

  • 87,628 images of 5,436 individuals
  • 12 face regions
  • 8 pairs of inputs

slide-17
SLIDE 17

Detect Align Represent Classify

Face recognition pipeline

Yaniv Lubomir Marc’Aurelio

slide-18
SLIDE 18

Faces are 3D objects

slide-19
SLIDE 19

Reconstruction accuracy and discriminability

Bornstein et al. 2007

slide-20
SLIDE 20

Face alignment

(‘Frontalization’)

Detect 2D-Aligned 3D-Aligned

slide-21
SLIDE 21

2D alignment

2D Align Localize

f

slide-22
SLIDE 22

3D alignment

2D Align

+ 67 2D fiducial points (x2d)

Piece-wise affine
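
A piece-wise affine warp applies a separate affine map to each triangle of a mesh spanned by the fiducial points. The sketch below shows only the core step for a single triangle: recovering the affine map from three point correspondences with Cramer's rule. It is an illustration under that assumption, not the authors' implementation; all names and coordinates are hypothetical.

```python
def affine_from_triangle(src, dst):
    """Affine map taking three source points onto three destination points.

    Returns ((a, b, c), (d, e, f)) with x' = a*x + b*y + c, y' = d*x + e*y + f.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate")

    def solve(v1, v2, v3):
        # Cramer's rule for a*xi + b*yi + c = vi, i = 1..3.
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    row_x = solve(dst[0][0], dst[1][0], dst[2][0])
    row_y = solve(dst[0][1], dst[1][1], dst[2][1])
    return row_x, row_y

def apply_affine(transform, point):
    """Map a single (x, y) point through the affine transform."""
    (a, b, c), (d, e, f) = transform
    x, y = point
    return a * x + b * y + c, d * x + e * y + f

# Hypothetical triangle in the 2D-aligned crop and its frontalized position.
source_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target_triangle = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
warp = affine_from_triangle(source_triangle, target_triangle)
print(apply_affine(warp, (0.5, 0.5)))
```

Repeating this for every triangle of the mesh, and sampling pixels through the inverse maps, gives the full piece-wise affine warp.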

slide-23
SLIDE 23

Rendering of new views

slide-24
SLIDE 24

Network architecture

Input: Calista_Flockhart_0002.jpg

Stages: Detection & Localization, Frontalization, Front-End ConvNet, Local (Untied) Convolutions, Globally Connected

  • C1: 32 filters 11x11
  • M2: 3x3
  • C3: 16 filters 9x9
  • L4: 16 x 9 x 9 x 16
  • L5: 16 x 7 x 7 x 16
  • L6: 16 x 5 x 5 x 16
  • F7: 4096d (REPRESENTATION)
  • F8: 4030d (SFC labels)
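
To see why the local (untied) layers dominate the model size, it helps to compare weight counts: a convolution shares one filter bank across all positions, while a locally connected layer learns a separate bank at every output position. The sketch below uses the L4 shape from the slide (16 filters of 9 x 9 x 16); the 55 x 55 output map is an assumption taken from the published DeepFace paper, not from the slide, and biases are ignored.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights of a convolution: one filter bank shared by all positions."""
    return kernel * kernel * in_channels * out_channels

def local_weights(kernel, in_channels, out_channels, out_height, out_width):
    """Weights of a locally connected layer: one filter bank per position."""
    positions = out_height * out_width
    return positions * conv_weights(kernel, in_channels, out_channels)

# L4: 16 filters of size 9x9x16; output map size 55x55 is assumed.
shared = conv_weights(kernel=9, in_channels=16, out_channels=16)
untied = local_weights(kernel=9, in_channels=16, out_channels=16,
                       out_height=55, out_width=55)

print(shared, untied, untied // shared)
```

The untied layer costs one filter bank per output position, which is affordable only because the input is aligned so that each position sees the same facial region in every image.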

slide-25
SLIDE 25

SFC Training dataset

4.4 million photos blindly sampled, containing more than 4,000 identities (permission granted)

slide-26
SLIDE 26

Transferred Similarities (Test)

DeepFace Replica DeepFace Replica

(a) Cosine angle (b) Kernel Methods (c) Siamese Network
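
Besides the plain cosine angle, learned measures on top of the fixed representation can be as simple as a weighted per-dimension comparison. The sketch below shows two such forms, a weighted chi-squared distance and a weighted L1 distance of the kind a Siamese head can learn. It is one possible reading of options (b) and (c), not a statement of what the slide's figure contained; in practice the weights would be learned, whereas here they are hypothetical constants.

```python
def weighted_chi2(f1, f2, weights):
    """Weighted chi-squared distance between two non-negative feature vectors."""
    total = 0.0
    for a, b, w in zip(f1, f2, weights):
        if a + b > 0:  # skip dimensions where both features are zero
            total += w * (a - b) ** 2 / (a + b)
    return total

def weighted_l1(f1, f2, alphas):
    """Weighted absolute difference, as in a simple Siamese distance head."""
    return sum(alpha * abs(a - b) for a, b, alpha in zip(f1, f2, alphas))

# Toy non-negative features (e.g. after a ReLU) and hypothetical weights.
face_a = [1.0, 0.0, 2.0]
face_b = [1.0, 0.0, 0.0]

print(weighted_chi2(face_a, face_b, [1.0, 1.0, 1.0]))
print(weighted_l1(face_a, face_b, [1.0, 1.0, 0.5]))
```

Both are distances, so lower values indicate a more likely match.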

slide-27
SLIDE 27

Results on LFW

slide-28
SLIDE 28

YouTube Faces dataset (YTF)

Face recognition in unconstrained videos with matched background similarity, Wolf, Hassner, Maoz, ICCV 2011

  • Data collection

– 3,425 YouTube videos of 1,595 celebrities (a subset of LFW subjects)
– 5,000 video pairs in 10 splits
– Detected and roughly aligned face frames available.

  • Metric: mean recognition accuracy over 10 folds

– Restricted protocol: only same/not-same labels
– Unrestricted protocol: face identities, additional training pairs
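
The metric above can be computed as the mean of the per-fold accuracies, usually reported together with the standard error of that mean. A minimal sketch follows; the fold accuracies are made up for illustration.

```python
import math

def mean_and_standard_error(fold_accuracies):
    """Mean accuracy over folds and the standard error of that mean."""
    n = len(fold_accuracies)
    mean = sum(fold_accuracies) / n
    # Unbiased sample variance across folds.
    variance = sum((acc - mean) ** 2 for acc in fold_accuracies) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical accuracies from 10 cross-validation folds.
folds = [0.90, 0.92, 0.91, 0.93, 0.89, 0.91, 0.92, 0.90, 0.91, 0.91]
mean_acc, std_err = mean_and_standard_error(folds)
print(f"{mean_acc:.4f} +/- {std_err:.4f}")
```

Each fold is evaluated with a model trained (or tuned) only on the other nine, so the ten accuracies are comparable.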

slide-29
SLIDE 29

Results on YouTube Faces (Video)

slide-30
SLIDE 30

Trade-offs

  • 1. Alignment (LFW Acc. %):

– No Alignment: 87.9
– 3D Perturbation: 93.7
– 2D Alignment: 94.3
– 3D Alignment: 97.35
– 3D Alignment + LBP: 91.3

  • 2. Dimensionality:

– 4096: 97
– 4096 bits: 96.07
– 1024: 96.72
– 1024 bits: 95.53
– 256: 97.17
– 256 bits: 95.87

  • 3. Sparsity @ 4k dims: [plot]

not “astonishing”

slide-31
SLIDE 31

Trade-offs – Cont’d

  • 4. Training data size (DB Size / DNN Test Error (%)):

– 100% of the data: 8.74
– 50% of the data: 10.9
– 20% of the data: 15.1
– 10% of the data: 20.7

  • 5. Network Architecture (DNN Test Error (%)):

– C1+M2+C3+L4+L5+L6+F7: 8.74
– -C3: 11.2
– -L4 -L5: 12.6
– -C3 -L4 -L5: 13.5
slide-32
SLIDE 32

Failure cases

  • All false negatives on LFW (1%)

age

sunglasses

occlusion/hats

profile

errata

slide-33
SLIDE 33

Failure cases

  • All false positives on LFW (0.65%)
slide-34
SLIDE 34

Failure cases

  • Sample false negatives on YTF
slide-35
SLIDE 35

Failure cases

  • Sample false positives on YTF
slide-36
SLIDE 36

Face identification (1:N)

Probe

Gallery

Unaccounted challenges in verification:

  • I. Reliability
  • II. Large confusion (P x G)
  • III. Different distributions
  • IV. Unknown class

=

!=

slide-37
SLIDE 37

LFW identification (1:N) protocols [2]

  • 1. Closed Set

– #Gallery [1]: 4,249
– #Probes: 3,143
– Measured [3] by Rank-1 rate.

  • 2. Open Set

– #Gallery [1]: 596
– #Probes: 596
– #Impostors: 9,491 (‘unknown class’)
– Measured [3] by Rank-1 rate @ 1% False Alarm Rate.

[1] Each identity with a single example

[2] Unconstrained Face Recognition: Identifying a Person of Interest from a Media Collection, Best-Rowden, Han, Otto, Klare and Jain (IEEE Trans. Information Forensics and Security)

[3] Training is not permitted on LFW (‘unsupervised’)
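
For the open-set protocol, a common way to compute "Rank-1 rate at a fixed false alarm rate" is to set the score threshold from the impostor probes first, then count genuine probes whose best gallery match is both correct and above that threshold. The sketch below follows that reading; the function name, the input layout and the toy scores are assumptions.

```python
def open_set_rank1(genuine_probes, impostor_top_scores, false_alarm_rate):
    """Rank-1 detection-and-identification rate at a given false alarm rate.

    genuine_probes: list of (top_score, top_match_is_correct) per known probe.
    impostor_top_scores: best gallery score for each 'unknown class' probe.
    """
    ranked = sorted(impostor_top_scores, reverse=True)
    # Number of impostor probes allowed to raise a false alarm.
    allowed = int(false_alarm_rate * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    hits = sum(
        1 for score, correct in genuine_probes
        if correct and score > threshold
    )
    return hits / len(genuine_probes)

# Toy data: 100 impostor probes with top scores 0.00 .. 0.99.
impostors = [i / 100 for i in range(100)]
genuine = [(0.99, True), (0.985, False), (0.5, True), (0.995, True)]

print(open_set_rank1(genuine, impostors, false_alarm_rate=0.01))
```

In the closed-set protocol there are no impostors, so the threshold step disappears and only the correctness of the top match counts.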

Gallery

Probe

UNKNOWN

Impostor Probe

slide-38
SLIDE 38

LFW identification (1:N) results

Gallery

Probe

UNKNOWN

Impostor Probe

Cosine similarity measure (‘unsupervised’): Confusion Matrix = G^T * P, where G is 4096 x 4249 and P is 4096 x 3143
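
With L2-normalized feature columns, each entry of the gallery-by-probe product is a cosine similarity, and the closed-set Rank-1 rate is the share of probes whose highest-scoring gallery entry has the right identity. The sketch below computes this one probe (one column of the product) at a time in plain Python; the 2-d vectors and identity labels are toy values, far smaller than the 4096-d features on the slide.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def rank1_rate(gallery, gallery_ids, probes, probe_ids):
    """Fraction of probes whose best-scoring gallery entry is the right identity."""
    unit_gallery = [l2_normalize(g) for g in gallery]
    hits = 0
    for probe, probe_id in zip(probes, probe_ids):
        unit_probe = l2_normalize(probe)
        # One column of the gallery-by-probe similarity matrix.
        scores = [sum(a * b for a, b in zip(g, unit_probe)) for g in unit_gallery]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += gallery_ids[best] == probe_id
    return hits / len(probes)

# Toy gallery with one example per identity, and three probes.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_ids = ["a", "b"]
probes = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
probe_ids = ["a", "b", "b"]

print(rank1_rate(gallery, gallery_ids, probes, probe_ids))
```

No training on the gallery is involved, which is why the slide labels this measure "unsupervised".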

G P NIST’s

slide-39
SLIDE 39

Bottleneck regularizes transfer learning

FC8

0 0 0 0 0 0 0 0 1 0 0

FC7

SOFTMAX

DNN

Web-Scale Training for Face Identification; Taigman, Yang, Ranzato, Wolf

Labels

slide-40
SLIDE 40

Low-dim DeepFace representation

  • Naïve binarization

97 96.72 96.78 97.17 96.42 96.1 94.5 92.75 89.4 96.07 95.53 95.5 95.87 93.38 91.45 87.15

85 87 89 91 93 95 97 dim=4096 dim=1024 dim=512 dim=256 dim=128 dim=64 dim=32 dim=16 dim=8

Verification accuracy (%) on LFW (restricted protocol)

float binary
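
Naive binarization can be as simple as keeping one bit per dimension that records whether the feature is active, after which two faces are compared by Hamming distance. The sketch below assumes a threshold of zero (natural for non-negative, sparse features); the threshold choice and the helper names are assumptions, not details given on the slide.

```python
def binarize(features, threshold=0.0):
    """One bit per dimension: 1 if the feature exceeds the threshold."""
    return [1 if value > threshold else 0 for value in features]

def hamming_distance(bits_a, bits_b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Toy sparse, non-negative features for two faces.
code_a = binarize([0.0, 1.2, 0.0, 0.3])
code_b = binarize([0.7, 0.4, 0.0, 0.0])

print(code_a, code_b, hamming_distance(code_a, code_b))
```

A 4096-d float vector shrinks to 4096 bits (512 bytes) this way, at the cost of the accuracy drop shown in the float versus binary series.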

slide-41
SLIDE 41

CNN’s (can) saturate

“Results can be improved simply by waiting for faster GPUs and bigger datasets to become available” -- Krizhevsky et al.

What happens when the network is fixed and the number of training images grows from 4M to 0.5B? Answer: our findings reveal that this holds to a certain degree.

slide-42
SLIDE 42

Data is practically infinite.

slide-43
SLIDE 43

– >350 billion photos
– >400M photos uploaded/day
– 3500 photos every sec
– One ImageNet every 1:20h
– One Flickr every 4 weeks

Data is practically infinite.

slide-44
SLIDE 44

Scaling up

  • DeepFace: 4.4 million images / 4,030 identities
  • Random 108k: 6 million images / 108,000 identities
  • Random 250k: 10 million images / 250,000 identities (yes: 250K softmax)

=> Saturation

slide-45
SLIDE 45

Scaling up: Semantic Bootstrapping

  • 0.5B images => 10M hyperplanes
  • Lookalike hyperplanes => DB2
  • Training on DB2 with more capacity.

Web-Scale Training for Face Identification; Taigman, Yang, Ranzato, Wolf

slide-46
SLIDE 46

Second round results

Results on LFW

slide-47
SLIDE 47

Comparison to NIST’s State Of The Art

Same system that achieved 92% Rank-1 accuracy on a table of 1.6 million identities.

(NIST’s SOTA, constrained) Second-round DeepFace SOTA single network, 2nd best 95.43%[DeepID2]

slide-48
SLIDE 48
  • For a single 720p image on a single 2.2 GHz Intel CPU core:

– Face detection: 0.3 sec
– 2D + 3D Alignment: 0.05 sec
– Feed-forward: 0.18 sec
– Classification: ~0 sec
– Overall: 0.53 sec

DeepFace efficiency (at test)

slide-49
SLIDE 49

Summary

  • Coupling 3D alignment with large-capacity locally-connected networks
  • At the brink of human-level performance for face verification (1:1)
  • Pushing the performance significantly for face identification (1:N)

slide-50
SLIDE 50

(A small part of the) t-SNE visualization of LFW, constructed from all-pairs (~88M) dot products, i.e. unsupervised.
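
The figure of roughly 88 million pairs is consistent with taking every unordered pair among the 13,233 LFW photos quoted earlier in the deck. A quick check, assuming that is how the count was obtained:

```python
import math

LFW_PHOTOS = 13233  # number of LFW photos quoted earlier in the deck

# Number of unordered photo pairs, i.e. "13,233 choose 2".
pair_count = math.comb(LFW_PHOTOS, 2)
print(f"{pair_count:,} pairs (about {pair_count / 1e6:.1f} million)")
```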

slide-51
SLIDE 51

Thank you!

  • Questions
  • Comments
  • Suggestions
  • Facebook AI Research: www.facebook.com/fair

– Human attributes
– Object recognition
– NLP/word embedding
– GPU training platform
– ……

slide-52
SLIDE 52

Recent work

  • 25 CNNs combining identification and verification loss

Deep learning face representation by joint identification-verification, Sun, Wang, Tang, technical report, arXiv, 6/2014

  • Pyramid CNN: a group of layer-wise trained CNNs.

Learning deep face representation, Fan, Cao, Jiang, Yin, technical report, arXiv, 3/2014