Model Architectures and Training Techniques for High-Precision Landmark Localization

Sina Honari, Pavlo Molchanov, Jason Yosinski, Stephen Tyree, Pascal Vincent, Jan Kautz, Christopher Pal

KEYPOINT DETECTION / LANDMARK LOCALIZATION

The problem
Keypoints for a human face can be:
Applications include:
[Challenges: robustness vs. precision; false positives]
Sums features of different granularity (FCN [1], Hypercolumns [2]).
[1] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[2] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
Legend: C = convolution, P = pooling, U = upsampling; a branch is a horizontal sequence of C and U layers.
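A toy numpy sketch of this merge, assuming each coarser branch halves the resolution and using nearest-neighbor upsampling as a stand-in for the learned U layers:

```python
import numpy as np

def upsample(feat, factor):
    # Nearest-neighbor upsampling (stand-in for a learned U layer).
    return np.kron(feat, np.ones((factor, factor)))

def sumnet_merge(branches):
    """Sum per-branch feature maps after upsampling each coarser map
    back to the finest resolution, in the spirit of FCN / Hypercolumns.
    `branches` is a list of 2-D maps ordered fine -> coarse; the halving
    resolution schedule is a simplifying assumption."""
    target = branches[0].shape[0]
    return sum(upsample(b, target // b.shape[0]) for b in branches)
```

For example, merging a 4x4, a 2x2, and a 1x1 map produces a single 4x4 map in which every location has received a contribution from each granularity.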
Sum of Branches
Legend: C = convolution, P = pooling, U = upsampling, K = concatenation; a branch is a horizontal sequence of C and U layers.
Error = Euclidean distance between ground-truth and predicted keypoints, normalized by the inter-ocular distance.
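As a sketch, this metric can be computed as follows; the eye indices are hypothetical and depend on the dataset's keypoint ordering:

```python
import numpy as np

def normalized_error(pred, gt, left_eye_idx=0, right_eye_idx=1):
    """Mean Euclidean distance between predicted and ground-truth
    keypoints, normalized by the inter-ocular distance.
    pred, gt: arrays of shape (n_keypoints, 2)."""
    dists = np.linalg.norm(pred - gt, axis=1)              # per-keypoint error
    iod = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return dists.mean() / iod
```

Normalizing by the inter-ocular distance makes the metric invariant to face scale, so errors are comparable across images.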
Manual landmark annotation (to build datasets) is tedious:
Labeling landmarks: ~60 s
Labeling a class/attribute: ~1 s
1. First predict landmarks.
2. Use the predicted landmarks to predict the attribute.
Forward pass: landmarks help predict the attribute.
Backward pass:
1. First predict landmarks.
2. Use the predicted landmarks to predict the attribute.
3. Propagate the gradient from the attribute loss back into the landmark localization network.
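The gradient flow in step 3 works because the landmark estimate comes from a differentiable soft-argmax (next slide). A minimal numpy sketch, with a hypothetical linear attribute head, checks by finite differences that the attribute score has a nonzero gradient with respect to the heatmap:

```python
import numpy as np

def soft_argmax(hm, beta=10.0):
    # Differentiable landmark estimate: softmax-weighted center of mass.
    probs = np.exp(beta * (hm - hm.max()))
    probs /= probs.sum()
    ys, xs = np.mgrid[0:hm.shape[0], 0:hm.shape[1]]
    return np.array([(probs * xs).sum(), (probs * ys).sum()])

def attribute_score(hm, w):
    # Toy attribute head: linear in the predicted landmark location.
    # (w is a hypothetical weight vector, not from the paper.)
    return soft_argmax(hm) @ w

# Finite-difference gradient of the attribute score w.r.t. the heatmap:
# nonzero entries show that attribute supervision can train the
# landmark network through the soft-argmax.
hm = np.random.default_rng(0).normal(size=(6, 6))
w = np.array([0.5, -0.3])
eps, grad = 1e-5, np.zeros_like(hm)
for i in range(6):
    for j in range(6):
        hp, hn = hm.copy(), hm.copy()
        hp[i, j] += eps
        hn[i, j] -= eps
        grad[i, j] = (attribute_score(hp, w) - attribute_score(hn, w)) / (2 * eps)
```

A hard argmax would make this gradient zero almost everywhere, which is exactly why the soft version is needed.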
Soft-argmax estimates the location of the scaled center of mass of the heatmap:
✓ Continuous, not discrete
✓ Differentiable
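A minimal numpy sketch of the soft-argmax; the scaling factor `beta` is an assumption, and the paper's exact parameterization may differ:

```python
import numpy as np

def soft_argmax(heatmap, beta=10.0):
    """Differentiable soft-argmax: location of the softmax-weighted
    center of mass of a 2-D heatmap. Larger beta sharpens the softmax
    and approaches the hard argmax."""
    probs = np.exp(beta * (heatmap - heatmap.max()))   # stable softmax
    probs /= probs.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    # Expected (x, y) coordinate under the softmax distribution.
    return np.array([(probs * xs).sum(), (probs * ys).sum()])
```

For a heatmap with a single sharp peak, the result is essentially the peak coordinate, but unlike `np.argmax` it varies smoothly with the heatmap values.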
Method                 Training data
just CNN               100%
just CNN               5%
semi-supervised CNN    5%
[Bar chart: error on the AFLW dataset (normalized by inter-ocular distance), lower is better.
Baselines trained on 100% labeled data: CDM 5.43, ERT 4.35, LBF 4.25, SDM 4.05, CFSS 3.92, RCPR 3.73, CCL 2.72, Lv et al. 2.17.
Ours / Ours + SSL with 1%, 5%, and 100% labeled data: 2.88, 2.46, 2.03, 1.59.]
[Qualitative results: RCN+ (L+ELT+A) trained on 100% labeled data, RCN+ (L+ELT+A) on 1%, and RCN+ (L) on 1%.]
CVPR 2018 paper: https://arxiv.org/abs/1709.01591
Pavlo Molchanov: pmolchanov@nvidia.com
Sina Honari: honaris@iro.umontreal.ca
Mask: 0 = branch omitted, 1 = branch included.
Mask (coarse → fine)   SumNet AFLW   SumNet AFW   RCN AFLW   RCN AFW
1, 0, 0, 0             10.54         10.63        10.61      10.89
0, 1, 0, 0             11.28         11.43        11.56      11.87
1, 1, 0, 0             9.47          9.65         9.31       9.44
0, 0, 1, 0             16.14         16.35        15.78      15.91
0, 0, 0, 1             45.39         47.97        46.87      48.61
0, 0, 1, 1             13.90         14.14        12.67      13.53
0, 1, 1, 1             7.91          8.22         7.62       7.95
1, 0, 0, 1             6.91          7.51         6.79       7.27
1, 1, 1, 1             6.44          6.78         6.37       6.43
(Columns: SumNet and RCN errors on AFLW and AFW; branch masks ordered coarse to fine.)