Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors




SLIDE 1

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

Xuanyi Dong¹, Shoou-I Yu², Xinshuo Weng², Shih-En Wei², Yi Yang¹, Yaser Sheikh²

¹CAI, University of Technology Sydney, ²Oculus Research, Facebook

CVPR 2018, Salt Lake City

SLIDE 2

Facial Landmark Detection

SLIDE 3

A Challenging Problem

  • Poses (expressions/viewpoints)
  • Identity
  • Temporal consistency

Sagonas et al. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. ICCV, 2013.

SLIDE 4

Landmark Detection Methods

Image-based Detection

  • DeepReg [Shi et al., NNLS ’14]
  • Convolutional Pose Machine [Wei et al., CVPR ’16]
  • Hourglass Network [Newell et al., ECCV ’16]
  • …
  • Pros
      ○ Accurate across poses/identity
  • Cons
      ○ Lack of temporal consistency (jittering)

Video-based Detection

SLIDE 5

Landmark Detection Methods

Video-based Detection

  • Recurrent Encoder-Decoder Network [Peng et al., ECCV ’16]
  • Two-Streams Transformer [Liu et al., TPAMI ’17]
  • Supervision-by-Registration [Ours]
  • …
  • Pros
      ○ Temporally consistent
  • Cons
      ○ Requires per-frame annotations; difficult to scale up

Image-based Detection

  • DeepReg [Shi et al., NNLS ’14]
  • Convolutional Pose Machine [Wei et al., CVPR ’16]
  • Hourglass Network [Newell et al., ECCV ’16]
  • …
  • Pros
      ○ Accurate across poses/identity
  • Cons
      ○ Lack of temporal consistency (jittering)

SLIDE 6

What is Supervision-by-Registration?

SLIDE 7

Lucas-Kanade Tracking Operation: Differentiable
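
The slide only names the operation, so the following is a minimal sketch of why Lucas-Kanade tracking can be differentiable, not the authors' implementation. It assumes a translation-only, forward-additive variant on a synthetic Gaussian blob. The helper names (make_blob, bilinear, lk_track) and the patch radius are illustrative choices. Every step (bilinear sampling, sums, a 2x2 solve) is built from smooth arithmetic, so in an autodiff framework gradients could flow back to the input point.

```python
import math

def make_blob(width, height, cx, cy, sigma=3.0):
    """Synthetic test image: a Gaussian blob centered at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def bilinear(img, x, y):
    """Sample img at a real-valued (x, y) with bilinear interpolation."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bottom = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def lk_track(img_a, img_b, point, radius=4, iterations=20):
    """Track `point` from img_a to img_b with Gauss-Newton updates."""
    px, py = point
    offsets = [(dx, dy) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    # Template patch around the point in the first frame.
    template = [bilinear(img_a, px + dx, py + dy) for dx, dy in offsets]
    qx, qy = px, py
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (dx, dy), t in zip(offsets, template):
            x, y = qx + dx, qy + dy
            # Central-difference image gradients in the second frame.
            gx = 0.5 * (bilinear(img_b, x + 1, y) - bilinear(img_b, x - 1, y))
            gy = 0.5 * (bilinear(img_b, x, y + 1) - bilinear(img_b, x, y - 1))
            err = t - bilinear(img_b, x, y)
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy
            b1 += gx * err
            b2 += gy * err
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:  # textureless patch: no reliable update
            break
        # Solve the 2x2 normal equations for the translation update.
        qx += (a22 * b1 - a12 * b2) / det
        qy += (a11 * b2 - a12 * b1) / det
    return qx, qy

frame_a = make_blob(24, 24, 12.0, 12.0)
frame_b = make_blob(24, 24, 13.5, 11.0)  # blob moved by (+1.5, -1.0)
tracked = lk_track(frame_a, frame_b, (12.0, 12.0))
print(tracked)  # should land close to (13.5, 11.0)
```

A practical version would run on image tensors or feature maps in a deep-learning framework; the pure-Python loops here only keep the sketch dependency-free.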

SLIDE 8

Registration Loss: Forward-Backward Scheme

Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV, 2015.
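
The slide gives no formula for the forward-backward scheme, so this is a hedged sketch of one common reading. A landmark detected in the previous frame is tracked forward and then backward. If the round trip does not return close to its start, the track is treated as unreliable and skipped. Otherwise the distance between the tracked point and the detection in the next frame contributes to the loss. The function name registration_loss, the toy trackers, and the max_round_trip threshold are all hypothetical.

```python
import math

def registration_loss(prev_points, next_points, track_fwd, track_bwd,
                      max_round_trip=1.0):
    """Mean distance between tracked and detected landmarks,
    ignoring landmarks that fail the forward-backward check."""
    total, used = 0.0, 0
    for p_prev, p_next in zip(prev_points, next_points):
        tracked = track_fwd(p_prev)      # previous frame -> next frame
        returned = track_bwd(tracked)    # next frame -> previous frame
        if math.dist(returned, p_prev) > max_round_trip:
            continue                     # unreliable track: no supervision
        total += math.dist(tracked, p_next)
        used += 1
    return total / used if used else 0.0

# Toy trackers standing in for a real (e.g. Lucas-Kanade) tracker.
def toy_forward(p):
    return (p[0] + 2.0, p[1])

def toy_backward(p):
    if p[0] > 50:                        # pretend tracking fails here
        return (p[0] - 7.0, p[1] + 3.0)
    return (p[0] - 2.0, p[1])

prev_detections = [(10.0, 10.0), (20.0, 15.0), (60.0, 30.0)]
next_detections = [(12.0, 10.0), (22.0, 16.0), (90.0, 90.0)]
loss = registration_loss(prev_detections, next_detections,
                         toy_forward, toy_backward)
print(loss)
```

In this toy run the third landmark fails the round-trip check, so only the first two contribute. Masking rather than penalizing failed tracks keeps tracker errors from being pushed into the detector.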

SLIDE 9

Soft-Argmax Differentiable Operation

Sample Heatmap Output
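
As an illustration of the operation named on this slide, here is a minimal soft-argmax sketch. It takes a softmax over the heatmap and returns the expected (x, y) coordinate, which stays differentiable where a hard argmax does not. This is the generic formulation. The sharpness parameter beta and the use of the whole heatmap rather than a local window are assumptions, not details taken from the slides.

```python
import math

def soft_argmax(heatmap, beta=1.0):
    """Expected (x, y) location under softmax(beta * heatmap)."""
    peak = max(max(row) for row in heatmap)  # subtract for numerical stability
    total = x_sum = y_sum = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            weight = math.exp(beta * (value - peak))
            total += weight
            x_sum += weight * x
            y_sum += weight * y
    return x_sum / total, y_sum / total

heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
x, y = soft_argmax(heatmap, beta=5.0)
print(x, y)  # close to the peak at column 2, row 1
```

A larger beta pulls the result toward the hard argmax, and a smaller beta averages over more of the heatmap. With a flat heatmap the result is the center of the grid.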

SLIDE 10

Implementation

  • Used VGG16 as the backbone architecture
  • Used CPM as the base facial landmark detector (can be replaced by others, e.g., a stacked hourglass network)
  • Ran LK tracking on raw images or conv1 features

SLIDE 11

Results: on Image Datasets

SLIDE 12

Results: on Video Datasets

  • AUC@0.08 error for each individual video of 300-VW category C. The numbers are percentages.
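
For readers unfamiliar with the metric, the sketch below shows one way AUC at a 0.08 error threshold could be computed. It takes the area under the cumulative error distribution from 0 to the threshold, divides by the threshold, and reports a percentage. It assumes each frame's error is already normalized (the normalization is not stated on the slide), and the function name auc_at is hypothetical. The closed form holds because a frame with error e contributes max(0, threshold - e) to the area.

```python
def auc_at(errors, threshold=0.08):
    """Area under the cumulative error distribution on [0, threshold],
    normalized by the threshold and returned as a percentage."""
    if not errors:
        raise ValueError("errors must be non-empty")
    # Each frame is counted as correct for all cut-offs between its
    # error and the threshold, so it adds max(0, threshold - error).
    area = sum(max(0.0, threshold - e) for e in errors) / len(errors)
    return 100.0 * area / threshold

per_frame_errors = [0.02, 0.04, 0.10, 0.06]  # made-up normalized errors
score = auc_at(per_frame_errors)
print(round(score, 2))
```

Published evaluation scripts often integrate the curve numerically over sampled thresholds instead, which can differ slightly from this exact integral.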

SLIDE 13

Demo

SLIDE 14

SLIDE 15

Take Home Messages

  • Registration can be a free supervision signal to enforce temporal consistency
  • More generally, self-supervision is powerful!