Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors


  1. Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
     Xuanyi Dong¹, Shoou-I Yu², Xinshuo Weng², Shih-En Wei², Yi Yang¹, Yaser Sheikh²
     ¹ CAI, University of Technology Sydney; ² Oculus Research, Facebook
     CVPR 2018, Salt Lake City

  2. Facial Landmark Detection

  3. A Challenging Problem
     ● Poses (expressions/viewpoints)
     ● Identity
     ● Temporal consistency
     Sagonas et al. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. ICCV, 2013.

  4. Landmark Detection Methods
     Image-based Detection
     ● DeepReg [Shi et al, NNLS’ 14]
     ● Convolutional Pose Machine [Wei et al, CVPR’ 16]
     ● Hourglass Network [Newell et al, ECCV’ 16]
     ● …
     ● Pros
       ○ Accurate across poses/identity
     ● Cons
       ○ Lack of temporal consistency (jittering)
     Video-based Detection

  5. Landmark Detection Methods
     Image-based Detection
     ● DeepReg [Shi et al, NNLS’ 14]
     ● Convolutional Pose Machine [Wei et al, CVPR’ 16]
     ● Hourglass Network [Newell et al, ECCV’ 16]
     ● …
     ● Pros
       ○ Accurate across poses/identity
     ● Cons
       ○ Lack of temporal consistency (jittering)
     Video-based Detection
     ● Recurrent Encoder-Decoder Network [Peng et al, ECCV’ 16]
     ● Two-Streams Transformer [Liu et al, TPAMI’ 17]
     ● Supervision-by-Registration [Ours]
     ● …
     ● Pros
       ○ Temporal-consistent
     ● Cons
       ○ Require per-frame annotations, difficult to scale up

  6. What is Supervision-by-Registration?

  7. Lucas-Kanade Tracking Operation: Differentiable
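
A minimal sketch of one way a translation-only Lucas-Kanade step can be written, to make the "differentiable" claim concrete: every step below is bilinear sampling, sums, and a 2x2 linear solve, all of which an autodiff framework could differentiate. This is not the authors' implementation. The function names (`bilinear`, `lk_track`, `gaussian_blob`), the window size `half`, and the iteration count are assumptions chosen for illustration, and the images are plain nested lists at least 2x2 in size.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom

def lk_track(prev, nxt, point, half=3, iters=20):
    """Estimate where the patch around `point` in `prev` moved to in `nxt`."""
    px, py = point
    dx = dy = 0.0
    offsets = [(u, v) for v in range(-half, half + 1)
               for u in range(-half, half + 1)]
    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for u, v in offsets:
            x, y = px + dx + u, py + dy + v
            # Central-difference image gradient at the current estimate.
            gx = 0.5 * (bilinear(nxt, x + 1, y) - bilinear(nxt, x - 1, y))
            gy = 0.5 * (bilinear(nxt, x, y + 1) - bilinear(nxt, x, y - 1))
            # Residual between the template patch and the current patch.
            err = bilinear(prev, px + u, py + v) - bilinear(nxt, x, y)
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy
            b1 += gx * err
            b2 += gy * err
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # textureless patch: the 2x2 system cannot be solved
        # Solve [[a11, a12], [a12, a22]] @ step = [b1, b2] in closed form.
        step_x = (a22 * b1 - a12 * b2) / det
        step_y = (a11 * b2 - a12 * b1) / det
        dx += step_x
        dy += step_y
        if abs(step_x) + abs(step_y) < 1e-6:
            break
    return px + dx, py + dy

def gaussian_blob(size, cx, cy, sigma=3.0):
    """Synthetic test image: a single Gaussian bump centred at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

# Usage: the blob moves by (1, 1) between the two synthetic frames.
frame_a = gaussian_blob(21, 10, 10)
frame_b = gaussian_blob(21, 11, 11)
tracked = lk_track(frame_a, frame_b, (10.0, 10.0))
```

In a training setup the same arithmetic would be expressed with tensor operations so that gradients can flow from the tracked position back to the detector's output.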

  8. Registration Loss: Forward-Backward Scheme
     Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV, 2015.
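
The slide names a forward-backward scheme but gives no formula, so the following is only one plausible reading, sketched under stated assumptions: landmarks detected in frame t-1 are tracked forward to frame t and back again, landmarks whose round trip does not return near the starting point are ignored, and the remaining ones are penalized by the distance between the detection in frame t and the forward-tracked position. The `track` callable, the `fb_threshold` value, and the toy `shift_tracker` are hypothetical names introduced for illustration.

```python
import math

def registration_loss(det_prev, det_next, track, fb_threshold=1.0):
    """Mean detection-vs-tracking distance over reliably tracked landmarks.

    det_prev, det_next: lists of (x, y) detections in frames t-1 and t.
    track(points, direction): hypothetical tracker; direction is +1 for
    t-1 -> t and -1 for t -> t-1.
    """
    forward = track(det_prev, +1)
    backward = track(forward, -1)
    total, kept = 0.0, 0
    for start, fwd, back, det in zip(det_prev, forward, backward, det_next):
        round_trip = math.hypot(start[0] - back[0], start[1] - back[1])
        if round_trip > fb_threshold:
            continue  # forward-backward check failed: skip this landmark
        total += math.hypot(det[0] - fwd[0], det[1] - fwd[1])
        kept += 1
    return total / kept if kept else 0.0

def shift_tracker(points, direction):
    """Toy tracker: everything moves one pixel to the right per frame."""
    return [(x + direction, y) for x, y in points]

loss = registration_loss([(0, 0), (5, 5)], [(1, 0), (6, 7)], shift_tracker)
```

In this toy example the first detection agrees with the tracker and the second is 2 pixels off, so only the second landmark contributes to the loss.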

  9. Soft-Argmax: Differentiable Operation
     [Figure: sample heatmap and the resulting output]
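
The slide does not give the formula, so as a hedged illustration here is the common softmax-weighted form of soft-argmax: the landmark coordinate is the expected pixel position under a softmax of the heatmap. Unlike a hard argmax, this is a smooth function of the heatmap values. The `beta` sharpness parameter is an assumption added for illustration, and the variant used in the talk may differ (for example, by averaging only over a local window around the peak).

```python
import math

def soft_argmax(heatmap, beta=1.0):
    """Return the expected (x, y) position under a softmax of the heatmap.

    heatmap: list of rows of floats. beta: hypothetical sharpness factor;
    larger values push the result toward the hard argmax.
    """
    peak = max(max(row) for row in heatmap)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (value - peak)) for value in row]
               for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

# Usage: a 3x5 heatmap with a single strong response at column 2, row 1.
heatmap = [[0.0] * 5 for _ in range(3)]
heatmap[1][2] = 10.0
x, y = soft_argmax(heatmap)
```

Because the output is a weighted average, it can fall between pixel centres, which is what allows sub-pixel landmark positions.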

  10. Implementation
      ● VGG16 is used as the backbone architecture
      ● CPM is used as the base facial landmark detector (it can be replaced by others, e.g., a stacked hourglass network)
      ● LK tracking operates on images/conv1 features

  11. Results on Image Datasets

  12. Results on Video Datasets
      ● AUC@0.08 error for each individual video of 300-VW category C. The numbers are percentages.
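
For readers unfamiliar with the metric, AUC@0.08 is usually computed as the area under the cumulative error distribution curve from 0 up to a normalized error of 0.08, divided by 0.08 so that a perfect detector scores 100%. The sketch below uses the closed form of that integral. The slide does not state how errors are normalized, so the inputs are assumed to be already normalized per-frame errors, and the function name `auc_at` is hypothetical.

```python
def auc_at(errors, threshold=0.08):
    """Area under the cumulative error curve on [0, threshold], in percent.

    errors: normalized per-frame landmark errors (assumed non-negative).
    The integral of the step function 1[error <= t] over [0, threshold]
    equals max(0, threshold - error), so no explicit curve is needed.
    """
    if not errors:
        raise ValueError("at least one error value is required")
    area = sum(max(0.0, threshold - e) for e in errors)
    return 100.0 * area / (threshold * len(errors))

# Usage: two frames within the threshold, one on it, one far beyond it.
score = auc_at([0.0, 0.04, 0.08, 0.2])
```

Frames with an error at or above the threshold contribute nothing, so the score rewards both a high success rate and small errors among the successes.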

  13. Demo

  14. Take Home Messages
      ● Registration can be a free supervision signal to enforce temporal consistency
      ● More generally, self-supervision is powerful!
