Learning Human Pose from Unaligned Data through Image Translation

SLIDE 1

Learning Human Pose from Unaligned Data through Image Translation

Presented by Triantafyllos Afouras

Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

SLIDE 2

Goal

Learn human-body landmark detectors from unlabelled videos and unaligned annotations

[Figure: human images and unaligned poses as inputs; pose estimate as output]

SLIDE 3

Model architecture

SLIDE 4

Autoencoding

[Diagram: input image → encoder → code → decoder → reconstruction]

SLIDE 5

Autoencoding

[Diagram: input image → encoder → code → decoder → reconstruction]

Problem: the code is not interpretable.

SLIDE 6

Filtering geometric information

[Diagram: input image → encoder → 2D keypoints → decoder → reconstruction]

SLIDE 7

Filtering geometric information

[Diagram: input image → encoder → 2D keypoints → decoder → reconstruction]

Problem: the 2D keypoints carry no appearance information for image reconstruction.
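The slides do not say how the encoder turns its output into 2D keypoints. One common way to get a keypoint from a score map, shown here only as a minimal sketch, is a soft-argmax: take the expected pixel location under a softmax over the map. The function name `soft_argmax` and the toy score map are assumptions for illustration, not taken from the talk.

```python
import math

def soft_argmax(heatmap):
    """Return the expected (x, y) location under a softmax over a 2D score map.

    `heatmap` is a list of rows; x indexes columns and y indexes rows.
    """
    flat = [v for row in heatmap for v in row]
    m = max(flat)  # subtract the maximum for numerical stability
    weights = [[math.exp(v - m) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

# Toy 3x4 score map (hypothetical) with a strong peak at column 2, row 1.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 20.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x, y = soft_argmax(scores)
```

Because the result is a weighted average rather than a hard maximum, the same computation written in an automatic-differentiation framework would let gradients flow from the reconstruction loss back to the encoder.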

SLIDE 8

Conditional generation

[Diagram: input images → geometry encoder → 2D keypoints → decoder → reconstruction; an appearance encoder also feeds the decoder]

SLIDE 9

Result: unsupervised 2D keypoints discovery

Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018

SLIDE 10

Unsupervised 2D keypoints

Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018

[Figure: discovered landmarks vs. what we actually want]

SLIDE 11

Learning to label as image translation

[Diagram: input image → encoder → bottleneck → decoder → reconstruction; a discriminator asks of the bottleneck: looks like a skeleton?]
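The "looks like a skeleton?" test can be written as an adversarial objective. The form below is the standard GAN objective, given only as a sketch; the talk does not state which adversarial loss is used. The symbols are assumptions introduced here: $\Phi$ is the encoder, $D$ the discriminator, $x$ an input image, and $\bar{y}$ a skeleton image drawn from the unaligned pose data.

```latex
\min_{\Phi} \max_{D} \;
\mathbb{E}_{\bar{y}}\big[\log D(\bar{y})\big]
+ \mathbb{E}_{x}\big[\log\big(1 - D(\Phi(x))\big)\big]
```

The discriminator is rewarded for telling real skeleton images from encoder outputs, while the encoder is rewarded for producing bottleneck images that the discriminator accepts as skeletons.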

SLIDE 12

Image Translation = CycleGAN

[Diagram labels: rgb, skeleton, rgb (reconstruction)]

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu, Park, Isola, Efros. Proc. ICCV, 2017.
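For reference, the cycle-consistency term from the CycleGAN paper can be written as follows. The mapping names are assumptions chosen to match this talk: $G$ translates rgb images $x$ to skeleton images, and $F$ translates skeleton images $y$ back to rgb.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
\mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]
+ \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]
```

The first term is the rgb → skeleton → rgb reconstruction shown on the slide. Because it can only be minimised if $G(x)$ retains enough information to rebuild $x$, it also creates an incentive to hide appearance inside the skeleton image.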

SLIDE 13

Cheating CycleGANs

[Diagram: input image → encoder → bottleneck → decoder → reconstruction; a discriminator asks of the bottleneck: looks like a skeleton?]

The encoder needs to smuggle appearance through the bottleneck to facilitate the reconstruction.

SLIDE 14

Cheating CycleGANs

[Figure panels: source, reconstruction, bottleneck, log-bottleneck]

The model cheats and encodes appearance information together with geometry, smuggling the appearance through the bottleneck.

SLIDE 15

Tightening the screw

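[Diagram: input image → encoder → skeleton image → keypoint detector (pre-trained offline) → analytical renderer → decoder → reconstruction. The keypoint detector and the analytical renderer form the handcrafted bottleneck. A discriminator asks of the skeleton image: looks like a skeleton? A style image passes through the appearance encoder to the decoder.]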

SLIDE 17

Our model in detail

[Diagram: input image → encoder → skeleton image → keypoint detector → detected keypoints → analytical renderer → clean skeleton images → decoder → reconstruction. A discriminator compares the skeleton image with unpaired skeleton images: looks like a skeleton? A style image passes through the appearance encoder to the decoder.]
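To make the analytical renderer concrete, here is a minimal sketch that draws a clean skeleton image from 2D keypoints. It assumes each pixel's intensity decays exponentially with its squared distance to the nearest limb segment. The function names, the `gamma` sharpness parameter, and the toy three-keypoint skeleton are hypothetical and not taken from the slides.

```python
import math

def point_segment_sq_dist(px, py, ax, ay, bx, by):
    """Squared distance from point (px, py) to the segment from (ax, ay) to (bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Project the point onto the line, then clamp to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def render_skeleton(keypoints, limbs, height, width, gamma=1.0):
    """Render limbs as soft line segments; returns a height x width list of rows."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance to the closest limb decides the pixel intensity.
            d2 = min(
                point_segment_sq_dist(x, y, *keypoints[i], *keypoints[j])
                for i, j in limbs
            )
            row.append(math.exp(-gamma * d2))
        image.append(row)
    return image

# Hypothetical skeleton: three keypoints (x, y) joined by two limbs.
keypoints = [(1.0, 1.0), (5.0, 1.0), (5.0, 4.0)]
limbs = [(0, 1), (1, 2)]
image = render_skeleton(keypoints, limbs, height=6, width=7)
```

The image depends only on keypoint coordinates, so nothing but geometry can pass through this bottleneck. Since the intensity is a smooth function of the keypoints almost everywhere, the same computation written in an automatic-differentiation framework would pass gradients back to the keypoint detector.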

SLIDE 18

Results

SLIDE 19

Human pose estimation

[Figure: predictions on the Simplified Human3.6M dataset, Human3.6M, and Penn Action]

SLIDE 20

Human pose estimation

[Figure: results on Human3.6M and Penn Action]

SLIDE 21

Unsupervised to labeled keypoints

Unsupervised methods: discovered landmarks → supervised linear regression → what we actually want

Our method: directly predicting labelled keypoints

SLIDE 22

Human pose estimation

[Bar chart: Simplified Human3.6M, %-MSE normalised by image size (axis 0.0 to 8.0). Methods: hourglass (supervised), Thewlis et al., Zhang et al., ours]

[Bar chart: Human3.6M, MSE in pixels (axis 16.5 to 20.0). Methods: hourglass (supervised), ours (supervised), ours]

Legend: unsupervised discovery + supervised regression; no paired data

SLIDE 23

Ablations

[Bar chart: Simplified Human3.6M, %-MSE normalised by image size (axis 2.5 to 4.5)]

CycleGAN
+ appearance conditioning
+ clean (analytical) skeleton renderer bottleneck
+ 2nd cycle = ours

SLIDE 24

Disentangling style and geometry

SLIDE 25

Disentangling style and geometry

Mixing appearance and geometry by conditioning on a different identity

[Figure columns: geometry, style, reconstruction]

SLIDE 26

Conclusion

Learn landmark detectors from unlabelled videos and unaligned pose annotations, using no paired data or labelled images.

Prevent appearance leakage in CycleGAN through:
(a) a novel bottleneck with a differentiable sketch renderer;
(b) conditioning the generator on an appearance image.

Outperform state-of-the-art supervised and unsupervised landmark detectors for human pose.

The method factorizes object appearance and geometry → transfer of style / pose.

SLIDE 27

Learning Human Pose from Unaligned Data through Image Translation

Presented by Triantafyllos Afouras

Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/