Learning Human Pose from Unaligned Data through Image Translation

Sep 03, 2022

SLIDE 1

Learning Human Pose from Unaligned Data through Image Translation

Presented by Triantafyllos Afouras

Tomas Jakab Ankush Gupta Hakan Bilen Andrea Vedaldi

SLIDE 2

Goal

Learn human-body landmark detectors from unlabelled videos and unaligned annotations

… , …

Human images Unaligned poses

Pose estimate

SLIDE 3

Model architecture

SLIDE 4

Autoencoding

input image reconstruction encoder decoder code

SLIDE 5

Autoencoding

input image reconstruction encoder decoder code (not interpretable)

SLIDE 6

Filtering geometric information

input image reconstruction encoder decoder 2D keypoints
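
One common way to turn an encoder's output into a 2D keypoint bottleneck is a soft-argmax: normalise a per-keypoint heatmap with a softmax and take the expected pixel location. The slide does not say how the keypoints are extracted, so the sketch below is an assumption for illustration only; the function name `soft_argmax` and the toy heatmap are hypothetical.

```python
import math

def soft_argmax(heatmap):
    """Expected (x, y) pixel location under a softmax over a 2D heatmap.

    Assumed illustration of a keypoint bottleneck; not taken from the slides.
    """
    peak = max(max(row) for row in heatmap)  # subtracted for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

# Toy 5x5 heatmap with one strong response at column 3, row 1.
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][3] = 20.0
x, y = soft_argmax(heatmap)
print(round(x, 3), round(y, 3))
```

Only the two coordinates survive the bottleneck; colour and texture in the input cannot pass through it.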

SLIDE 7

Filtering geometric information

input image reconstruction encoder decoder 2D keypoints (no appearance information for image reconstruction)

SLIDE 8

Conditional generation

input images reconstruction geometry encoder decoder 2D keypoints

appearance encoder

SLIDE 9

Result: unsupervised 2D keypoints discovery

Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018

SLIDE 10

Unsupervised 2D keypoints

Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018

discovered landmarks vs. what we actually want

SLIDE 11

Learning to label as image translation

input image reconstruction encoder decoder bottleneck

discriminator

looks like a skeleton?

SLIDE 12

skeleton rgb (reconstruction)

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu and Park et al., 2017.

rgb rgb skeleton

Image Translation = CycleGAN
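
The rgb → skeleton → rgb loop can be written as a cycle-consistency loss, which in CycleGAN is an L1 distance between the input and its reconstruction. The sketch below is a toy illustration under stated assumptions: images are flat lists of floats, and the encoders (`leaky_encode`, `strict_encode`) and `decode` are hypothetical stand-ins, not the networks from the talk.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_loss(image, to_skeleton, to_image):
    """L1 cycle-consistency: translate to the other domain and back."""
    return l1_distance(to_image(to_skeleton(image)), image)

# Hypothetical stand-ins for the two translators.
leaky_encode = lambda img: [2.0 * v for v in img]                   # keeps every intensity
strict_encode = lambda img: [1.0 if v > 0.5 else 0.0 for v in img]  # keeps only on/off
decode_leaky = lambda skel: [v / 2.0 for v in skel]
decode_strict = lambda skel: list(skel)

image = [0.2, 0.9, 0.6, 0.1]
leaky_loss = cycle_loss(image, leaky_encode, decode_leaky)
strict_loss = cycle_loss(image, strict_encode, decode_strict)
print(leaky_loss, strict_loss)
```

An encoder that passes every intensity through reaches zero cycle loss, while one that keeps only coarse structure cannot. In this toy setting, the cycle loss therefore rewards hiding extra information in the intermediate image.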

SLIDE 13

Cheating CycleGANs

input image reconstruction encoder decoder bottleneck

discriminator

looks like a skeleton?

needs to smuggle appearance to facilitate the reconstruction

SLIDE 14

Cheating CycleGANs

source reconstruction bottleneck log-bottleneck

The model cheats and encodes appearance information together with geometry

smuggling the appearance

SLIDE 15

Tightening the screw

input image reconstruction

encoder

decoder

skeleton image looks like a skeleton?

discriminator

keypoint detector analytical renderer

pre-trained offline

appearance encoder

style image handcrafted bottleneck

SLIDE 17

Our model in detail

input image reconstruction

encoder decoder

skeleton image looks like a skeleton?

discriminator

keypoint detector

analytical renderer appearance encoder

style image clean skeleton images unpaired skeleton images detected keypoints
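
An analytical renderer can be sketched as a fixed function that draws line segments between detected keypoints, with pixel intensity falling off smoothly with distance to the nearest limb. The slides do not give the formula, so the exponential falloff, the `gamma` sharpness parameter and the `limbs` connectivity list below are assumptions for illustration.

```python
import math

def point_segment_sq_dist(p, a, b):
    """Squared distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def render_skeleton(keypoints, limbs, height, width, gamma=1.0):
    """Render keypoints (x, y) joined by limbs (index pairs) as an image.

    Intensity is exp(-gamma * squared distance to the nearest limb);
    this falloff is an assumed choice, not taken from the slides.
    """
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            d = min(point_segment_sq_dist((col, row), keypoints[i], keypoints[j])
                    for i, j in limbs)
            line.append(math.exp(-gamma * d))
        image.append(line)
    return image

# Toy skeleton: three keypoints joined into an L shape.
keypoints = [(1.0, 1.0), (5.0, 1.0), (5.0, 5.0)]
limbs = [(0, 1), (1, 2)]
image = render_skeleton(keypoints, limbs, height=7, width=7)
print(image[1][3], image[6][0])
```

Because the image is a closed-form function of the keypoint coordinates, it can carry nothing beyond geometry, and the expression is differentiable almost everywhere with respect to those coordinates.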

SLIDE 18

Results

SLIDE 19

Human pose estimation

Predictions on Human3.6M, Penn Action and Simplified Human3.6M

SLIDE 20

Human pose estimation

Human3.6M Penn Action

SLIDE 21

Unsupervised to labeled keypoints

discovered landmarks what we actually want

supervised linear regression

unsupervised methods

directly predicting labelled keypoints

our method

SLIDE 22

Human pose estimation

[Bar chart] Simplified Human3.6M, %-MSE norm. by image size: hourglass (supervised), Thewlis et al., Zhang et al.

[Bar chart] Human3.6M, MSE in pixels: hourglass (supervised)

Legend: unsupervised discovery + supervised regression; no paired data

SLIDE 23

Ablations

[Bar chart] Simplified Human3.6M, %-MSE norm. by image size

CycleGAN

+ appearance conditioning

+ clean (analytical) skeleton renderer bottleneck

2nd cycle = ours

SLIDE 24

Disentangling style and geometry

SLIDE 25

Disentangling style and geometry

Mixing appearance and geometry by conditioning on a different identity geometry style reconstruction

SLIDE 26

Conclusion

Learn landmark detectors from unlabelled videos and unaligned pose annotations, using no paired data / labelled images.

Prevent appearance leakage in CycleGAN through:

(a) a novel bottleneck with a differentiable sketch renderer;

(b) conditioning the generator on an appearance image.

Outperform state-of-the-art supervised and unsupervised landmark detectors for human pose.

Method factorizes object appearance and geometry → transfer style / pose.

SLIDE 27

Learning Human Pose from Unaligned Data through Image Translation

Presented by Triantafyllos Afouras

Tomas Jakab Ankush Gupta Hakan Bilen Andrea Vedaldi

www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/