Learning Human Pose from Unaligned Data through Image Translation
Presented by Triantafyllos Afouras
Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
Goal: Learn human-body landmark detectors from unlabelled videos and unaligned ...
[Diagram labels: human images; unaligned poses; pose estimate; appearance encoder]
Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018
discriminator
looks like a skeleton?
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu and Park et al., 2017.
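The "looks like a skeleton?" discriminator can be illustrated with the least-squares adversarial objective used in the CycleGAN paper. This is only a sketch: the slides do not say which GAN loss this work uses, and the function names and the `scores_real` / `scores_fake` arguments are hypothetical stand-ins for discriminator outputs on real skeleton images and on generated ones.

```python
import numpy as np

def lsgan_discriminator_loss(scores_real, scores_fake):
    """Least-squares GAN loss for the discriminator.

    scores_real: discriminator outputs on real (unpaired) skeleton images.
    scores_fake: discriminator outputs on generated skeleton images.
    The discriminator is pushed to output 1 on real and 0 on generated inputs.
    """
    real_term = np.mean((scores_real - 1.0) ** 2)
    fake_term = np.mean(scores_fake ** 2)
    return 0.5 * (real_term + fake_term)

def lsgan_generator_loss(scores_fake):
    """The generator is pushed to make the discriminator output 1 on its samples."""
    return np.mean((scores_fake - 1.0) ** 2)
```

Note that this loss only asks the output to resemble a skeleton; nothing in it stops the generator from hiding extra information in that output.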
The model cheats and encodes appearance information together with geometry
[Diagram labels: input image; encoder; skeleton image; decoder; reconstruction; discriminator ("looks like a skeleton?"); keypoint detector; analytical renderer; pre-trained offline; appearance encoder; style image; handcrafted bottleneck]
[Diagram labels: input image; encoder; skeleton image; decoder; reconstruction; discriminator ("looks like a skeleton?"); keypoint detector; analytical renderer; appearance encoder; style image; clean skeleton images; unpaired skeleton images; detected keypoints]
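To make the "analytical renderer" label concrete, here is a minimal sketch of how detected keypoints could be drawn as a clean skeleton image using only smooth operations. The exponential falloff, the `gamma` value, the limb list, and the function name are assumptions for illustration and are not taken from the slides. NumPy does not compute gradients, so an actual differentiable version would need the same arithmetic in an autodiff framework.

```python
import numpy as np

def render_skeleton(keypoints, limbs, size=64, gamma=0.05):
    """Draw limbs as soft line segments.

    keypoints: (K, 2) array of (x, y) pixel coordinates.
    limbs: list of (i, j) index pairs naming the keypoints to connect.
    Returns a (size, size) image with values in [0, 1], indexed as [y, x].
    """
    ys, xs = np.mgrid[0:size, 0:size]
    pixels = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2)
    image = np.zeros((size, size))
    for i, j in limbs:
        a, b = keypoints[i], keypoints[j]
        ab = b - a
        denom = max(float(ab @ ab), 1e-8)  # guard against zero-length limbs
        # Fraction along the segment of the point closest to each pixel.
        t = np.clip(((pixels - a) @ ab) / denom, 0.0, 1.0)
        closest = a + t[..., None] * ab
        dist2 = np.sum((pixels - closest) ** 2, axis=-1)
        # Intensity decays with squared distance; limbs are merged with max.
        image = np.maximum(image, np.exp(-gamma * dist2))
    return image

# Example: a single horizontal limb from (10, 10) to (50, 10).
example = render_skeleton(np.array([[10.0, 10.0], [50.0, 10.0]]), [(0, 1)])
```

Because the image is a fixed function of a handful of coordinates, it has no spare capacity to carry texture or colour, which is one way to read the "handcrafted bottleneck" label.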
[Figure: predictions on Human3.6M, Penn Action, and the Simplified Human3.6M dataset]
discovered landmarks what we actually want
supervised linear regression
directly predicting labelled keypoints
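The "supervised linear regression" step, which maps discovered landmarks to labelled keypoints, can be sketched as an ordinary least-squares fit. The function names, the array shapes, and the added bias column are assumptions for illustration; the slides do not specify the exact regression setup.

```python
import numpy as np

def fit_landmark_regressor(discovered, labelled):
    """Fit an affine map from discovered landmarks to labelled keypoints.

    discovered: (N, 2K) flattened coordinates of K discovered landmarks.
    labelled:   (N, 2J) flattened coordinates of J annotated keypoints.
    Returns a (2K + 1, 2J) weight matrix; the last row is the bias.
    """
    X = np.hstack([discovered, np.ones((discovered.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, labelled, rcond=None)
    return W

def predict_keypoints(W, discovered):
    """Apply the fitted affine map to new discovered landmarks."""
    X = np.hstack([discovered, np.ones((discovered.shape[0], 1))])
    return X @ W
```

This step needs images with keypoint labels to fit `W`, which is the extra supervision that directly predicting labelled keypoints avoids.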
[Plot: Simplified Human3.6M, %-MSE normalised by image size. Methods shown: hourglass (supervised); Thewlis et al.; Zhang et al.]
[Plot: Human3.6M, MSE in pixels. Methods shown: hourglass (supervised); (supervised)]
Legend: unsupervised discovery + supervised regression; no paired data
[Plot: %-MSE normalised by image size. Variants shown: CycleGAN; + appearance conditioning; + clean (analytical) skeleton renderer bottleneck]
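The axis label "%-MSE norm. by image size" is not defined on the slides. One plausible reading, assumed here, is the mean Euclidean distance between predicted and ground-truth keypoints, expressed as a percentage of the image side length. The function name and argument shapes are hypothetical.

```python
import numpy as np

def percent_error(pred, target, image_size):
    """Mean keypoint error as a percentage of the image side length.

    pred, target: (N, K, 2) arrays of keypoint coordinates in pixels.
    image_size: side length of the (square) image in pixels.
    """
    dist = np.linalg.norm(pred - target, axis=-1)  # (N, K) per-keypoint error
    return 100.0 * dist.mean() / image_size
```

Normalising by image size makes numbers comparable across input resolutions, unlike the raw pixel error reported on the full Human3.6M plot.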
Mixing appearance and geometry by conditioning on a different identity
[Figure columns: geometry; style; reconstruction]
Learn landmark detectors from unlabeled videos and unaligned pose annotations, using no paired data or labelled images.
Prevent appearance leakage in CycleGAN through:
(a) a novel bottleneck with a differentiable sketch renderer;
(b) conditioning the generator on an appearance image.
Outperform state-of-the-art supervised and unsupervised landmark detectors for human pose.
The method factorizes object appearance and geometry, which allows style and pose transfer.
www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/