Human Pose Estimation with Deep Learning
Wei Yang
2
American Heist (2014) - The Bank Robbery Scene
Applications: Understand Activities, Family Robots
3
What do we need to know to recognize a crime scene?
4
Scene: bank. Poses: stand, hands up, lay down (abnormal pose). Activity: robbery.
5
Why is human pose estimation challenging?
6
#1. Articulation #2. Occlusion #3. Scale variation
7
9
Applications: Understand Activities, Family Robots
10
Real-Time Imitation of Human Whole-Body Motions by Humanoids.
11
Fully Convolutional Network
Regression with Euclidean loss: $L = \frac{1}{2} \sum_{k=1}^{K} \lVert \hat{H}_k - H_k \rVert_2^2$
where $\hat{H}_k \sim \mathcal{N}(\mathbf{z}_k, \Sigma)$, $k = 1, \dots, K$, over the $K$ heatmaps $H_k$
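The heatmap regression objective can be sketched in a few lines of NumPy. This is a minimal illustration rather than the talk's implementation: the helper names, the 64 x 64 resolution, the value of sigma, and the joint coordinates are all assumptions, and the Gaussian is left unnormalized so that its peak equals 1.

```python
import numpy as np


def gaussian_heatmap(center, size=64, sigma=1.0):
    """Render one target heatmap: an unnormalized 2D Gaussian peaked at `center`.

    `center` is (x, y) in heatmap pixels; `size` and `sigma` are assumed values.
    """
    xs = np.arange(size)[None, :]  # column coordinates, shape (1, size)
    ys = np.arange(size)[:, None]  # row coordinates, shape (size, 1)
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def euclidean_loss(pred, target):
    """L = 1/2 * sum_k ||target_k - pred_k||_2^2 over all K heatmaps."""
    return 0.5 * np.sum((target - pred) ** 2)


# Two hypothetical joints on a 64 x 64 grid.
joints = [(20, 30), (40, 12)]
target = np.stack([gaussian_heatmap(j) for j in joints])  # shape (K, 64, 64)

loss_perfect = euclidean_loss(target.copy(), target)        # exact prediction
loss_blank = euclidean_loss(np.zeros_like(target), target)  # all-zero prediction
```

A perfect prediction gives zero loss, while an all-zero prediction is penalized by half the squared mass of the target Gaussians.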
12
Scale: Feature pyramid learning (ICCV 2017)
3D Pose: In-the-wild 3D pose estimation (CVPR 2018)
13
14
Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body
15
Learning Feature Pyramids for Human Pose Estimation
Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017
Multi-scale testing: the model itself is not scale invariant
Multi-branch network: needs much more memory and computation
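Multi-scale testing can be sketched as follows: run the same model on several rescaled copies of the image and average the resulting heatmaps at the original resolution. Everything here is a hypothetical stand-in (the `model` callable, the scale set, the nearest-neighbour resize, and the single-channel input); the point is only that the cost grows with the number of scales because the model is not scale invariant on its own.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D (H, W) array."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def multi_scale_predict(model, image, scales=(0.5, 1.0, 2.0)):
    """Average the model's heatmaps over several input scales."""
    h, w = image.shape
    acc = np.zeros((h, w))
    for s in scales:
        new_h = max(1, int(round(h * s)))
        new_w = max(1, int(round(w * s)))
        scaled = resize_nearest(image, new_h, new_w)
        heat = model(scaled)               # one full forward pass per scale
        acc += resize_nearest(heat, h, w)  # map back to the original resolution
    return acc / len(scales)


def identity_model(x):
    """Stand-in for a pose network: returns its input as the 'heatmap'."""
    return x


flat = np.full((8, 8), 3.0)
averaged = multi_scale_predict(identity_model, flat)
```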
16
Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI, 2010.
Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR, 2015.
17
Newell, Alejandro, et al. "Stacked hourglass networks for human pose estimation." ECCV, 2016.
18
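[Figure, panels (a) and (b): a 256x256 input passes through Conv, PRM + Pool, and PRM layers (128 × 128, then 64 × 64) into stacked hourglasses (Stack 1, ..., Stack n); detailed hourglass structure. Legend: convolution, pyramid residual module (PRM), score maps, addition. PRM detail: input $\mathbf{x}^{(l)}$, an identity mapping, branches at Ratio 1, ..., Ratio C, and output $\mathbf{x}^{(l+1)}$.]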
Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV, 2016
Single-branch networks: VGG
Multi-branch networks: Inception
19
Traditional weight initialization methods, e.g., Gaussian, Xavier, MSRA (Kaiming), are not applicable for multi-branch networks.
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010.
20
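[Figure: a Conv / FC layer in the forward pass, with input branches $\mathbf{x}_1^{(l)}, \mathbf{x}_2^{(l)}, \dots, \mathbf{x}_{C_i}^{(l)}$ and output $\mathbf{y}^{(l)}$, and in the backward pass, with input $\mathbf{x}^{(l)}$ and output branches $\mathbf{y}_1^{(l)}, \mathbf{y}_2^{(l)}, \dots, \mathbf{y}_{C_o}^{(l)}$]
Forward:
$\mathbf{y}^{(l)} = \mathbf{W}^{(l)} \sum_{i=1}^{C_i^{(l)}} \mathbf{x}_i^{(l)} + \mathbf{b}^{(l)}$
$\mathbf{x}^{(l+1)} = f(\mathbf{y}^{(l)})$
$\alpha \, C_i^{(l)} n_i^{(l)} \, \mathrm{Var}[w^{(l)}] = 1$
Backward:
$\Delta \mathbf{x}^{(l)} = \sum_{i=1}^{C_o^{(l)}} \mathbf{W}^{(l)T} \Delta \mathbf{y}^{(l)}$
$\Delta \mathbf{y}^{(l)} = f'(\mathbf{y}^{(l)}) \, \Delta \mathbf{x}^{(l+1)}$
$\alpha \, C_o^{(l)} n_o^{(l)} \, \mathrm{Var}[w^{(l)}] = 1$
* $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid.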
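The forward condition can be checked numerically with a small sketch. It assumes the condition reads "activation factor times number of input branches times fan-in times weight variance equals 1", so the weight standard deviation is the square root of the reciprocal of that product. The function name, layer sizes, and branch count are hypothetical, and the check uses a factor of 1 with zero-mean inputs so that the variance bookkeeping is exact under the independence assumption.

```python
import numpy as np


def multi_branch_std(n_in, num_branches, alpha=0.5):
    """Weight std satisfying alpha * num_branches * n_in * Var[w] = 1.

    alpha = 0.5 for ReLU, 1 for Tanh and Sigmoid (as stated on the slide).
    """
    return np.sqrt(1.0 / (alpha * num_branches * n_in))


rng = np.random.default_rng(0)
n_in, n_out, num_branches, batch = 256, 256, 4, 2000

# Zero-mean, unit-variance activations arriving on each input branch.
branches = rng.standard_normal((num_branches, batch, n_in))
summed = branches.sum(axis=0)  # the layer sees the sum of its input branches

# Branch-aware initialization versus a single-branch rule (num_branches = 1).
w_multi = rng.standard_normal((n_in, n_out)) * multi_branch_std(n_in, num_branches, alpha=1.0)
w_single = rng.standard_normal((n_in, n_out)) * multi_branch_std(n_in, 1, alpha=1.0)

var_multi = (summed @ w_multi).var()    # stays close to the per-branch input variance
var_single = (summed @ w_single).var()  # inflated by roughly the number of branches
```

Ignoring the branch count inflates the output variance at every multi-branch layer, which is why the single-branch rules drift as depth grows.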
21
[Plot: output standard deviation (0.1 to 1.1) against layer index (1 to 11), comparing MSR init with our init]
He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." ICCV. 2015.
22
MPII dataset LSP dataset
23
PCK: Percentage of Correct Keypoints
$\alpha \cdot \max(h, w)$
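A minimal PCK sketch, assuming the threshold is a factor times the larger side of each person's bounding box, and that a keypoint counts as correct when its pixel error is within that threshold. The function name, array layout, and the default factor of 0.2 are assumptions for illustration, not values taken from the slide.

```python
import numpy as np


def pck(pred, gt, box_hw, alpha=0.2):
    """Fraction of keypoints whose error is within alpha * max(h, w).

    pred, gt: arrays of shape (N, K, 2) with (x, y) per keypoint.
    box_hw:   array of shape (N, 2) with the (height, width) of each person box.
    alpha:    tolerance factor; 0.2 is an assumed value.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)     # (N, K) pixel errors
    thresh = alpha * box_hw.max(axis=1)[:, None]  # (N, 1) per-person threshold
    return float(np.mean(dist <= thresh))


gt = np.array([[[10.0, 10.0], [50.0, 80.0], [30.0, 40.0], [70.0, 20.0]]])
offsets = np.array([[[3.0, 4.0], [0.0, 0.0], [30.0, 0.0], [0.0, 25.0]]])
pred = gt + offsets
box_hw = np.array([[100.0, 60.0]])  # max(h, w) = 100, so the threshold is 20 px

score = pck(pred, gt, box_hw)  # errors 5, 0, 30, 25 px: 2 of 4 are within 20 px
```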
24
State-of-the-art performance
http://human-pose.mpi-inf.mpg.de/#results
25
Top-1 Test Error on CIFAR-10
PASCAL VOC 2012 dataset
(a) Image (b) DeepLab (c) DeepLab+PRM
27
Learning Feature Pyramids for Human Pose Estimation
Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017
28
Scale: Feature pyramid learning (ICCV 2017)
3D Pose: In-the-wild 3D pose estimation (CVPR 2018)
Constrained scenes
In-the-wild scenes
29
No annotation
Domain Discrepancy
30
Discriminator
Weakly Supervised Adversarial Learning
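31
[Figure: the generator $G$, a 3D human pose estimator, takes images from the 3D dataset and images without ground truth; the multi-source discriminator $D$ separates ground-truth poses (real) from predictions (fake)]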
32
Generator: fool the discriminator
Discriminator: tell real from fake
Losses: Euclidean loss, classification loss
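33
[Figure: generator architecture. A 256x256 image passes through Conv and Residual layers (128 × 128, then 64 × 64) into stacked hourglasses (Stack 1, ..., Stack n); the 2D module outputs 2D score maps, the depth module outputs depth, and together they give 3D poses]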
34
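[Figure: multi-source discriminator. Three sources, the image, the 2D heatmaps with depth maps, and a geometric descriptor built from $[\Delta x, \Delta y, \Delta z]$ and $[\Delta x^2, \Delta y^2, \Delta z^2]$, are embedded by CNN branches (feature sizes 64, 64, and 256 appear in the figure), concatenated, and passed through fully connected layers that classify samples as real or fake]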
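One way to read the geometric descriptor is as the coordinate offsets between joints together with their squares. The sketch below assumes the offsets are taken between every pair of joints and stacked into a single array; the function name, the pairwise layout, and the toy pose are illustrative assumptions rather than the talk's exact definition.

```python
import numpy as np


def geometric_descriptor(pose3d):
    """Pairwise offsets and squared offsets between joints.

    pose3d: array of shape (K, 3) holding (x, y, z) per joint.
    Returns an array of shape (K, K, 6): [dx, dy, dz, dx^2, dy^2, dz^2],
    where entry (i, j) describes joint i relative to joint j.
    """
    delta = pose3d[:, None, :] - pose3d[None, :, :]  # (K, K, 3)
    return np.concatenate([delta, delta ** 2], axis=-1)


pose = np.array([[0.0, 0.0, 0.0],
                 [1.0, 2.0, 3.0],
                 [-1.0, 0.5, 2.0]])
desc = geometric_descriptor(pose)  # shape (3, 3, 6)
```

Because it depends only on differences between joints, such a descriptor is unchanged when the whole pose is translated.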
35
Raw poses
Effectiveness of Adversarial Learning
36
37
MPJPE (error in mm) on H36M
State-of-the-art* (Zhou et al. ICCV'17): 64.9
Baseline (fix 2D, finetune depth): 65.2
Baseline (jointly learn 2D + depth): 64.8
Pose (Image+Pose): 61.3
Geo (Image+Geo): 60.3
Full (Image+Pose+Geo): 59.7
8% less error
*Zhou et al. ICCV'17
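MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below assumes poses are given in millimetres and applies no root alignment or other protocol-specific preprocessing; the function name and toy data are illustrative. The last line recomputes the relative improvement from the two numbers on the slide.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error over all frames and joints.

    pred, gt: arrays of shape (N, K, 3), in millimetres.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


gt = np.zeros((2, 3, 3))
pred = gt.copy()
pred[0, 0] = [30.0, 40.0, 0.0]  # one joint is off by 50 mm
error = mpjpe(pred, gt)         # 50 mm spread over 6 joints

# Relative improvement implied by the slide's numbers (64.9 mm to 59.7 mm).
relative_gain = (64.9 - 59.7) / 64.9
```

The relative gain works out to about 0.08, matching the "8% less error" annotation.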
38
baseline Ours
39
3D pose estimation in the wild
40
3D Human Pose Estimation in the Wild by Adversarial Learning
Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang. CVPR, 2018
41
wyang@ee.cuhk.edu.hk http://www.ee.cuhk.edu.hk/~wyang/ @bearpaw
42