Deep Learning Wei Yang Applications Understand Activities Family - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Human Pose Estimation with Deep Learning

Wei Yang

slide-2
SLIDE 2

American Heist (2014) - The Bank Robbery Scene

Applications

Understand Activities Family Robots

slide-3
SLIDE 3

What do we need to know to recognize a crime scene?

slide-4
SLIDE 4

Cues

  • Scene: bank
  • Abnormal pose
  • Activity: robbery
  • Pose labels in the image: stand, stand, hands up, lay down, lay down, lay down

slide-5
SLIDE 5

Why is human pose estimation challenging?

slide-6
SLIDE 6

#1. Articulation #2. Occlusion #3. Scale variation

slide-7
SLIDE 7

#1. Articulation #2. Occlusion #3. Scale variation

slide-8
SLIDE 8

#1. Articulation #2. Occlusion #3. Scale variation

slide-9
SLIDE 9

Applications

Understand Activities Family Robots

slide-10
SLIDE 10

3D Human Poses

Real-Time Imitation of Human Whole-Body Motions by Humanoids.

  • J. Koenemann, F. Burget, and M. Bennewitz. ICRA, 2014.
slide-11
SLIDE 11

Deep Learning Based Methods

Fully Convolutional Network Regression with Euclidean Loss:

$$L = \frac{1}{2} \sum_{p=1}^{P} \left\| \hat{H}_p - H_p \right\|_2^2$$

where $\hat{H}_p \sim \mathcal{N}(l_p, \Sigma)$, $p = 1, \cdots, P$, and $H_p$ are the $P$ heatmaps.
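A minimal sketch of the Euclidean heatmap loss on this slide, in plain Python. It assumes an isotropic, unnormalized Gaussian (peak value 1, standard deviation `sigma`) for the heatmap around a joint location; the function names, grid size, and joint coordinates are hypothetical and chosen only for illustration.

```python
import math


def gaussian_heatmap(height, width, center, sigma=1.0):
    """Render an unnormalized 2D Gaussian centered at center = (x, y)."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def euclidean_loss(maps_a, maps_b):
    """Return 1/2 * sum over P heatmaps of the squared L2 difference."""
    total = 0.0
    for map_a, map_b in zip(maps_a, maps_b):
        for row_a, row_b in zip(map_a, map_b):
            for a, b in zip(row_a, row_b):
                total += (a - b) ** 2
    return 0.5 * total


# Two joints (P = 2) on a 5 x 5 grid; the locations are made-up values.
joints = [(1, 1), (3, 2)]
gaussians = [gaussian_heatmap(5, 5, j) for j in joints]
zeros = [[[0.0] * 5 for _ in range(5)] for _ in joints]

print(euclidean_loss(gaussians, gaussians))  # 0.0
print(euclidean_loss(zeros, gaussians))      # positive: all-zero maps miss both peaks
```

The loss is symmetric in its two arguments, so the sketch does not depend on which set of heatmaps carries the hat.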

slide-12
SLIDE 12

Outline

  • Scale: Feature pyramid learning (ICCV 2017)
  • 3D Pose: In-the-wild 3D pose estimation (CVPR 2018)

slide-13
SLIDE 13

Outline

  • Scale: Feature pyramid learning (ICCV 2017)
  • 3D Pose: In-the-wild 3D pose estimation (CVPR 2018)

slide-14
SLIDE 14

Why Does Scale Matter?

Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body Models. In 3DV Workshop on Dynamic Shape Measurement and Analysis, 2014.
slide-15
SLIDE 15

Why Does Scale Matter?

Learning Feature Pyramids for Human Pose Estimation. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017.

slide-16
SLIDE 16

Previous work

  • Multi-scale testing: the model itself is not scale invariant
  • Multi-branch network: needs much more memory and computation

Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI, 2010.

Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR, 2015.

slide-17
SLIDE 17

Hourglass

Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision (ECCV). Springer, Cham, 2016: 483-499.

slide-18
SLIDE 18

Pyramid Residual Modules

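[Figure: (a) network overview and (b) detailed hourglass structure. Labels: 256x256 input; Conv; PRM + Pool; PRM; 128 × 128 and 64 × 64 feature maps; Hourglass Stack 1 … Stack $n$; score maps. Legend: convolution, pyramid residual module, score maps, addition. Pyramid residual module: branches with Ratio 1 … Ratio $n$ between $\mathbf{x}^{(l)}$ and $\mathbf{x}^{(l+1)}$ (functions $f$ and $g$, $C$ branches), plus an identity mapping.]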

Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV, 2016

slide-19
SLIDE 19

Initialization of Multi-Branch Networks

  • Single-branch networks: VGG
  • Multi-branch networks: Inceptions

Traditional weight initialization methods, e.g., Gaussian, Xavier, MSRA (Kaiming), are not applicable to multi-branch networks.

Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010.

slide-20
SLIDE 20

Initialization of Multi-Branch Networks

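Forward: Conv / FC layer with input branches $\mathbf{x}_1^{(l)}, \mathbf{x}_2^{(l)}, \dots, \mathbf{x}_{C_i}^{(l)}$ and output $\mathbf{y}^{(l)}$

$$\mathbf{y}^{(l)} = \mathbf{W}^{(l)} \sum_{c=1}^{C_i^{(l)}} \mathbf{x}_c^{(l)} + \mathbf{b}^{(l)}$$

$$\mathbf{x}^{(l+1)} = f(\mathbf{y}^{(l)})$$

$$\alpha \, C_i^{(l)} \, n_i^{(l)} \, \mathrm{Var}[w^{(l)}] = 1$$

Backward: Conv / FC layer with output branches $\mathbf{y}_1^{(l)}, \mathbf{y}_2^{(l)}, \dots, \mathbf{y}_{C_o}^{(l)}$ and input $\mathbf{x}^{(l)}$

$$\Delta\mathbf{x}^{(l)} = \sum_{c=1}^{C_o^{(l)}} \mathbf{W}^{(l)T} \Delta\mathbf{y}^{(l)}$$

$$\Delta\mathbf{y}^{(l)} = f'(\mathbf{y}^{(l)}) \, \Delta\mathbf{x}^{(l+1)}$$

$$\alpha \, C_o^{(l)} \, n_o^{(l)} \, \mathrm{Var}[w^{(l)}] = 1$$

* $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid.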

slide-21
SLIDE 21

Initialization of Multi-Branch Networks

[Plot: output std (y-axis) versus layer index 1 to 11 (x-axis), comparing MSR init and Ours init.]

He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." ICCV. 2015.

slide-22
SLIDE 22

Qualitative Results

MPII dataset LSP dataset

slide-23
SLIDE 23

Evaluation Metric

PCK: Percentage of Correct Keypoints

$\alpha \cdot \max(h, w)$

slide-24
SLIDE 24

Results on MPII Human Pose

State-of-the-art performance

http://human-pose.mpi-inf.mpg.de/#results

slide-25
SLIDE 25

Image Classification

Top-1 Test Error on CIFAR-10

slide-26
SLIDE 26

Semantic Segmentation:

PASCAL VOC 2012 dataset

[Figure: example results with panels (a) Image, (b) DeepLab, (c) DeepLab+PRM.]

slide-27
SLIDE 27

Section Summary

  • Feature pyramid module
  • Generalizable for various networks and tasks
  • Weight initialization for multi-branch networks

Learning Feature Pyramids for Human Pose Estimation

Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017

slide-28
SLIDE 28

Outline

  • Scale: Feature pyramid learning (ICCV 2017)
  • 3D Pose: In-the-wild 3D pose estimation (CVPR 2018)

slide-29
SLIDE 29

Challenges: No Annotation

  • Constrained scenes
  • In-the-wild scenes

No annotation

Domain Discrepancy

slide-30
SLIDE 30

Which one is more plausible?

Discriminator

slide-31
SLIDE 31

Weakly Supervised Adversarial Learning

[Diagram labels: 3D dataset; images w/o GT; 3D human pose estimator $G$; multi-source discriminator $D$; prediction; ground-truth; real / fake.]

slide-32
SLIDE 32

Adversarial Learning

  • Generator: fools the discriminator. $Loss_G$: Euclidean loss.
  • Discriminator: tells real from fake. $Loss_D$: classification loss.
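The slide only names the two losses, so the following is a generic GAN-style sketch rather than the method from the talk. It assumes the classification loss is binary cross-entropy on discriminator outputs in (0, 1), with label 1 for real (ground-truth) samples and 0 for fake (predicted) ones; how the Euclidean term and the "fool" term are weighted is not stated on the slide and is left out. All function names are hypothetical.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy for one probability; prob is clamped away from 0 and 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(real_probs, fake_probs):
    """Classification loss: real samples should score 1, fake samples 0."""
    losses = [binary_cross_entropy(p, 1) for p in real_probs]
    losses += [binary_cross_entropy(p, 0) for p in fake_probs]
    return sum(losses) / len(losses)


def generator_fool_loss(fake_probs):
    """Adversarial term: the generator wants its outputs scored as real."""
    return sum(binary_cross_entropy(p, 1) for p in fake_probs) / len(fake_probs)


# Made-up discriminator outputs for two real and two generated poses.
print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print(generator_fool_loss([0.2, 0.1]))
```

The two objectives pull in opposite directions: lowering the generator term means raising the discriminator's scores on generated poses, which raises the discriminator's own loss.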

slide-33
SLIDE 33

Generator

[Diagram labels: 256x256 input; Conv; Residual; Residual; 128 × 128 and 64 × 64 feature maps; Hourglass Stack 1 … Stack $n$; 2D module producing 2D score maps; depth module producing depth; 3D poses.]

slide-34
SLIDE 34

Discriminator

slide-35
SLIDE 35

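Multi-Source Discriminator

[Diagram: three CNN branches take (1) the image $I$, (2) the raw poses (2D heatmaps and depthmaps), and (3) the geometric descriptor, a $P \times P$ map of $[\Delta x, \Delta y, \Delta z]$ and $[\Delta x^2, \Delta y^2, \Delta z^2]$. The branch outputs (dimension labels in the diagram: 64, 64, 256) are concatenated and passed to fully connected layers that classify real or fake samples.]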
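The geometric descriptor on this slide can be sketched as a pairwise table over the $P$ joints. This is an assumption-laden illustration: it takes the differences as joint $i$ minus joint $j$ and stores the six values per pair in one list; the function name, the sign convention, and the layout are hypothetical.

```python
def geometric_descriptor(joints):
    """Return a P x P table of [dx, dy, dz, dx^2, dy^2, dz^2] per joint pair.

    joints: list of P (x, y, z) tuples.
    Differences are taken as joint i minus joint j (assumed convention).
    """
    descriptor = []
    for xi, yi, zi in joints:
        row = []
        for xj, yj, zj in joints:
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            row.append([dx, dy, dz, dx * dx, dy * dy, dz * dz])
        descriptor.append(row)
    return descriptor


# Made-up pose with P = 3 joints.
pose = [(0, 0, 0), (1, 2, 3), (2, 2, 2)]
table = geometric_descriptor(pose)
print(len(table), len(table[0]), table[0][1])
```

The diagonal entries are all zero, and the signed part is antisymmetric while the squared part is symmetric.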

slide-36
SLIDE 36

Effectiveness of Adversarial Learning

slide-37
SLIDE 37

Ablation Study on H36M Dataset

MPJPE (error in mm) on H36M

Method               Setting                     MPJPE (mm)
State-of-art*        Zhou et al. ICCV'17         64.9
Baseline (fix 2D)    Fix 2D, finetune depth      65.2
Baseline             Jointly learn 2D + depth    64.8
Pose                 Image+Pose                  61.3
Geo                  Image+Geo                   60.3
Full                 Image+Pose+Geo              59.7

All rows except the first are ours. 8% less error.

*Zhou et al. ICCV'17
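For reference, MPJPE (mean per joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below is a plain version without any alignment step; the function name and the joint coordinates are made up. The last lines check the "8% less error" figure, taking 64.9 and 59.7 from the chart as the state-of-the-art and full-model values (an assumed reading of the bar labels).

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching 3D joints (input units)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need equally sized, non-empty joint lists")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Made-up joints in millimetres: one exact, one off by 5 mm.
truth_3d = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_3d = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(mpjpe(pred_3d, truth_3d))  # 2.5

# Relative reduction between the two chart values quoted above.
reduction = (64.9 - 59.7) / 64.9
print(f"{reduction:.1%}")  # 8.0%
```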

slide-38
SLIDE 38

Results on Images in the Wild

baseline Ours

slide-39
SLIDE 39

Multi-view Results

slide-40
SLIDE 40

Section Summary

  • Weakly supervised adversarial learning for 3D pose estimation in the wild
  • Multi-source discriminator

3D Human Pose Estimation in the Wild by Adversarial Learning

Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang. CVPR, 2018

slide-41
SLIDE 41

Code

  • Open-source PyTorch code: https://github.com/bearpaw/pytorch-pose
  • ICCV 17: https://github.com/bearpaw/PyraNet

slide-42
SLIDE 42

Thanks!

wyang@ee.cuhk.edu.hk http://www.ee.cuhk.edu.hk/~wyang/ @bearpaw
