Human Pose Estimation with Deep Learning (Wei Yang) - PowerPoint Presentation

  1. Human Pose Estimation with Deep Learning. Wei Yang

  2. Applications: understanding activities, family robots. American Heist (2014): the bank robbery scene.

  3. What do we need to know to recognize a crime scene?

  4. Cues. Scene: bank. Abnormal poses: stand, stand, hands up, lie down, lie down, lie down. Activity: robbery.

  5. Why is human pose estimation challenging?

  6. Challenges: #1 articulation, #2 occlusion, #3 scale variation.

  9. Applications: understanding activities, family robots.

  10. 3D human poses. J. Koenemann, F. Burget, and M. Bennewitz. Real-Time Imitation of Human Whole-Body Motions by Humanoids. ICRA, 2014.

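  11. Deep learning based methods: a fully convolutional network predicts $P$ heatmaps $H_p$ and is trained by regression with a Euclidean loss:

$$L = \frac{1}{2} \sum_{p=1}^{P} \left\| \hat{H}_p - H_p \right\|^2,$$

where the target heatmap $\hat{H}_p \sim \mathcal{N}(l_p, \Sigma)$ is a Gaussian centered at $l_p$, for $p = 1, \cdots, P$.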
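To make the heatmap-regression loss concrete, here is a minimal pure-Python sketch. It assumes an isotropic covariance $\Sigma = \sigma^2 I$, an unnormalised Gaussian whose peak value is 1.0, and pixel-wise squared differences summed over all $P$ maps. The helper names `gaussian_heatmap` and `euclidean_loss` are hypothetical and are not taken from the authors' code.

```python
import math


def gaussian_heatmap(height, width, center, sigma):
    """Target heatmap: an isotropic 2-D Gaussian centred on a joint location.

    Assumption: Sigma = sigma**2 * I and an unnormalised peak value of 1.0.
    """
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def euclidean_loss(pred_maps, target_maps):
    """L = 1/2 * sum_p ||H_hat_p - H_p||^2, summed over every pixel of all P maps."""
    total = 0.0
    for pred, target in zip(pred_maps, target_maps):
        for pred_row, target_row in zip(pred, target):
            total += sum((t - p) ** 2 for p, t in zip(pred_row, target_row))
    return 0.5 * total


# Usage: one joint at column 3, row 4 of an 8x8 map, and an all-zero prediction.
target = [gaussian_heatmap(8, 8, (3, 4), sigma=1.0)]
zeros = [[[0.0] * 8 for _ in range(8)]]
print(target[0][4][3])  # peak value at the joint location
print(euclidean_loss(zeros, target))
```

A real network would predict one such map per joint. At test time, the joint location is usually read off as the position of the maximum response.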

  12. Outline. Scale: feature pyramid learning (ICCV 2017). 3D Pose: in-the-wild 3D pose estimation (CVPR 2018). Gray; Black.

  14. Why does scale matter? Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body Models. 3DV Workshop on Dynamic Shape Measurement and Analysis, 2014.

  15. Why does scale matter? Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. Learning Feature Pyramids for Human Pose Estimation. ICCV, 2017.

  16. Previous work. Multi-scale testing: the model itself is not scale invariant (Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI, 2010). Multi-branch network: needs much more memory and computation (Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR, 2015).

  17. Hourglass. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision. Springer, Cham, 2016: 483-499.

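  18. Pyramid residual modules. [Figure: (a) overall framework: a 256×256 input is reduced to 128×128 and then 64×64 by convolution, PRM, and pooling, and passed through stacked hourglasses (Stack 1 … Stack n) that output score maps; (b) a pyramid residual module mapping $\mathbf{x}^{(l)}$ to $\mathbf{x}^{(l+1)}$ with an identity mapping, a branch $f_0$, and pyramid branches $f_1, \dots, f_C$ at different ratios merged by $g$; detailed hourglass structure. Legend: convolution, pyramid residual module, score maps, addition.] Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV, 2016.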
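One reading of panel (b) is $\mathbf{x}^{(l+1)} = \mathbf{x}^{(l)} + g\big(\sum_{c} f_c(\mathbf{x}^{(l)})\big) + f_0(\mathbf{x}^{(l)})$, where each $f_c$ operates on a subsampled copy of the input. The toy 1-D sketch below shows only the data flow. It assumes that reading, and it substitutes integer average pooling and nearest-neighbour upsampling for whatever resampling the real module uses. `f0`, `branch_fns`, `g`, and the ratios are hypothetical stand-ins for learned convolutions.

```python
def avg_pool(signal, ratio):
    """Downsample a 1-D signal by an integer factor with non-overlapping average pooling."""
    return [sum(signal[i:i + ratio]) / ratio for i in range(0, len(signal) - ratio + 1, ratio)]


def upsample(signal, length):
    """Nearest-neighbour upsampling of a 1-D signal back to `length` samples."""
    return [signal[i * len(signal) // length] for i in range(length)]


def pyramid_residual_module(x, ratios, f0, branch_fns, g):
    """Toy PRM: x_next = x + g(sum_c up(f_c(down_c(x)))) + f0(x).

    `ratios`, `f0`, `branch_fns` and `g` are hypothetical stand-ins for learned layers.
    """
    pyramid = [0.0] * len(x)
    for ratio, f_c in zip(ratios, branch_fns):
        # Each pyramid branch sees a coarser copy of the input, then is resized back.
        branch = upsample(f_c(avg_pool(x, ratio)), len(x))
        pyramid = [acc + b for acc, b in zip(pyramid, branch)]
    merged = g(pyramid)  # g combines the multi-scale features
    base = f0(x)         # f0 works at the original resolution
    # Identity mapping plus the two residual terms.
    return [xi + mi + bi for xi, mi, bi in zip(x, merged, base)]


def identity(v):
    return list(v)


def half(v):
    return [0.5 * a for a in v]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
out = pyramid_residual_module(x, ratios=(2, 4), f0=identity, branch_fns=(identity, identity), g=half)
print(out)
```

Because every branch is resized back to the input length before the additions, the module keeps the feature resolution unchanged. This is what allows it to replace a plain residual unit inside an hourglass.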

  19. Initialization of multi-branch networks. Single-branch networks (e.g., VGG) vs. multi-branch networks (e.g., Inception). Traditional weight initialization methods, e.g., Gaussian, Xavier, and MSRA (Kaiming), are not applicable to multi-branch networks. Xavier Glorot, Yoshua Bengio. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010.

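  20. Initialization of multi-branch networks. [Figure: a Conv/FC layer receives $C_i$ input branches $\mathbf{x}_1^{(l)}, \dots, \mathbf{x}_{C_i}^{(l)}$ and feeds $C_o$ output branches $\mathbf{z}_1^{(l)}, \dots, \mathbf{z}_{C_o}^{(l)}$.]

Forward: $\mathbf{z}^{(l)} = \mathbf{W}^{(l)} \sum_{c=1}^{C_i} \mathbf{x}_c^{(l)} + \mathbf{b}^{(l)}$, $\quad \mathbf{x}^{(l+1)} = f(\mathbf{z}^{(l)})$, which requires $\alpha C_i^{(l)} n_i^{(l)} \mathrm{Var}[w^{(l)}] = 1$.

Backward: $\Delta\mathbf{x}^{(l)} = \sum_{c=1}^{C_o} \mathbf{W}^{(l)\top} \Delta\mathbf{z}^{(l)}$, $\quad \Delta\mathbf{z}^{(l)} = f'(\mathbf{z}^{(l)}) \Delta\mathbf{x}^{(l+1)}$, which requires $\alpha C_o^{(l)} n_o^{(l)} \mathrm{Var}[w^{(l)}] = 1$.

Here $n_i^{(l)}$ and $n_o^{(l)}$ are the numbers of input and output units, and $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid.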
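The forward condition can be turned into a weight standard deviation and compared with MSRA initialization in a few lines of arithmetic. This sketch assumes the same independence assumptions as the conditions above and tracks only the predicted per-layer variance gain $\alpha C_i n_i \mathrm{Var}[w]$; it does not simulate a real network. The function names are hypothetical.

```python
import math


def msra_std(fan_in):
    """He/MSRA initialisation for single-branch ReLU layers: Var[w] = 2 / n."""
    return math.sqrt(2.0 / fan_in)


def multi_branch_std(fan, num_branches, alpha=0.5):
    """Solve alpha * C * n * Var[w] = 1 for the weight standard deviation.

    Forward condition: C = C_i (input branches), n = n_i.
    Backward condition: C = C_o (output branches), n = n_o.
    alpha = 0.5 for ReLU, 1 for Tanh and Sigmoid.
    """
    return math.sqrt(1.0 / (alpha * num_branches * fan))


def variance_gain(std, fan, num_branches, alpha=0.5):
    """Predicted per-layer factor alpha * C_i * n_i * Var[w] applied to the signal variance."""
    return alpha * num_branches * fan * std ** 2


def propagate(std, fan, num_branches, depth, alpha=0.5):
    """Relative output variance after each of `depth` identical layers, starting from 1."""
    variance, history = 1.0, []
    for _ in range(depth):
        variance *= variance_gain(std, fan, num_branches, alpha)
        history.append(variance)
    return history


n, c_i = 256, 2
print(propagate(msra_std(n), n, c_i, depth=10)[-1])             # grows by a factor of C_i per layer
print(propagate(multi_branch_std(n, c_i), n, c_i, depth=10)[-1])  # stays close to 1
```

With a single branch ($C_i = 1$) and ReLU, the formula reduces to MSRA initialization. The extra factor $C_i$ is what keeps the variance from compounding when branch outputs are summed.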

  21. Initialization of multi-branch networks. [Figure: output standard deviation vs. layer index (layers 1 to 11) for MSR initialization and our initialization.] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." ICCV, 2015.

  22. Qualitative results on the MPII and LSP datasets.

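  23. Evaluation metric. PCK (Percentage of Correct Keypoints): a predicted keypoint counts as correct if it lies within $\alpha \cdot \max(h, w)$ of the ground truth, where $h$ and $w$ are the height and width of the person's bounding box.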
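A minimal sketch of PCK for a single person follows. The default `alpha=0.2` is an assumption, since the slide leaves $\alpha$ open. Benchmarks also differ in the reference length: MPII, for example, commonly reports the head-normalised variant PCKh.

```python
import math


def pck(pred, gt, box_hw, alpha=0.2):
    """Fraction of keypoints whose prediction lies within alpha * max(h, w) of the ground truth."""
    h, w = box_hw
    threshold = alpha * max(h, w)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


gt = [(50.0, 20.0), (55.0, 60.0), (40.0, 90.0)]
pred = [(52.0, 21.0), (55.0, 85.0), (40.0, 95.0)]
print(pck(pred, gt, box_hw=(100.0, 40.0)))  # threshold is 0.2 * 100 = 20 pixels
```

Here the second keypoint is 25 pixels off, so only two of the three keypoints count as correct.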

  24. Results on MPII Human Pose: state-of-the-art performance (http://human-pose.mpi-inf.mpg.de/#results).

  25. Image classification: top-1 test error on CIFAR-10.

  26. Semantic segmentation on the PASCAL VOC 2012 dataset. [Figure: three examples, each showing (a) image, (b) DeepLab, (c) DeepLab+PRM.]

  27. Section summary • Feature pyramid module, generalizable to various networks and tasks • Weight initialization for multi-branch networks. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. Learning Feature Pyramids for Human Pose Estimation. ICCV, 2017.

  29. Challenges: constrained scenes vs. in-the-wild scenes. Domain discrepancy; no annotation.

  30. Which one is more plausible? (Discriminator)

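  31. Weakly supervised adversarial learning. [Figure: images from a 3D dataset and images without ground truth (w/o GT) are fed to the 3D human pose estimator $G$; the multi-source discriminator $D$ receives $G$'s predictions as fake samples and ground-truth poses as real samples.]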

  32. Adversarial learning: the generator $G$ tries to fool the discriminator $D$, and $D$ tries to tell real poses from generated ones. $\mathit{Loss}_G$: Euclidean loss; $\mathit{Loss}_D$: classification loss.
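The two losses can be sketched as follows. $D$ outputs a probability that a pose is real and is trained with binary cross-entropy. $G$ adds an adversarial term to its Euclidean pose loss. The non-saturating form of the adversarial term and the weight `adv_weight` are assumptions for illustration and do not come from the slides.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability that the sample is real."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Classification loss: ground-truth poses are labelled 1 (real), predictions 0 (fake)."""
    total = sum(bce(p, 1) for p in d_real) + sum(bce(p, 0) for p in d_fake)
    return total / (len(d_real) + len(d_fake))


def generator_loss(euclidean, d_fake, adv_weight=0.1):
    """Euclidean pose loss plus a term rewarding predictions that D classifies as real."""
    adversarial = sum(bce(p, 1) for p in d_fake) / len(d_fake)
    return euclidean + adv_weight * adversarial


print(discriminator_loss([0.9, 0.8], [0.2, 0.4]))
print(generator_loss(euclidean=3.5, d_fake=[0.2, 0.4]))
```

In training, the two losses are minimised in alternation: $D$ is updated to separate real from fake samples, and $G$ is updated to reduce both its pose error and $D$'s confidence that its predictions are fake.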

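  33. Generator. [Figure: a 2D module in which a 256×256 input is reduced to 128×128 and 64×64 by convolution and residual modules and passed through stacked hourglasses (Stack 1 … Stack n) to produce 2D score maps, and a depth module built from residual modules; together they output 3D poses.]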

  34. Discriminator

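  35. Multi-source discriminator. [Figure: separate CNN streams take (1) the image $I$ (256×256), (2) the raw pose as 2D heatmaps and depth maps (64×64), and (3) a $P \times P$ geometric descriptor with entries $[\Delta x, \Delta y, \Delta z]$ and $[\Delta x^2, \Delta y^2, \Delta z^2]$; their features are concatenated and passed through fully connected layers that classify the sample as real or fake.]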
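The geometric-descriptor input can be illustrated with a short sketch. It builds a $P \times P$ grid from $P$ 3-D joints, where each cell stores the pairwise offsets and their squares. The channel order and the sign convention $d = \text{joint}_i - \text{joint}_j$ are assumptions made for this illustration.

```python
def geometric_descriptor(joints):
    """Build a P x P x 6 descriptor from P 3-D joints.

    Entry [i][j] holds [dx, dy, dz, dx^2, dy^2, dz^2] with d = joint_i - joint_j.
    """
    descriptor = []
    for joint_i in joints:
        row = []
        for joint_j in joints:
            diff = [a - b for a, b in zip(joint_i, joint_j)]
            row.append(diff + [d * d for d in diff])
        descriptor.append(row)
    return descriptor


pose = [(0, 0, 0), (1, 2, 3)]  # two joints, e.g. pelvis and knee (illustrative values)
for row in geometric_descriptor(pose):
    print(row)
```

The squared channels are symmetric in $i$ and $j$, while the signed offsets flip sign. This gives the discriminator both relative joint directions and bone-length-like magnitudes in a form a CNN can process.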

  36. Effectiveness of adversarial learning

  37. Ablation study on the H36M dataset: MPJPE (error in mm, lower is better). The full model has 8% less error than the previous state of the art.

| Variant | Setting | MPJPE (mm) |
|---|---|---|
| Full (ours) | Image+Pose+Geo | 59.7 |
| Geo | Image+Geo | 60.3 |
| Pose | Image+Pose | 61.3 |
| Baseline | Jointly learn 2D + depth | 64.8 |
| Baseline (fix 2D) | Fix 2D, finetune depth | 65.2 |
| State-of-the-art* | Zhou et al. ICCV'17 | 64.9 |

*Zhou et al. ICCV'17
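For reference, here is a minimal sketch of MPJPE: the average Euclidean distance between corresponding predicted and ground-truth 3-D joints, in the units of the input (millimetres on H36M). It applies no root or rigid alignment. Whether such alignment is used depends on the evaluation protocol, so treat that choice as an assumption. The second helper checks the "8% less error" figure against the table values.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance between matching 3-D joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def relative_reduction(baseline, ours):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


print(mpjpe([(0.0, 0.0, 0.0), (30.0, 40.0, 0.0)], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))
print(relative_reduction(64.9, 59.7))  # previous state of the art vs. full model
```

The computed reduction, about 0.080, matches the 8% improvement over Zhou et al. reported on the slide.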

  38. Results on images in the wild: baseline vs. ours

  39. Multi-view results

  40. Section summary • Weakly supervised adversarial learning for 3D pose estimation in the wild • Multi-source discriminator. Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang. 3D Human Pose Estimation in the Wild by Adversarial Learning. CVPR, 2018.

  41. Code • Open-source PyTorch code: https://github.com/bearpaw/pytorch-pose • ICCV 2017: https://github.com/bearpaw/PyraNet

  42. Thanks! wyang@ee.cuhk.edu.hk http://www.ee.cuhk.edu.hk/~wyang/ @bearpaw
