Deep Learning Wei Yang Applications Understand Activities Family - PowerPoint PPT Presentation

Human Pose Estimation with Deep Learning Wei Yang

Applications: Understand Activities; Family Robots. American Heist (2014) - The Bank Robbery Scene

What do we need to know to recognize a crime scene?

Cues: Scene: bank; Abnormal pose; Activity: robbery. [Figure: people in the scene labeled "stand", "hands up", and "lay down".]

Why is human pose estimation challenging?

#1. Articulation #2. Occlusion #3. Scale variation

Applications: Understand Activities; Family Robots

3D Human Poses. Real-Time Imitation of Human Whole-Body Motions by Humanoids. J. Koenemann, F. Burget, and M. Bennewitz. ICRA, 2014.

Deep Learning Based Methods. Fully Convolutional Network, $P$ heatmaps $H_p$. Regression with Euclidean Loss: $L = \frac{1}{P} \sum_{p=1}^{P} \lVert H_p - \hat{H}_p \rVert_2^2$, where $\hat{H}_p \sim \mathcal{N}(l_p, \Sigma)$, s.t. $p = 1, \cdots, P$.
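The heatmap regression on this slide can be illustrated with a small sketch. It assumes each target heatmap is an unnormalized isotropic Gaussian centered on the joint location, and that the Euclidean loss is the squared L2 distance averaged over joints. The function names, the 8x8 grid, and the sigma value are hypothetical choices for illustration, not taken from the slides.

```python
import math

def gaussian_heatmap(height, width, center, sigma=1.0):
    """Render an unnormalized 2D Gaussian peaked at center = (x, y)."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def euclidean_loss(predicted, target):
    """Squared L2 distance between heatmaps, averaged over the joints."""
    total = 0.0
    for pred_map, target_map in zip(predicted, target):
        for pred_row, target_row in zip(pred_map, target_map):
            total += sum((p - t) ** 2 for p, t in zip(pred_row, target_row))
    return total / len(target)

# Two joints on a hypothetical 8x8 heatmap grid (assumed sizes).
joints = [(2, 3), (5, 5)]
targets = [gaussian_heatmap(8, 8, joint) for joint in joints]
blank = [[[0.0] * 8 for _ in range(8)] for _ in joints]

perfect_loss = euclidean_loss(targets, targets)  # prediction equals target
blank_loss = euclidean_loss(blank, targets)      # all-zero prediction
```

A prediction identical to the target gives zero loss, while an all-zero prediction is penalized by the energy of the Gaussian targets.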

Outline. Scale: Feature pyramid learning (ICCV 2017). 3D Pose: In-the-wild 3D pose estimation (CVPR 2018).

Why the Scale Matters? Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body Models. In 3DV Workshop on Dynamic Shape Measurement and Analysis, 2014.

Why the Scale Matters? Learning Feature Pyramids for Human Pose Estimation. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017

Previous work. Multi-scale testing: the model itself is not scale invariant. Multi-branch network: needs much more memory and computation. Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI, 2010. Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR, 2015.

Hourglass. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision. Springer, Cham, 2016: 483-499.

Pyramid Residual Modules. [Figure: (a) network overview: 256x256 input, Conv, PRM + Pool (128 × 128), PRM (64 × 64), hourglass Stack 1 through Stack $n$, score maps; (b) detailed hourglass structure built from pyramid residual modules, mapping $\mathbf{x}^{(l)}$ to $\mathbf{x}^{(l+1)}$ through an identity mapping plus branches $f_0$, $f_1$, ..., $f_C$ (Ratio 1 through Ratio $n$) and $g$. Legend: convolution, pyramid residual module, score maps, addition.] Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV, 2016

Initialization of Multi-Branch Networks. Single-branch networks: VGG. Multi-branch networks: Inceptions. Traditional weight initialization methods, e.g., Gaussian, Xavier, MSRA (Kaiming), are not applicable to multi-branch networks. Xavier Glorot, Yoshua Bengio; Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010.

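Initialization of Multi-Branch Networks
Forward (Conv / FC with $C_i^{(l)}$ input branches): $\mathbf{y}^{(l)} = \mathbf{W}^{(l)} \sum_{c=1}^{C_i^{(l)}} \mathbf{x}_c^{(l)} + \mathbf{b}^{(l)}$, $\mathbf{x}^{(l+1)} = f(\mathbf{y}^{(l)})$. Condition: $\alpha C_i^{(l)} n_i^{(l)} \mathrm{Var}[w^{(l)}] = 1$.
Backward (Conv / FC with $C_o^{(l)}$ output branches): $\Delta\mathbf{x}^{(l)} = \sum_{c=1}^{C_o^{(l)}} \mathbf{W}^{(l)T} \Delta\mathbf{y}^{(l)}$, $\Delta\mathbf{y}^{(l)} = f'(\mathbf{y}^{(l)}) \Delta\mathbf{x}^{(l+1)}$. Condition: $\alpha C_o^{(l)} n_o^{(l)} \mathrm{Var}[w^{(l)}] = 1$.
* $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid.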
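The variance condition on this slide (alpha times the branch count times n times Var[w] equals 1) can be turned into a standard deviation for zero-mean weights. The sketch below is a minimal illustration, assuming n is the fan-in of the layer; the function name and the example layer size are hypothetical.

```python
import math

def branch_aware_std(num_branches, fan, alpha=0.5):
    """Std of zero-mean weights satisfying alpha * C * n * Var[w] = 1.

    num_branches: C, the number of input (or output) branches.
    fan: n, assumed here to be the fan-in (or fan-out) of the layer.
    alpha: 0.5 for ReLU, 1.0 for Tanh and Sigmoid.
    """
    variance = 1.0 / (alpha * num_branches * fan)
    return math.sqrt(variance)

# Hypothetical layer: 3x3 kernels over 64 input channels.
fan_in = 3 * 3 * 64
single_branch_std = branch_aware_std(1, fan_in)  # one input branch
four_branch_std = branch_aware_std(4, fan_in)    # four summed input branches
```

With a single branch and ReLU, the condition reduces to Var[w] = 2 / n, the familiar MSRA (Kaiming) rule; with four summed branches the standard deviation is halved to keep the output variance comparable.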

Initialization of Multi-Branch Networks. [Plot: output standard deviation (OUTPUT STD) versus layer index (1 to 11), comparing MSR init with our init.] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." ICCV. 2015.

Qualitative Results: MPII dataset, LSP dataset

Evaluation Metric. PCK: Percentage of Correct Keypoints, with threshold $\alpha \cdot \max(h, w)$
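A minimal sketch of how PCK could be computed. It assumes a keypoint counts as correct when its distance to the ground truth is at most alpha times the larger side of the person bounding box (height h, width w); the function name, the default alpha, and the sample coordinates are hypothetical.

```python
import math

def pck(predicted, ground_truth, box_height, box_width, alpha=0.2):
    """Fraction of keypoints within alpha * max(h, w) of the ground truth."""
    threshold = alpha * max(box_height, box_width)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(ground_truth)

# Four hypothetical keypoints in pixel coordinates.
ground_truth = [(10.0, 10.0), (50.0, 80.0), (30.0, 40.0), (70.0, 20.0)]
predicted = [(12.0, 11.0), (50.0, 80.0), (60.0, 40.0), (71.0, 45.0)]

# Threshold is 0.2 * max(100, 50) = 20 pixels; two of four keypoints pass.
score = pck(predicted, ground_truth, box_height=100, box_width=50, alpha=0.2)
```

Normalizing the threshold by the box size makes the metric comparable across people that appear at different scales in the image.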

Results on MPII Human Pose: state-of-the-art performance. http://human-pose.mpi-inf.mpg.de/#results

Image Classification: Top-1 Test Error on CIFAR-10

Semantic Segmentation: PASCAL VOC 2012 dataset. [Figure: three examples, each showing (a) Image, (b) DeepLab, (c) DeepLab+PRM.]

Section Summary
• Feature pyramid module
• Generalizable for various networks and tasks
• Weight initialization for multi-branch networks
Learning Feature Pyramids for Human Pose Estimation. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV, 2017

Challenges: No Annotation. Constrained scenes versus in-the-wild scenes: domain discrepancy, no annotation.

Which one is more plausible? Discriminator

Weakly Supervised Adversarial Learning. [Figure: images w/o GT and a 3D dataset are fed to the 3D human pose estimator $G$; the multi-source discriminator $D$ labels predictions as fake and ground-truth as real.]

Adversarial Learning. [Figure: the generator ($Loss_G$) tries to fool the discriminator ($Loss_D$), which tries to tell real from fake. Losses shown: Euclidean loss, classification loss.]

Generator. [Figure: a 2D module (256x256 input, Conv, Residual at 128 × 128, Residual at 64 × 64, hourglass Stack 1 … Stack $n$) producing 2D score maps, and a depth module; together they output 3D poses.]

Discriminator

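Multi-Source Discriminator. [Figure: real or fake samples enter three CNN branches: the image $I$ (256); a geometric descriptor (size $P$, $P$) built from $[\Delta x, \Delta y, \Delta z]$ and $[\Delta x^2, \Delta y^2, \Delta z^2]$; and raw poses (64, 64) formed by concatenating 2D heatmaps and depthmaps. Fully connected layers output real or fake.]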
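The geometric descriptor on this slide lists relative offsets and their squares. The sketch below is one possible reading, assuming the descriptor is computed for every ordered pair of the P joints and stores six values per pair; the function name, the layout, and the sample pose are assumptions rather than details confirmed by the slides.

```python
def geometric_descriptor(pose):
    """For each ordered joint pair (i, j), store [dx, dy, dz, dx^2, dy^2, dz^2].

    pose: list of P joints, each an (x, y, z) tuple.
    Returns a P x P nested list with 6 values per entry (assumed layout).
    """
    descriptor = []
    for xi, yi, zi in pose:
        row = []
        for xj, yj, zj in pose:
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            row.append([dx, dy, dz, dx * dx, dy * dy, dz * dz])
        descriptor.append(row)
    return descriptor

# A hypothetical 3-joint pose.
pose = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0), (3.0, 0.0, 4.0)]
descriptor = geometric_descriptor(pose)
```

Relative offsets are invariant to translating the whole pose, and the squared terms sum to squared limb lengths, which may help a discriminator judge whether a predicted skeleton is anatomically plausible.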

Effectiveness of Adversarial Learning

Ablation Study on H36M Dataset
MPJPE (error in mm) on H36M; 8 % less error.
Full (Image+Pose+Geo, Ours): 59.7
Geo (Image+Geo): 60.3
Pose (Image+Pose): 61.3
Baseline (jointly learn 2D + depth): 64.8
Baseline (fix 2D, finetune depth): 65.2
State-of-art* (Zhou et al. ICCV’17): 64.9
*Zhou et al. ICCV’17
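MPJPE (mean per joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below shows the metric on made-up coordinates and then checks the "8 % less error" figure, assuming it compares the 64.8 mm baseline with the 59.7 mm full model. The function name and sample joints are hypothetical.

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching 3D joints (same unit as input)."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)

# Two hypothetical joints, coordinates in millimetres.
ground_truth = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
predicted = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]
error_mm = mpjpe(predicted, ground_truth)  # per-joint errors are 5 mm and 12 mm

# Relative error reduction, assuming baseline 64.8 mm versus full model 59.7 mm.
relative_reduction = (64.8 - 59.7) / 64.8
```

Under that assumption the reduction is a little under 0.08, which is consistent with the rounded 8 % reported on the slide.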

Results on Images in the Wild: baseline versus ours

Multi-view Results

Section Summary
• Weakly supervised adversarial learning for 3D pose estimation in the wild
• Multi-source discriminator
3D Human Pose Estimation in the Wild by Adversarial Learning. Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang. CVPR, 2018

Code
• Open-source PyTorch code: https://github.com/bearpaw/pytorch-pose
• ICCV 17: https://github.com/bearpaw/PyraNet

Thanks! wyang@ee.cuhk.edu.hk http://www.ee.cuhk.edu.hk/~wyang/ @bearpaw
