Human Pose Estimation and Action Recognition Gang Yu, Megvii - PowerPoint PPT Presentation

ICIP 2019 Tutorial: Human Pose Estimation and Action Recognition. Gang Yu, Megvii (Face++); Junsong Yuan, SUNY Buffalo; Zicheng Liu, Microsoft.


  1. ICIP 2019 Tutorial Human Pose Estimation and Action Recognition Gang Yu, Megvii (Face++) Junsong Yuan, SUNY Buffalo Zicheng Liu, Microsoft

  2. Overview • Part1: Human Pose Estimation • 2D Skeleton • Top-Down • Bottom-Up • 3D Skeleton • 2D -> 3D Skeleton • 2D -> 3D Shape • Application • Part2: Action Recognition • Datasets • RGB • RGB-D • Skeleton based approaches • 2D and 3D skeletons • Video based approaches • 2D/3D CNN features

  3. Human Pose Estimation: Algorithm and Application • Gang Yu • yugang@megvii.com

  4. Outline • Introduction to Human Pose Estimation • 2D Skeleton • Top-Down • Bottom-Up • 3D Skeleton • 2D -> 3D Skeleton • 2D -> 3D Shape • Application • Conclusion

  5. Outline • Introduction to Human Pose Estimation • 2D Skeleton • Top-Down • Bottom-Up • 3D Skeleton • 2D -> 3D Skeleton • 2D -> 3D Shape • Application • Conclusion

  6. What is Human Pose Estimation?

  7. Benchmark and Evaluation • Benchmark • Single-person Estimation • MPII, FLIC, LSP, LIP • Multi-person Keypoint Detection • COCO, CrowdPose • Video • PoseTrack • 3D • Human3.6M, DensePose • Evaluation on COCO
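
The slide only names "Evaluation on COCO". COCO keypoint evaluation is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays for boxes. The following is a minimal sketch of OKS for a single pose pair; the function name `oks` and the argument layout are assumptions made for illustration, not part of the tutorial.

```python
import math

def oks(pred, gt, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt:    lists of (x, y) keypoint coordinates
    visibility:  per-keypoint flags, 0 means the keypoint is not labeled
    area:        ground-truth object area in pixels^2 (the scale term s^2)
    sigmas:      per-keypoint falloff constants (illustrative values in the test)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2   # squared pixel distance
        k2 = (2.0 * sigma) ** 2                # per-keypoint constant k_i^2
        total += math.exp(-d2 / (2.0 * (area + 1e-12) * k2))
        count += 1
    return total / count if count else 0.0
```

A perfect prediction scores 1.0, and the score decays smoothly as keypoints drift, more slowly for large people and for keypoints with a large sigma.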

  8. Outline • Introduction to Human Pose Estimation • 2D Skeleton • Top-Down • Bottom-Up • 3D Skeleton • 2D -> 3D Skeleton • 2D -> 3D Shape • Application • Conclusion

  9. 2D Skeleton: How to Do Pose Estimation • Top-down Approach vs. Bottom-up Approach • Top-down: from the human to its parts (Head, L-Arm) • Bottom-up: from the parts to the human • Top-down • Mask R-CNN, CPN, MSPN • High performance (good localization ability), high recall • Bottom-up • OpenPose, Associative Embedding • Clean framework, potentially fast speed • Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017 • Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018 • Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh • Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

  10. Challenges • Ambiguous Appearance • Crowd Case • Large Pose • Inference Speed

  11. Top-Down: Mask R-CNN • Motivation: • Multi-task learning • ROI Pool -> ROI Align Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017
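
The "ROI Pool -> ROI Align" bullet refers to replacing coordinate quantization with bilinear interpolation at fractional positions. A minimal sketch of that sampling step, assuming a single-channel feature map stored as a list of rows; `bilinear_sample` is a hypothetical helper, not an API of any library.

```python
def bilinear_sample(feature, x, y):
    """Sample a 2D feature map at a fractional (x, y) position.

    ROI Pool would round (x, y) to an integer cell; ROI Align keeps the
    fractional position and blends the four surrounding cells instead.
    """
    height, width = len(feature), len(feature[0])
    x0, y0 = int(x), int(y)                  # top-left neighbor (x, y >= 0 assumed)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0                  # fractional offsets in [0, 1)
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy
```

A full ROI Align layer would average several such samples per output bin; only the interpolation step is shown here.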

  12. Top-Down: Mask R-CNN • Experiments on COCO Skeleton: Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017

  13. Top-Down: Hourglass • Motivation: • Crop & Single Person Skeleton • Multi-stage context refinement Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

  14. Top-Down: Hourglass • Structure of one block Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

  15. Top-Down: Hourglass • Experiments Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016

  16. Top-Down: Single Person Skeleton: CPM • Motivation: • Multi-stage context refinement • Large receptive Field -> long range spatial relationship Convolutional Pose Machines, Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, CVPR 2016

  17. Top-Down: Cascade Pyramid Network • Motivation: How to locate the “hard” joints • Human perspective Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018

  18. Top-Down: Cascade Pyramid Network • Motivation: How to locate the “hard” joints • Human perspective • Visible easy keypoints (easy visible parts): ✓ Nose ✓ Left elbow ✓ Right hand • Remaining keypoints: ✕ What?

  19. Top-Down: Cascade Pyramid Network • Motivation: How to locate the “hard” joints • Human perspective • Visible easy keypoints (easy visible parts): ✓ Nose ✓ Left elbow ✓ Right hand • Visible hard keypoints (hard visible parts, hard to distinguish; use context and an enlarged view): ✓ Left knee ✓ Right knee ✓ Left hip • Remaining keypoints: ✕ What?

  20. Top-Down: Cascade Pyramid Network • Motivation: How to locate the “hard” joints • Human perspective • Visible easy keypoints (easy visible parts): ✓ Nose ✓ Left elbow ✓ Right hand • Visible hard keypoints (hard visible parts, hard to distinguish; use context and an enlarged view): ✓ Left knee ✓ Right knee ✓ Left hip • Invisible part (use context): ✓ Right shoulder

  21. Top-Down: Cascade Pyramid Network • Motivation: How to locate the “hard” joints • Human perspective: coarse to fine • From input image to output image, coarse parts are refined into fine parts as the receptive view gets larger and takes in more context

  22. Network Architecture • Network design principles • Inspired by how a human locates keypoints, adapted to a CNN: locate easy parts => locate hard parts • Two stages • GlobalNet: locates the easy parts (vanilla L2 loss) • RefineNet: locates the hard parts (deep layers) with online hard keypoint mining (hard mining loss)
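
The hard mining loss named on this slide can be sketched as follows: compute an L2 loss per keypoint, then back-propagate only through the hardest few. This is a minimal single-sample sketch with flattened heatmaps; the function name `ohkm_loss` and the `top_k` parameter are assumptions for illustration, and the slide does not state how many keypoints are kept.

```python
def ohkm_loss(pred_heatmaps, target_heatmaps, top_k):
    """Online hard keypoint mining over one sample.

    pred_heatmaps, target_heatmaps: one flattened heatmap (list of floats)
    per keypoint. top_k (hypothetical parameter): number of hardest
    keypoints whose loss is kept.
    """
    per_keypoint = []
    for pred, target in zip(pred_heatmaps, target_heatmaps):
        squared = [(p - t) ** 2 for p, t in zip(pred, target)]
        per_keypoint.append(sum(squared) / len(squared))  # mean L2 per keypoint
    # Keep only the keypoints with the largest loss; easy ones are ignored.
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)
```

With `top_k` equal to the number of keypoints this reduces to the vanilla L2 loss used for GlobalNet, which makes the contrast between the two stages easy to see.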

  23. Experiments: Person Detector • [Figure: keypoint mAP plotted against detector mAP] • Det mAP: 36.3, 41.1, 44.3, 49.3, 52.1 • Keypoint mAP: 68.8, 69.4, 69.7, 69.8, 69.8

  24. Experiments: Online Hard Keypoints Mining

  25. Experiments: Design Choices of GlobalNet & RefineNet

  26. Experiments

  27. Summary for CPN • Hard Keypoints with Coarse-to-fine Strategy (context) • Code: https://github.com/chenyilun95/tf-cpn • MS COCO2017 Challenge Winner

  28. Top-Down: A Simple Baseline • Motivation • Simple Baseline & OKS based tracking • Spatial Resolution Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018

  29. Top-Down: A Simple Baseline • Experiments on COCO and PoseTrack Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018

  30. Top-Down: HRNet • Motivation • High Resolution Feature maps Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR 2019

  31. Top-Down: HRNet Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR 2019

  32. Top-Down: HRNet • Experiments Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR 2019

  33. Top-Down: Multi-stage Pose Estimation • Motivation • Upperbound • Only Two-stages available (limited Context) Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun

  34. Top-Down: Multi-stage Pose Estimation • Method • Coarse-to-fine with better information flow • Involve more stages

  35. Top-Down: Multi-stage Pose Estimation • Cross Stage Feature Aggregation • Coarse-to-fine Supervision

  36. Experiments: More Stages

  37. Experiments: CTF & CSFA

  38. Experiments: COCO test-dev

  39. Experiments: COCO test-Challenge

  40. Summary for MSPN • Refined Coarse-to-fine Strategy • Code: https://github.com/megvii-detection/MSPN • MS COCO2018 Challenge Winner

  41. Bottom-Up: DeepCut • Motivation • Part Detector • Assemble (Integer Linear Optimization) DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation, Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele, CVPR 2016

  42. Bottom-Up: DeeperCut • Motivation • Deeper Part Detector + Assemble (image-conditioned pairwise terms + incremental optimization) DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model, Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele, ECCV 2016

  43. Bottom-Up: OpenPose • Motivation • Part Detector (CPM) + Assemble (PAF) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017

  44. Bottom-Up: OpenPose • Motivation • Part Detector (CPM) + Assemble (PAF) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017
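
The "Assemble (PAF)" step scores a candidate limb by checking how well the part affinity field along the segment between two joint candidates agrees with the segment's direction. A minimal sketch of that score, assuming the field for one limb type is given as two 2D grids; `paf_score` and `num_samples` are hypothetical names, and the later matching of candidates into people is not shown.

```python
import math

def paf_score(paf_x, paf_y, joint_a, joint_b, num_samples=10):
    """Average alignment between a part affinity field and the a->b direction.

    paf_x, paf_y: 2D grids (indexed [row][col]) holding the field's x and y
    components for one limb type. joint_a, joint_b: (x, y) candidates.
    """
    ax, ay = joint_a
    bx, by = joint_b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return 0.0
    ux, uy = (bx - ax) / length, (by - ay) / length   # unit limb direction
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        x = int(round(ax + t * (bx - ax)))            # nearest grid cell
        y = int(round(ay + t * (by - ay)))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy  # dot product with field
    return total / num_samples
```

A pair of joints that truly belongs to the same limb should score close to 1, while a pair that cuts across the field scores near 0.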

  45. Bottom-Up: OpenPose • Experiments on MPII and COCO Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017

  46. Bottom-Up: Associative Embedding • Motivation • Part Detector (Hourglass) + Assemble (AE) Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017

  47. Bottom-Up: Associative Embedding • Motivation • Part Detector (Hourglass) + Assemble (AE) Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017
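
The "Assemble (AE)" step groups detected joints by the embedding tag predicted alongside each detection: joints of the same person should carry similar tags. A simplified greedy sketch, assuming 1-D tags and a fixed distance threshold; `group_by_tag`, the tuple layout, and the threshold value are assumptions for illustration rather than the paper's exact procedure.

```python
def group_by_tag(detections, threshold=1.0):
    """Greedily group joint detections into people by tag similarity.

    detections: list of (joint_name, x, y, tag) tuples.
    threshold (hypothetical): maximum tag distance to join an existing person.
    """
    people = []  # each person: {"joints": {name: (x, y)}, "tags": [tag, ...]}
    for name, x, y, tag in detections:
        best, best_dist = None, threshold
        for person in people:
            if name in person["joints"]:
                continue  # a person has at most one joint of each type
            mean_tag = sum(person["tags"]) / len(person["tags"])
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:  # no close enough person: start a new one
            best = {"joints": {}, "tags": []}
            people.append(best)
        best["joints"][name] = (x, y)
        best["tags"].append(tag)
    return people
```

For example, two noses with tags 0.1 and 5.0 start two people, and elbows tagged 0.2 and 4.9 are then attached to the matching person.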
