VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment



SLIDE 1

VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment

Chunyu Wang Microsoft Research Asia

https://github.com/microsoft/voxelpose-pytorch

SLIDE 2

Broad Impact

VoxelPose: Hanyue Tu, Chunyu Wang, Wenjun Zeng

  • Intelligent retail (Microsoft Connected Store)
  • Sports broadcasting/training/judging
  • Human-robot interaction
  • Augmented/virtual reality
SLIDE 3

Previous Work

Image Credit: Dong, Junting, et al. “Fast and robust multi-person 3D pose estimation from multiple views.” CVPR 2019

Multiview Images → 2D Keypoint Estimation → Cross-view Matching → Triangulation or Pictorial Model → Multi-person 3D Poses

SLIDE 4

VoxelPose

Multiview Images → 2D Keypoint Estimation → Cross-view Matching → Triangulation or Pictorial Model → Multi-person 3D Poses

Single Model
  • No hard decisions within steps
  • Delay decision until all views are available

SLIDE 5

VoxelPose

SLIDE 6

VoxelPose

  • 1. Discretize 3D Space by Voxels

SLIDE 7

VoxelPose

  • 1. Discretize 3D Space by Voxels
  • 2. Compute a feature for each voxel by inversely projecting 2D features to 3D

SLIDE 8

VoxelPose

  • 1. Discretize 3D Space by Voxels
  • 2. Compute a feature for each voxel by inversely projecting 2D features to 3D
  • 3. The resulting feature is robust to occlusion

SLIDE 9

VoxelPose

  • 1. Discretize 3D Space by Voxels
  • 2. Compute a feature for each voxel by inversely projecting 2D features to 3D
  • 3. The resulting feature is robust to occlusion
  • 4. Predict whether each voxel contains body joints
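
The steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `build_feature_volume`, the 3x4 projection matrices, nearest-pixel sampling, and plain averaging across views are all choices made for the example.

```python
import numpy as np

def build_feature_volume(heatmaps, projections, grid_points):
    """Average 2D heatmap values sampled at the projection of each voxel center.

    heatmaps:    list of (H, W) arrays, one heatmap per camera (hypothetical input)
    projections: list of (3, 4) camera projection matrices, one per camera
    grid_points: (N, 3) array of voxel centers in world coordinates
    """
    n = grid_points.shape[0]
    homog = np.hstack([grid_points, np.ones((n, 1))])    # (N, 4) homogeneous points
    total = np.zeros(n)
    count = np.zeros(n)
    for hm, P in zip(heatmaps, projections):
        uvw = homog @ P.T                                 # (N, 3) image-plane coordinates
        depth = uvw[:, 2]
        in_front = depth > 1e-6                           # ignore voxels behind the camera
        safe_depth = np.where(in_front, depth, 1.0)       # avoid dividing by zero
        u = np.rint(uvw[:, 0] / safe_depth).astype(int)   # nearest pixel column
        v = np.rint(uvw[:, 1] / safe_depth).astype(int)   # nearest pixel row
        h, w = hm.shape
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        total[valid] += hm[v[valid], u[valid]]
        count[valid] += 1
    # Voxels seen by no camera keep a feature of zero.
    return total / np.maximum(count, 1)
```

Because each voxel pools evidence from every camera that sees it, a joint hidden in one view can still receive a high value from the others, which is one way to read the occlusion-robustness claim.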

SLIDE 10

Hybrid Model - (1) Human Detection

(300mm x 300mm x 300mm)

The proposals need not be very precise, since they are refined in the following step.

(2000mm x 2000mm x 2000mm)

SLIDE 11

Hybrid Model - (2) Joint Detection

(30mm x 30mm x 30mm)

This is sufficiently accurate for body joint localization.
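
A quick piece of arithmetic shows why the coarse and fine stages are split. The sketch below assumes that the 2000mm cube is the region around one proposal and compares how many voxels it holds at each of the two voxel sizes quoted on the slides. That reading, and the helper names, are assumptions.

```python
import math

def voxels_per_side(extent_mm, voxel_mm):
    """Number of voxels needed to cover one side of a cube."""
    return math.ceil(extent_mm / voxel_mm)

def voxel_count(extent_mm, voxel_mm):
    """Total number of voxels in a cube with the given side length."""
    return voxels_per_side(extent_mm, voxel_mm) ** 3

coarse = voxel_count(2000, 300)   # 7 per side   -> 343 voxels
fine = voxel_count(2000, 30)      # 67 per side  -> 300763 voxels
print(coarse, fine)
```

Under these assumptions, the fine grid costs roughly a thousand times more voxels per unit of volume, so it is only affordable inside a proposal.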

SLIDE 12

Technical Details of VoxelPose

SLIDE 13

Step 1: 2D Heatmap Estimation

This step can use existing methods such as OpenPose, HRNet, and AlphaPose.

HE: Heatmap Estimation

SLIDE 14

Step 2: 3D Person Detection

HE: Heatmap Estimation
3PN: 3D Proposal Estimation

SLIDE 15

Step 2: 3D Person Detection

3PN Network
  • Input: 3D feature volume (whole space), X x Y x Z x K (X x Y x Z is the number of voxels)
  • Output: Proposals, X x Y x Z x 1

A scalar for each voxel: the likelihood of a person being centered at the voxel.
We keep the K largest voxels (proposals) after NMS.
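
The proposal selection can be sketched as follows. The slide only says "NMS"; the sketch assumes NMS means keeping voxels that are the maximum of their 3x3x3 neighbourhood. The function name and the `min_score` cut-off are hypothetical.

```python
import numpy as np

def top_k_proposals(likelihood, k, min_score=0.0):
    """Return up to k voxel indices that are 3x3x3 local maxima, best first."""
    likelihood = np.asarray(likelihood, dtype=float)
    X, Y, Z = likelihood.shape
    # Pad with -inf so border voxels compare only against real neighbours.
    padded = np.pad(likelihood, 1, mode="constant", constant_values=-np.inf)
    neighbourhood_max = np.full(likelihood.shape, -np.inf)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                window = padded[dx:dx + X, dy:dy + Y, dz:dz + Z]
                neighbourhood_max = np.maximum(neighbourhood_max, window)
    # A voxel survives if nothing in its neighbourhood scores higher.
    is_peak = (likelihood >= neighbourhood_max) & (likelihood > min_score)
    scores = np.where(is_peak, likelihood, -np.inf).ravel()
    order = np.argsort(-scores)[:k]
    order = order[np.isfinite(scores[order])]            # drop suppressed voxels
    return [tuple(int(i) for i in np.unravel_index(idx, likelihood.shape))
            for idx in order]
```

Each returned index would then be mapped back to a world-space position to centre the region used in the next step.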

SLIDE 16

Step 3: 3D Joint Detection

HE: Heatmap Estimation
3PN: 3D Proposal Estimation

SLIDE 17

Step 3: 3D Joint Detection

PEN Network
  • Input: 3D feature volume (proposal), X x Y x Z x K (X x Y x Z is the number of voxels)
  • Output: 3D Pose Heatmap, X x Y x Z x K

Per-voxel likelihood for joints in the 3D space

Compute Expectation
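
One common reading of "Compute Expectation" is a likelihood-weighted mean of the voxel centers, which yields a continuous joint location instead of a single voxel index. The sketch below assumes that reading and a non-negative heatmap with a positive sum. The names are hypothetical.

```python
import numpy as np

def expected_location(heatmap, grid_centers):
    """Likelihood-weighted mean position for one joint.

    heatmap:      (X, Y, Z) non-negative likelihoods with a positive sum
    grid_centers: (X, Y, Z, 3) voxel center coordinates, e.g. in mm
    """
    weights = heatmap / heatmap.sum()                     # normalize to sum to 1
    return (weights[..., None] * grid_centers).sum(axis=(0, 1, 2))

# Usage: a 3 x 3 x 3 grid of 30mm voxels with two equally likely voxels.
axis = np.arange(3) * 30.0
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
centers = np.stack([gx, gy, gz], axis=-1)
hm = np.zeros((3, 3, 3))
hm[0, 0, 0] = 1.0
hm[2, 0, 0] = 1.0
joint = expected_location(hm, centers)                    # lands midway, at x = 30mm
```

Because the result is an average rather than an argmax, it is not limited to voxel centers.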

SLIDE 18

Joint Training

HE: Heatmap Estimation
3PN: 3D Proposal Estimation

SLIDE 19

Proposal Quality

We project the 3D proposals to 2D for visualization. Colored boxes mark proposals whose estimated confidence is larger than 0.1.

SLIDE 20

Proposal Recall Rate

  • When the threshold is 140mm, we get about 95% recall with a voxel size of 300mm.
  • This is sufficient for 3D pose estimation.
  • Using a smaller voxel improves the precision.
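
The slide does not spell out how recall is computed. The sketch below assumes a ground-truth person center counts as recalled when at least one proposal lies within the distance threshold. The function name and the sample coordinates are made up for illustration.

```python
import math

def proposal_recall(gt_centers, proposals, threshold_mm):
    """Fraction of ground-truth centers with a proposal within threshold_mm."""
    if not gt_centers:
        return 0.0
    hits = sum(
        1 for gt in gt_centers
        if any(math.dist(gt, p) <= threshold_mm for p in proposals)
    )
    return hits / len(gt_centers)

# Usage with made-up coordinates in mm.
gt = [(0.0, 0.0, 0.0), (1000.0, 0.0, 0.0)]
props = [(100.0, 0.0, 0.0), (1200.0, 0.0, 0.0)]
print(proposal_recall(gt, props, 140))   # 0.5: only the first center is within 140mm
```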

SLIDE 21

Impact of Camera Number

Camera Number   AP25 ↑   AP50 ↑   AP100 ↑   AP150 ↑   MPJPE ↓
5               83.59    98.33    99.76     99.91     17.68mm
3               58.94    93.88    98.45     99.32     24.29mm
1               0.860    23.47    80.69     93.32     66.95mm
5*              50.91    95.25    99.36     99.56     25.51mm

* means training/testing on different cameras.

  • The error increases mildly when the number of cameras decreases from 5 to 3.
  • The error increases notably when only one camera is used.
  • The method generalizes to different camera configurations.
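
For reference, the MPJPE column (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints. Below is a minimal sketch for a single matched pose. The function name and sample values are illustrative, and matching predictions to ground-truth people is out of scope here.

```python
import math

def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between corresponding 3D joints (same units as input)."""
    if len(pred_joints) != len(gt_joints) or not gt_joints:
        raise ValueError("expected two non-empty joint lists of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))
    return total / len(gt_joints)

# Usage with made-up joints in mm: errors of 0 and 5 average to 2.5.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt_pose = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt_pose))   # 2.5
```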

SLIDE 22

Demo

SLIDE 23

Demo

SLIDE 24

Demo