VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild - - PowerPoint PPT Presentation
VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment
Chunyu Wang
Microsoft Research Asia
https://github.com/microsoft/voxelpose-pytorch
Broad Impact
VoxelPose: Hanyue Tu, Chunyu Wang, Wenjun Zeng
- Intelligent retail (Microsoft Connected Store)
- Sports broadcasting/training/judging
- Human-robot interaction
- Augmented/virtual reality
Previous Work
Image Credit: Dong, Junting, et al. “Fast and robust multi-person 3d pose estimation from multiple views.” CVPR 2019
Multiview Images → 2D Keypoint Estimation → Cross-view Matching → Triangulation or Pictorial Model → Multi-person 3D Poses
VoxelPose
Multiview Images → 2D Keypoint Estimation → Cross-view Matching → Triangulation or Pictorial Model → Multi-person 3D Poses
Single Model
- No hard decisions within steps
- Delay decisions until all views are available
- 1. Discretize 3D space by voxels
- 2. Compute a feature for each voxel by inversely projecting 2D features to 3D
- 3. The resulting feature is robust to occlusion
- 4. Predict whether each voxel contains body joints
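The first two steps can be illustrated with a minimal sketch. It assumes an idealized pinhole camera with no rotation or lens distortion, nearest-neighbour heatmap lookup, and averaging across views; none of these details come from the slides, and the names `project`, `sample`, `voxel_centers`, and `voxel_feature` are hypothetical helpers, not functions from the voxelpose-pytorch repository.

```python
def project(point, cam):
    """Project a 3D point (mm) to pixel coordinates; None if behind the camera.

    Assumption: the camera sits at cam["pos"] and looks along +Z without rotation.
    """
    x, y, z = (p - c for p, c in zip(point, cam["pos"]))
    if z <= 0:
        return None
    return cam["f"] * x / z + cam["cx"], cam["f"] * y / z + cam["cy"]


def sample(heatmap, u, v):
    """Nearest-neighbour lookup of a 2D heatmap; 0.0 outside the image."""
    col, row = int(round(u)), int(round(v))
    if 0 <= row < len(heatmap) and 0 <= col < len(heatmap[0]):
        return heatmap[row][col]
    return 0.0


def voxel_centers(origin, voxel_size, counts):
    """Centers of a regular grid of counts[0] x counts[1] x counts[2] voxels."""
    ox, oy, oz = origin
    return [
        (ox + (i + 0.5) * voxel_size,
         oy + (j + 0.5) * voxel_size,
         oz + (k + 0.5) * voxel_size)
        for i in range(counts[0])
        for j in range(counts[1])
        for k in range(counts[2])
    ]


def voxel_feature(center, cams, heatmaps):
    """Average the heatmap responses found at the projections of a voxel center."""
    values = []
    for cam, heatmap in zip(cams, heatmaps):
        uv = project(center, cam)
        values.append(sample(heatmap, *uv) if uv is not None else 0.0)
    return sum(values) / len(values)


# Two hypothetical cameras, 20 mm apart, each with an 11 x 11 heatmap for one joint.
cams = [
    {"pos": (0, 0, 0), "f": 100, "cx": 5, "cy": 5},
    {"pos": (20, 0, 0), "f": 100, "cx": 5, "cy": 5},
]
heatmaps = [[[0.0] * 11 for _ in range(11)] for _ in cams]
heatmaps[0][5][5] = 1.0  # joint seen at pixel (u=5, v=5) in view 0
heatmaps[1][5][3] = 1.0  # same joint seen at pixel (u=3, v=5) in view 1

# Three 300 mm voxels stacked along the viewing direction (centers at z = 400, 700, 1000).
centers = voxel_centers(origin=(-150, -150, 250), voxel_size=300, counts=(1, 1, 3))
features = [voxel_feature(c, cams, heatmaps) for c in centers]
```

In this toy setup all three voxels project to the peak in view 0, so a single view cannot tell them apart; only the voxel at z = 1000 also projects to the peak in view 1, so it receives the highest feature value.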
Hybrid Model - (1) Human Detection
(300mm x 300mm x 300mm)
The proposals need not be very precise, since we refine them in the following step.
(2000mm x 2000mm x 2000mm)
Hybrid Model - (2) Joint Detection
(30mm x 30mm x 30mm)
This is sufficiently accurate for body joint localization.
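A short calculation suggests why the two-stage design saves work. This is only a sketch: the capture volume `SPACE_MM` is a hypothetical value that does not appear in the slides, and reading 2000mm as the side of the cube placed around each proposal is an assumption.

```python
import math


def voxel_count(extent_mm, voxel_mm):
    """Number of voxels needed to cover a box with the given side lengths."""
    return math.prod(math.ceil(side / voxel_mm) for side in extent_mm)


SPACE_MM = (8000, 8000, 2000)  # hypothetical capture volume, not from the slides
CUBE_MM = (2000, 2000, 2000)   # assumed to be the cube around each proposal

fine_everywhere = voxel_count(SPACE_MM, 30)  # fine grid over the whole space
coarse = voxel_count(SPACE_MM, 300)          # coarse grid used for human detection
fine_per_person = voxel_count(CUBE_MM, 30)   # fine grid around one proposal

print(fine_everywhere, coarse, fine_per_person)
```

Under these assumptions, the coarse grid plus one fine cube per person needs far fewer voxels than a 30mm grid over the whole space, as long as only a few people are present.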
Technical Details of VoxelPose
Step 1: 2D Heatmap Estimation
This step can use existing methods such as OpenPose, HRNet, and AlphaPose.
HE: Heatmap Estimation
Step 2: 3D Person Detection
HE: Heatmap Estimation
3PN: 3D Proposal Estimation
3D feature volume (whole space), X x Y x Z x K → 3PN Network → X x Y x Z x 1 → Proposals
X x Y x Z: number of voxels
The output is a scalar for each voxel: the likelihood of a person being centered at that voxel. We keep the K largest voxels (proposals) after NMS.
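The "keep the K largest voxels after NMS" step can be sketched as local-maximum filtering followed by a top-K selection. The slides do not say how NMS is implemented, so the 3 x 3 x 3 neighbourhood, the `min_score` cutoff, and the function name `top_k_proposals` are assumptions made for illustration.

```python
from itertools import product


def top_k_proposals(scores, k, min_score=0.0):
    """Return up to k (score, (x, y, z)) pairs that are local maxima of a 3D grid.

    A voxel survives if its score exceeds min_score and no voxel in its
    3 x 3 x 3 neighbourhood scores higher (tied neighbours are both kept).
    """
    X, Y, Z = len(scores), len(scores[0]), len(scores[0][0])
    peaks = []
    for x, y, z in product(range(X), range(Y), range(Z)):
        s = scores[x][y][z]
        if s <= min_score:
            continue
        neighbours = (
            scores[x + dx][y + dy][z + dz]
            for dx, dy, dz in product((-1, 0, 1), repeat=3)
            if (dx, dy, dz) != (0, 0, 0)
            and 0 <= x + dx < X and 0 <= y + dy < Y and 0 <= z + dz < Z
        )
        if all(s >= n for n in neighbours):
            peaks.append((s, (x, y, z)))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:k]


# Toy 4 x 4 x 4 score volume with two people and some clutter.
scores = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
scores[1][1][1] = 0.9   # person A
scores[1][1][2] = 0.5   # adjacent to person A, suppressed by NMS
scores[3][3][3] = 0.7   # person B
scores[0][3][0] = 0.05  # weak isolated peak, dropped by the top-K cut
proposals = top_k_proposals(scores, k=2)
```

Here `proposals` holds the voxels for person A and person B; the neighbour of A is removed by the local-maximum test and the weak peak falls outside the top two.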
Step 3: 3D Joint Detection
3D feature volume (proposal), X x Y x Z x K → PEN Network → 3D Pose Heatmap, X x Y x Z x K → Compute Expectation
X x Y x Z: number of voxels
The 3D pose heatmap gives the per-voxel likelihood for joints in the 3D space.
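The "Compute Expectation" step can be read as a heatmap-weighted average of voxel centers, which yields a joint position that is not restricted to the voxel grid. The sketch below handles a single joint and normalizes by the sum of the heatmap; how the heatmap is normalized in the actual model is not stated in the slides, and `expected_position` is a hypothetical name.

```python
def expected_position(heatmap, origin, voxel_size):
    """Heatmap-weighted mean of voxel centers (mm) for one joint.

    heatmap is a nested list indexed as heatmap[i][j][k] with non-negative values.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for i, plane in enumerate(heatmap):
        for j, row in enumerate(plane):
            for k, weight in enumerate(row):
                center = (
                    origin[0] + (i + 0.5) * voxel_size,
                    origin[1] + (j + 0.5) * voxel_size,
                    origin[2] + (k + 0.5) * voxel_size,
                )
                total += weight
                for axis in range(3):
                    acc[axis] += weight * center[axis]
    if total == 0:
        raise ValueError("heatmap has no mass")
    return tuple(a / total for a in acc)


# Two 30 mm voxels along x; most of the likelihood sits in the second one.
heatmap = [[[0.25]], [[0.75]]]
joint = expected_position(heatmap, origin=(0, 0, 0), voxel_size=30)
```

The voxel centers lie at x = 15 and x = 45, and the estimate lands between them at x = 37.5, so the result is finer than the 30mm grid spacing.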
Joint Training
Proposal Quality
We project the 3D proposals to 2D for visualization. Colored boxes mark proposals whose estimated confidence is larger than 0.1.
Proposal Recall Rate
- With a threshold of 140mm, we get about 95% recall when the voxel size is 300mm.
- This is sufficient for 3D pose estimation.
- Using a smaller voxel improves the precision.
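One plausible way to compute such a recall rate is sketched below. It assumes that a ground-truth person counts as recalled when at least one proposal center lies within the distance threshold; the slides do not spell out the definition, and `proposal_recall` is a hypothetical helper.

```python
import math


def proposal_recall(gt_centers, proposals, threshold_mm):
    """Fraction of ground-truth centers with a proposal within threshold_mm."""
    if not gt_centers:
        raise ValueError("no ground-truth centers given")
    hits = sum(
        1
        for gt in gt_centers
        if any(math.dist(gt, p) <= threshold_mm for p in proposals)
    )
    return hits / len(gt_centers)


gt_centers = [(0, 0, 0), (1000, 0, 0)]
proposals = [(100, 0, 0), (1200, 0, 0)]  # 100 mm and 200 mm away from the two people

recall_140 = proposal_recall(gt_centers, proposals, threshold_mm=140)
recall_250 = proposal_recall(gt_centers, proposals, threshold_mm=250)
```

In this toy case only the first person is recalled at 140mm, while both are recalled at 250mm, which shows how the reported recall depends on the chosen threshold.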
Impact of Camera Number
Camera Number   AP25 ↑   AP50 ↑   AP100 ↑   AP150 ↑   MPJPE ↓
5               83.59    98.33    99.76     99.91     17.68mm
3               58.94    93.88    98.45     99.32     24.29mm
1               0.860    23.47    80.69     93.32     66.95mm
5*              50.91    95.25    99.36     99.56     25.51mm
* means training/testing on different cameras.
The error increases mildly when we decrease the number of cameras from 5 to 3. The error increases notably when using only one camera. The model generalizes to different camera configurations.
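For readers unfamiliar with the MPJPE column, the metric (mean per-joint position error) can be sketched as the average Euclidean distance between predicted and ground-truth joints. The snippet assumes the joints are already matched one to one and expressed in millimetres; the joint coordinates are made-up values, not data from the experiments.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance (mm) between corresponding 3D joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


gt_pose = [(0, 0, 0), (0, 0, 500)]
pred_pose = [(3, 4, 0), (0, 0, 520)]  # off by 5 mm and 20 mm
error = mpjpe(pred_pose, gt_pose)
```

The two joints are off by 5mm and 20mm, so the error for this pose is 12.5mm.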
Demo