6th International Workshop on Recovering 6D Object Pose
Yann Labbé 1,2, Justin Carpentier 1,2, Mathieu Aubry 4, Josef Sivic 1,2,3
CosyPose: Consistent multi-view multi-object 6D pose estimation
1 Inria 2 DI ENS, PSL 3 CIIRC, CTU in Prague 4 ENPC
Robust multi-view multi-object reconstruction
Single-view 6D pose estimation
BOP 20
Input RGB image
Mask-RCNN 2D detections
6D pose estimation: 2D detection -> Coarse network -> Refiner network -> 6D pose
(only 3 networks trained per dataset)
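The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the CosyPose API: `Detection`, `estimate_poses`, and the three callables are hypothetical names, and the number of refiner iterations is an assumed parameter that the slides do not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Detection:
    label: str                                # object identity from the 2D detector
    bbox: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


def estimate_poses(
    image: Any,
    detector: Callable[[Any], List[Detection]],
    coarse: Callable[[Any, Detection], Any],
    refiner: Callable[[Any, Detection, Any], Any],
    n_refiner_iterations: int = 1,            # assumed knob, not from the slides
) -> List[Tuple[str, Any]]:
    """Run 2D detection, then coarse and refiner pose estimation per detection."""
    results = []
    for det in detector(image):
        pose = coarse(image, det)             # coarse 6D pose estimate
        for _ in range(n_refiner_iterations):
            pose = refiner(image, det, pose)  # refine the current estimate
        results.append((det.label, pose))
    return results
```

Passing the three networks in as callables keeps the sketch independent of any particular framework; in practice each would wrap a trained model.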
Based on DeepIM (Li et al, ECCV 2018), with changes to:
+ Network
+ Rotation parametrization
+ Loss
+ Data augmentation
CNN coarse: input “canonical” pose -> pose update -> “coarse” pose
CNN refiner: input “coarse” pose -> pose update -> “refined” pose
(details in the paper arXiv:2008.08465)
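One generic way to read "pose update" is as a rigid transform composed with the current estimate. The sketch below assumes 4x4 homogeneous matrices and a left-multiplied update; the parametrization actually predicted by the networks is described in the paper and may differ. All function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_pose_update(current_pose: Matrix, update: Matrix) -> Matrix:
    """Compose a predicted rigid update with the current pose estimate.

    Assumption: the update is expressed in the camera frame and applied
    on the left, i.e. new_pose = update * current_pose.
    """
    return matmul4(update, current_pose)


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Homogeneous matrix for a pure translation (identity rotation)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Example: an object 1 m in front of the camera, nudged 5 cm along x.
coarse_pose = translation(0.0, 0.0, 1.0)
refined_pose = apply_pose_update(coarse_pose, translation(0.05, 0.0, 0.0))
```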
T-LESS, evsd < 0.3 (bar chart, axis 20 to 60):
Pix2Pose (Park et al, ICCV 2019): 29.5
Ours w/o data augmentation: 37.0
Ours with data augmentation: 63.8
(more ablations in the paper, Sec 3 Table 1b)
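The evsd < 0.3 scores are recall-style numbers: the share of evaluated poses whose error falls below the threshold. A minimal sketch of that computation, assuming per-pose error values are already available (the function name and the error values are invented for illustration and are not taken from the paper):

```python
from typing import Sequence


def recall_at_threshold(errors: Sequence[float], threshold: float = 0.3) -> float:
    """Percentage of errors strictly below the threshold."""
    if not errors:
        raise ValueError("errors must not be empty")
    n_correct = sum(1 for e in errors if e < threshold)
    return 100.0 * n_correct / len(errors)


# Placeholder error values, invented for illustration only.
example_errors = [0.05, 0.12, 0.29, 0.30, 0.75]
example_recall = recall_at_threshold(example_errors)
```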
+ Access to a GPU cluster*: training 1 pose network takes ~10 hours on 32 GPUs
*Jean Zay, French national cluster managed by GENCI-IDRIS
Figure panels: input image, predicted poses, 3D visualization
[1] BlenderProc: Denninger, Sundermeyer, Winkelbauer, Olefir, Hodan, Zidan, Elbadrawy, Knauer, Katam, Lodhi, RSS workshops.
Pix2Pose, Park et al, ICCV 2019, https://github.com/kirumang/Pix2Pose
CDPN, Li et al, ICCV 2019
CosyPose, Labbé et al, ECCV 2020
EPOS, Hodan et al, CVPR 2020
Training images: Synt (PBR [1]), Synt+Real
+ running time < 0.5s per image
AR_Core (7 datasets)
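In the BOP Challenge 2020, the overall score is the mean of the per-dataset average recall over the seven core datasets. The sketch below illustrates that averaging; the helper name is hypothetical and the scores are placeholders invented for the example, not results from the talk.

```python
from typing import Dict

# The seven BOP core datasets.
CORE_DATASETS = ("LM-O", "T-LESS", "TUD-L", "IC-BIN", "ITODD", "HB", "YCB-V")


def ar_core(per_dataset_ar: Dict[str, float]) -> float:
    """Mean of the per-dataset average recall over the core datasets."""
    missing = [name for name in CORE_DATASETS if name not in per_dataset_ar]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return sum(per_dataset_ar[name] for name in CORE_DATASETS) / len(CORE_DATASETS)


# Placeholder scores, invented for illustration only.
placeholder_scores = {name: 0.1 * (i + 1) for i, name in enumerate(CORE_DATASETS)}
placeholder_ar_core = ar_core(placeholder_scores)
```

Requiring all seven datasets avoids silently averaging over a subset, which would not be comparable to a leaderboard score.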