CosyPose: Consistent multi-view multi-object 6D pose estimation





SLIDE 1

CosyPose: Consistent multi-view multi-object 6D pose estimation

Yann Labbé 1,2, Justin Carpentier 1,2, Mathieu Aubry 4, Josef Sivic 1,2,3

1 Inria  2 DI ENS, PSL  3 CIIRC, CTU in Prague  4 ENPC

6th International Workshop on Recovering 6D Object Pose

arXiv:2008.08465

SLIDE 2

Multi-view 6D pose estimation

Input: images → Output: 3D scene

SLIDE 3

CosyPose: Approach overview

  • Single-view 6D pose estimation
  • Robust multi-view multi-object reconstruction
  • BOP 2020 Challenge

SLIDE 4

Single-view CosyPose

Input RGB image → Mask R-CNN 2D detections → for each detection: coarse network (initial 6D pose) → refiner network (refined 6D pose)

(only 3 networks trained per dataset)
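The detect → coarse → refine pipeline above can be sketched in Python. Every function here is an illustrative stand-in (dummy detections, identity updates), not the CosyPose implementation:

```python
# Hedged sketch of the single-view pipeline described on this slide
# (detection -> coarse pose -> refinement). All functions are
# illustrative stand-ins, not the authors' code.
import numpy as np

def detect_objects(image):
    """Stand-in for Mask R-CNN: return 2D detections (label + box)."""
    return [{"label": "obj_000001", "bbox": (10, 10, 80, 80)}]

def coarse_network(image, detection):
    """Stand-in for the coarse network: predict an initial 6D pose
    (a 4x4 rigid transform) from a crop around the detection."""
    pose = np.eye(4)
    pose[2, 3] = 1.0  # place the object 1 m in front of the camera
    return pose

def refiner_network(image, detection, pose, n_iters=2):
    """Stand-in for the refiner: iteratively correct the coarse pose."""
    for _ in range(n_iters):
        pose = pose @ np.eye(4)  # placeholder for the learned update
    return pose

def single_view_cosypose(image):
    """Run the full per-image pipeline: one refined pose per detection."""
    results = []
    for det in detect_objects(image):
        coarse = coarse_network(image, det)
        refined = refiner_network(image, det, coarse)
        results.append({"label": det["label"], "pose": refined})
    return results
```

The key design point the slide makes is that only three networks (detector, coarse, refiner) are trained per dataset; the same refiner is reused for every detection.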

SLIDE 5

Pose estimation networks

Built on DeepIM (Li et al., ECCV 2018), with changes to the network, the rotation parametrization, the loss, and the data augmentation.

Coarse CNN: input "canonical" pose → "coarse" pose
Refiner CNN: input "coarse" pose → pose update → "refined" pose

(details in the paper, arXiv:2008.08465)
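Since the refiner follows DeepIM's iterative render-and-compare idea, its core loop amounts to repeatedly composing a predicted correction with the current pose estimate. The update rule below is a simplified illustration with hard-coded corrections, not the paper's exact parametrization:

```python
# Hedged sketch of a DeepIM-style iterative pose update: in the real
# system a CNN predicts a correction (dR, dt) by comparing a rendering
# at the current pose with the observed image; here the corrections
# are hard-coded for illustration.
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the camera z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def apply_update(R, t, dR, dt):
    """Disentangled update: compose the rotation correction on the
    left, add the translation correction separately."""
    return dR @ R, t + dt

# Start from a coarse pose and apply two predicted corrections.
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
for dtheta in (0.3, 0.1):  # pretend these came from the refiner CNN
    R, t = apply_update(R, t, rotation_z(dtheta), np.zeros(3))
```

Two z-rotations compose by adding their angles, so after the loop `R` equals `rotation_z(0.4)` and is still a valid rotation (orthonormal, determinant 1), which is why updates are composed multiplicatively rather than added elementwise.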

SLIDE 6

Key ingredients

[Bar chart: recall (e_VSD < 0.3) on T-LESS]
Pix2Pose (Park et al., ICCV 2019): 29.5
Ours without data augmentation: 37.0
Ours with data augmentation: 63.8

(more ablations in the paper, Sec 3, Table 1b)

SLIDE 7

Key ingredients

[Bar chart, same as previous slide: recall (e_VSD < 0.3) on T-LESS — Pix2Pose: 29.5, ours without data augmentation: 37.0, ours with data augmentation: 63.8]

+ Access to a GPU cluster*: training one pose network takes ~10 hours on 32 GPUs
*Jean Zay, the French national cluster managed by GENCI-IDRIS

(more ablations in the paper, Sec 3, Table 1b)

SLIDE 8

3D visualization

Input image → predicted poses

SLIDE 9

BOP20 results

[Bar chart: AR_Core score over the 7 BOP datasets, RGB and RGB-D methods, running time < 0.5 s per image]
Methods compared: CosyPose (Labbé et al., ECCV 2020), EPOS (Hodan et al., CVPR 2020), CDPN (Li et al., ICCV 2019), Pix2Pose (Park et al., ICCV 2019; https://github.com/kirumang/Pix2Pose)
Training data: synthetic (PBR, rendered with BlenderProc: Denninger, Sundermeyer, Winkelbauer, Olefir, Hodan, Zidan, Elbadrawy, Knauer, Katam, Lodhi, RSS workshops) or synthetic + real

SLIDE 10

Code

https://github.com/ylabbe/cosypose

  • State-of-the-art pre-trained models for multiple datasets
  • RGB single-view and multi-view modular framework
  • Full training code
SLIDE 11

CosyPose: Consistent multi-view multi-object 6D pose estimation

Yann Labbé 1,2, Justin Carpentier 1,2, Mathieu Aubry 4, Josef Sivic 1,2,3

1 Inria  2 DI ENS, PSL  3 CIIRC, CTU in Prague  4 ENPC

6th International Workshop on Recovering 6D Object Pose

arXiv:2008.08465
https://github.com/ylabbe/cosypose