3D Pose Estimation and Model Retrieval in the Wild
Vincent Lepetit
ENPC ParisTech & TU Graz
3D pose, 3D model retrieval in the wild
H-O3D: Hand+Object Dataset
BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
[Figure labels: camera center; m1, m2, m3, m4]
easier regression task;
rotation;
the translation. We can compute the 3D pose from these 2D locations.
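As a rough illustration of how a 3D pose can be computed from 2D locations, the sketch below recovers a rotation and translation from 3D-2D correspondences with a plain Direct Linear Transform (DLT). This is a stand-in chosen for brevity, not necessarily the solver used in the talk. The function name `pose_from_correspondences`, the intrinsics `K`, and the test cube are all assumptions made for the example.

```python
import numpy as np

def pose_from_correspondences(points_3d, points_2d, K):
    """Estimate (R, t) from n >= 6 non-coplanar 3D-2D correspondences (DLT)."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    n = len(points_3d)
    # Remove the intrinsics: work in normalized image coordinates.
    homog_2d = np.hstack([points_2d, np.ones((n, 1))])
    normalized = (np.linalg.inv(K) @ homog_2d.T).T
    rows = []
    for X, (u, v, _) in zip(points_3d, normalized):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    # The null vector of the stacked system is [R | t] up to scale and sign.
    _, _, vt = np.linalg.svd(np.array(rows))
    P = vt[-1].reshape(3, 4)
    # Project the 3x3 part onto the closest orthogonal matrix and undo the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # resolve the sign ambiguity of the null vector
        R, t = -R, -t
    return R, t

# Hypothetical setup: the 8 corners of a unit cube seen by a pinhole camera.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([0.1, -0.2, 4.0])

# Synthesize noise-free 2D locations, then recover the pose from them.
cam = corners @ R_true.T + t_true
pix = cam @ K.T
m = pix[:, :2] / pix[:, 2:3]
R_est, t_est = pose_from_correspondences(corners, m, K)
```

With noisy 2D predictions, a dedicated PnP solver with outlier handling would usually be preferable to this bare DLT.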
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild. Alexander Grabner, Peter M. Roth, and Vincent Lepetit. CVPR 2018.
[Figure: 2D bounding boxes; pose predictor; 3D pose?]
[Figure: a network predicts the dimensions (length, width, height) of the bounding box; PnP gives the 3D pose of the bounding box]
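To make the role of the predicted dimensions concrete, here is a minimal sketch that turns (length, width, height) into the eight 3D corners a PnP step would need. The function name `box_corners`, the choice of centering the box at the origin, and the axis assignment (length along x, width along y, height along z) are assumptions for illustration; the slides do not state a convention.

```python
from itertools import product

def box_corners(length, width, height):
    """Return the 8 corners of an axis-aligned box centered at the origin."""
    half = (length / 2.0, width / 2.0, height / 2.0)
    # Every combination of signs along x, y, z gives one corner.
    return [tuple(sign * h for sign, h in zip(signs, half))
            for signs in product((-1.0, 1.0), repeat=3)]

# Hypothetical dimensions, in arbitrary units.
corners = box_corners(4.0, 2.0, 1.5)
```

These corners, paired with their 2D locations in the image, form the 3D-2D correspondences from which the pose is solved.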
2D reprojections
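A small sketch of what a 2D reprojection involves under a standard pinhole model: a 3D point is moved into the camera frame with the pose (R, t), multiplied by the intrinsics K, and divided by its depth. The helper name `project_point` and the numeric values are assumptions for the example.

```python
def project_point(X, R, t, K):
    """Project one 3D point X to pixel coordinates with pose (R, t) and intrinsics K."""
    # Rigid transform into the camera frame.
    cam = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Apply the intrinsics, then divide by the third (depth) coordinate.
    pix = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (pix[0] / pix[2], pix[1] / pix[2])

# Hypothetical camera: identity pose, focal length 500 px, principal point (320, 240).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]

center = project_point([0.0, 0.0, 5.0], R, t, K)   # lands on the principal point
shifted = project_point([1.0, 0.0, 5.0], R, t, K)  # 500 * 1 / 5 = 100 px to the right
```

Comparing such reprojections with the predicted 2D locations gives a simple check of how well an estimated pose explains the image.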
Location Field Descriptors: Single Image 3D Model Retrieval in the Wild. Alexander Grabner, Peter M. Roth, and Vincent Lepetit. 3DV 2019.
ShapeNet [Chang et al, 2015]
pose invariant descriptors
[Figure: Descriptor CNN]
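Once every 3D model and the query image are mapped to descriptors, retrieval can be reduced to a nearest-neighbor search. The sketch below assumes descriptors are plain float vectors compared with Euclidean distance; the function name `retrieve_model`, the model ids, and the descriptor values are hypothetical and only illustrate the matching step, not the descriptor CNN itself.

```python
import math

def retrieve_model(query_descriptor, model_descriptors):
    """Return the id of the model whose descriptor is closest to the query."""
    return min(
        model_descriptors,
        key=lambda model_id: math.dist(query_descriptor, model_descriptors[model_id]),
    )

# Hypothetical 3-dimensional descriptors for three 3D models.
database = {
    "chair_01": [0.9, 0.1, 0.0],
    "table_07": [0.1, 0.8, 0.2],
    "sofa_03": [0.0, 0.2, 0.9],
}
best = retrieve_model([0.2, 0.7, 0.1], database)
```

Because the descriptors are meant to be pose invariant, one descriptor per model would suffice in this scheme; with pose-dependent descriptors, each model would need one entry per rendered view.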
3D pose, 3D model retrieval in the wild
H-O3D: Hand+Object Dataset
It is possible to use only synthetic images for training, but we should still evaluate on real images.
65 sequences, 10 persons, 10 objects, about 85’000 frames in total
MANO model [Romero et al, 2017]
[Cali et al, 2015]
+ RGBD likelihood: joint segmentation and depth constraints, enforced using differentiable rendering;
+ Physical constraints: avoid impossible hand poses and interpenetration between hand and object;
+ Temporal constraints: smooth motions over the sequence.
[Figure labels: joint segmentation prediction; joint depth prediction]
$$
\min_{\{(p^H_t,\, p^O_t)\}_t} \; \sum_t \Big( \|\ldots(p^H_t, p^O_t)\|^2 + \|D_t - D(p^H_t, p^O_t)\|^2 + \ldots + \|p^H_{t+1} - p^H_t\|^2 + \|p^O_{t+1} - p^O_t\|^2 \Big)
$$
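The temporal terms of this objective penalize the squared change of the hand and object poses between consecutive frames. A minimal sketch of that penalty for one pose sequence follows, assuming each pose is a flat list of parameters; the function name `temporal_smoothness` and the toy sequence are made up for illustration.

```python
def temporal_smoothness(poses):
    """Sum over t of the squared Euclidean norm of (p_{t+1} - p_t)."""
    return sum(
        sum((b - a) ** 2 for a, b in zip(prev, nxt))
        for prev, nxt in zip(poses, poses[1:])
    )

# Hypothetical sequence of three 2-parameter poses.
sequence = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
penalty = temporal_smoothness(sequence)  # 1^2 + 2^2
```

In the full objective this penalty would be evaluated separately for the hand poses and the object poses and added to the other terms.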
Primary RGB-D camera, used for annotation
Secondary (sideview) RGB-D camera, used for validation only
Alexander Grabner, Peter Roth, Mahdi Rad, Shreyas Hampali