Robust Pose Optimization Made Differentiable (Eric Brachmann)



slide-1
SLIDE 1

Robust Pose Optimization Made Differentiable

Eric Brachmann 5th International Workshop on Recovering 6D Object Pose @ICCV19

slide-2
SLIDE 2

Background

Dr. Eric Brachmann (@eric_brachmann)

  • 2012-2017: PhD at [logo]
  • since 2018: Post-Doc at [logo]
  • since 2019: Guest at [logo]

Prof. Carsten Rother

slide-3
SLIDE 3

Main Research Interests


  • Machine learning and projective geometry
  • Robust fitting with (differentiable) RANSAC
  • Object poses
  • Camera poses
  • Lines
  • Epipolar Geometry

Publications:

  • DSAC, CVPR’17
  • DSAC++, CVPR’18
  • Object Coordinates, ECCV’14
  • NG-RANSAC, ICCV’19

slide-4
SLIDE 4

Goal

Pose Estimation Pipeline: RGB(-D) image $I$ → Object Detection → Object Classification → Correspondence Prediction → RANSAC (Pose Solver, Pose Scoring) → 6D poses $\hat{\mathbf{h}}_o$ → Pose Loss

Methods following this pipeline include:

  • “Learning 6D object pose estimation using 3D object coordinates”, Brachmann et al., ECCV’14
  • “iPose: instance-aware 6D pose estimation of partly occluded objects”, Jafari et al., ACCV’18
  • “Segmentation-driven 6D Object Pose Estimation”, Hu et al., CVPR’19
  • “Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation”, Park et al., ICCV’19
  • “DPOD: 6D Pose Object Detector and Refiner”, Zakharov et al., ICCV’19
  • …

slide-5
SLIDE 5

Why End-to-End?

Pose Estimation Pipeline: RGB(-D) image $I$ → Object Detection → Object Classification → Correspondence Prediction → RANSAC (Pose Solver, Pose Scoring) → 6D camera pose $\hat{\mathbf{h}}$ → Pose Loss

slide-6
SLIDE 6

Why End-to-End?

Indoor [ESAC], re-localization rate:

  Threshold    Initialization    End-to-End
  5cm, 5°      86.5%             88.1%
  2cm, 2°      50.9%             61.7%

Outdoor [NGRANSAC], median translation error:

  Initialization    End-to-End
  31cm              19cm

[Figure: comparison of reprojection error before and after end-to-end training; color scale from −10px (improvement) through ±0px to +10px (degradation)]

[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19
[NGRANSAC] “Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses”, Brachmann and Rother, ICCV’19

slide-7
SLIDE 7

Roadmap


Object Detection Object Classification Correspondence Prediction Pose Loss RANSAC

Pose Solver Pose Scoring

slide-8
SLIDE 8

Pose Loss (RGB-D)

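Input: RGB-D

Pose loss for a pose $\mathbf{h} = (\mathbf{t}, R)$ with ground truth $\mathbf{h}^* = (\mathbf{t}^*, R^*)$:

$$\ell(\mathbf{h}, \mathbf{h}^*) = \ell(\mathbf{t}, \mathbf{t}^*) + \alpha\,\ell(R, R^*)$$

with

$$\ell(\mathbf{t}, \mathbf{t}^*) = \lVert \mathbf{t} - \mathbf{t}^* \rVert, \qquad \ell(R, R^*) = \lVert \log(R^* R^\top) \rVert = \theta, \qquad \log : \mathbb{R}^{3\times3} \to \mathbb{R}^3$$

where $\theta$ is the rotation angle between $R$ and $R^*$.

In OpenCV: cv2.Rodrigues()
  • incl. gradients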
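
A minimal NumPy sketch of this loss, for illustration only. It computes the rotation angle with the trace identity rather than cv2.Rodrigues(), and the function names and the default value of alpha are assumptions, not taken from the talk.

```python
import numpy as np

def rotation_angle(R, R_star):
    """Angle (radians) of the relative rotation R* R^T, i.e. ||log(R* R^T)||."""
    cos_theta = (np.trace(R_star @ R.T) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def pose_loss_rgbd(t, R, t_star, R_star, alpha=1.0):
    """Translation distance plus alpha times the rotation angle (alpha is a hypothetical default)."""
    return float(np.linalg.norm(t - t_star)) + alpha * rotation_angle(R, R_star)

def rot_z(angle):
    """Helper for the example: rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Example: 0.5 units of translation error and 0.2 rad of rotation error.
loss = pose_loss_rgbd(np.zeros(3), np.eye(3), np.array([0.3, 0.0, 0.4]), rot_z(0.2))
```

With alpha = 1 the two terms simply add; in practice alpha balances length units against radians.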
slide-9
SLIDE 9

Pose Loss (RGB)

Input: RGB

Reprojection loss [Bra16]:

$$\ell_\pi(\mathbf{h}, \mathbf{h}^*) = \frac{1}{|\mathcal{V}|} \sum_{\mathbf{v} \in \mathcal{V}} \lVert C \mathbf{h}^* \mathbf{v} - C \mathbf{h} \mathbf{v} \rVert$$

$\mathcal{V}$ ... Model vertices
$C$ ... Camera calibration matrix

[Figure panels: Z-Err 5cm, 10cm, 20cm]

[Bra16] Brachmann et al., “Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image”, CVPR 2016
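
The reprojection loss can be sketched in NumPy as follows. This is an illustration under assumptions: a pose is passed as a pair (R, t) applied as R v + t, the projection includes the perspective division that the compact notation leaves implicit, and all function names are hypothetical.

```python
import numpy as np

def project(C, R, t, V):
    """Project model vertices V (N, 3) into the image with pose (R, t) and calibration C."""
    cam = V @ R.T + t                  # vertices in camera coordinates
    uvw = cam @ C.T                    # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

def reprojection_loss(C, pose, pose_star, V):
    """Mean pixel distance between vertices projected with pose and with pose_star."""
    p = project(C, pose[0], pose[1], V)
    p_star = project(C, pose_star[0], pose_star[1], V)
    return float(np.mean(np.linalg.norm(p_star - p, axis=1)))

# Example: two vertices at depth 2, ground truth shifted by 0.1 along x.
C = np.array([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 1.0]])
V = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
loss = reprojection_loss(C, (np.eye(3), np.zeros(3)), (np.eye(3), np.array([0.1, 0.0, 0.0])), V)
```

Because the loss is measured in pixels, an error along the viewing direction changes it far less than a lateral error of the same size.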

slide-10
SLIDE 10

Pose Solver (RGB-D)

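Input: RGB-D

Kabsch Algorithm [Kab76], for corresponding 3D points $\mathbf{x}_i$ and $\mathbf{y}_i$:

$$(\hat{R}, \hat{\mathbf{t}}) = \operatorname*{argmin}_{R, \mathbf{t} \,\mid\, R R^\top = \mathbf{1}} \sum_i \lVert \mathbf{x}_i - (R \mathbf{y}_i - \mathbf{t}) \rVert^2$$

$$\operatorname{cov}(\mathbf{x}_i, \mathbf{y}_i) = \sum_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{y}_i - \bar{\mathbf{y}})^\top = U \Sigma V^\top$$

$$\hat{R} = U \operatorname{diag}\big(1, 1, \det(U V^\top)\big) V^\top, \qquad \hat{\mathbf{t}} = \hat{R}\bar{\mathbf{y}} - \bar{\mathbf{x}}$$

C++ code with PyTorch integration coming soon.

[Kab76] Kabsch, “A solution for the best rotation to relate two sets of vectors”, Acta Crystallographica, 1976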
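
A short NumPy sketch of the Kabsch solver, not the C++ implementation mentioned on the slide. It assumes the convention that a point y_i is mapped to R y_i − t and that the covariance is built with x_i as the left factor; the function name is hypothetical.

```python
import numpy as np

def kabsch(x, y):
    """Return (R, t) minimizing sum_i ||x_i - (R y_i - t)||^2 for (N, 3) arrays x and y."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    cov = (x - x_mean).T @ (y - y_mean)        # sum_i (x_i - x_mean)(y_i - y_mean)^T
    U, _, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))         # +1 or -1, prevents a reflection
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = R @ y_mean - x_mean
    return R, t

# Example: recover a known rotation about z and a known translation.
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
y = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
x = y @ R_true.T - t_true
R_est, t_est = kabsch(x, y)
```

Every step (mean, matrix product, SVD, determinant) is differentiable almost everywhere, which is what makes the solver usable inside an end-to-end pipeline.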

slide-11
SLIDE 11

Pose Solver (RGB)

Input: RGB

Solving Perspective-n-Point for 2D-3D correspondences $(\mathbf{p}_i, \mathbf{y}_i)$:

$$(\hat{R}, \hat{\mathbf{t}}) = \operatorname*{argmin}_{R, \mathbf{t}} \sum_i \lVert \mathbf{p}_i - C(R \mathbf{y}_i - \mathbf{t}) \rVert^2$$

Stages: Initialization, then Gauss-Newton

Solving Perspective-n-Point:
[Lep09] Lepetit et al., “EPnP: An Accurate O(n) Solution to the PnP Problem”, IJCV’09
[Gao03] Gao et al., “Complete Solution Classification for the Perspective-Three-Point Problem”, TPAMI’03

slide-12
SLIDE 12

Pose Solver (RGB)

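Stages: Initialization, then Gauss-Newton (iterates $\mathbf{h}^0, \mathbf{h}^1, \dots$)

Residual vector:

$$[\mathbf{r}(\mathbf{h})]_i = \lVert \mathbf{p}_i - C \mathbf{h} \mathbf{y}_i \rVert_2$$

Update Rule:

$$\mathbf{h}^{t+1} = \mathbf{h}^t - (J_\mathbf{r}^\top J_\mathbf{r})^{-1} J_\mathbf{r}^\top\, \mathbf{r}(\mathbf{h}^t)$$

Jacobian:

$$[J_\mathbf{r}]_{ij} = \frac{\partial\, [\mathbf{r}(\mathbf{h}^t)]_i}{\partial\, [\mathbf{h}^t]_j}$$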
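
The update rule can be tried out with the NumPy sketch below. It is an illustration under several assumptions that are not from the talk: the pose is a 6-vector (axis-angle rotation, translation) applied as R y − t, the Jacobian is approximated by central differences, and the residual stacks the two coordinates of each reprojection difference instead of its norm so that it stays smooth at zero error. All names are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Axis-angle vector (3,) to rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(h, C, y):
    """Project 3D points y (N, 3) with pose h = (axis-angle, t), applied as R y - t."""
    cam = y @ rodrigues(h[:3]).T - h[3:]
    uvw = cam @ C.T
    return uvw[:, :2] / uvw[:, 2:3]

def residuals(h, C, y, p):
    """Stacked 2D reprojection differences, shape (2N,)."""
    return (p - project(h, C, y)).ravel()

def jacobian(h, C, y, p, eps=1e-6):
    """Central-difference approximation of d r / d h, shape (2N, 6)."""
    J = np.zeros((2 * len(y), len(h)))
    for j in range(len(h)):
        step = np.zeros(len(h))
        step[j] = eps
        J[:, j] = (residuals(h + step, C, y, p) - residuals(h - step, C, y, p)) / (2 * eps)
    return J

def gauss_newton(h0, C, y, p, iterations=10):
    """Iterate h <- h - (J^T J)^{-1} J^T r(h), starting from the initialization h0."""
    h = np.asarray(h0, dtype=float).copy()
    for _ in range(iterations):
        J = jacobian(h, C, y, p)
        h = h - np.linalg.solve(J.T @ J, J.T @ residuals(h, C, y, p))
    return h

# Example: refine a perturbed pose on noise-free correspondences.
C = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
y = np.array([[-1, -1, 0], [1, -1, 0.5], [1, 1, -0.5],
              [-1, 1, 0.3], [0, 0, 1], [0.5, -0.3, -1]], dtype=float)
h_true = np.array([0.1, -0.2, 0.05, 0.1, -0.2, -5.0])
p = project(h_true, C, y)
h_est = gauss_newton(h_true + np.array([0.02, -0.03, 0.01, 0.05, -0.05, 0.1]), C, y, p)
```

Gauss-Newton only converges from a reasonable starting point, which is why the initialization stage precedes it.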

slide-13
SLIDE 13

Pose Solver (RGB)

Last update:

$$\hat{\mathbf{h}} = \mathbf{h}^\infty - (J_\mathbf{r}^\top J_\mathbf{r})^{-1} J_\mathbf{r}^\top\, \mathbf{r}(\mathbf{h}^\infty)$$

Gradients:

$$\frac{\partial}{\partial \mathbf{y}_i} \hat{\mathbf{h}} \approx -(J_\mathbf{r}^\top J_\mathbf{r})^{-1} J_\mathbf{r}^\top \frac{\partial}{\partial \mathbf{y}_i} \mathbf{r}(\mathbf{h}^\infty)$$

The approximation differentiates only the last update and treats $\mathbf{h}^\infty$ and $J_\mathbf{r}$ as constants.
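
The gradient formula can be checked on a toy problem. The sketch below uses a linear line fit rather than PnP, an assumption made purely to keep the example short: the residual is r(h) = A h − ys, its Jacobian A does not depend on h, and a single Gauss-Newton step is exact, so the formula should match finite differences. All names are hypothetical.

```python
import numpy as np

def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept; returns h and the Jacobian of r(h) = A h - ys."""
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    h = np.linalg.solve(A.T @ A, A.T @ ys)
    return h, A

def solver_gradient(J, dr_dy):
    """d h_hat / d y, approximated as -(J^T J)^{-1} J^T (d r / d y)."""
    return -np.linalg.solve(J.T @ J, J.T @ dr_dy)

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.2, 2.8])
h_hat, J = fit_line(xs, ys)
# For r(h) = A h - ys the derivative of r with respect to ys is minus the identity.
grad = solver_gradient(J, -np.eye(len(ys)))   # shape (2, 4): d(slope, intercept) / d ys
```

For the nonlinear PnP residual the same expression is only an approximation, since the Jacobian itself changes with the inputs.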

slide-14
SLIDE 14

Pose Solver (RGB)

C++ code of [Bra18] online. Version with PyTorch integration coming soon.

[För16] Förstner and Wrobel, “Photogrammetric Computer Vision - Statistics, Geometry, Orientation and Reconstruction”, Springer’16
[Bra18] Brachmann and Rother, “Learning less is more - 6D camera localization via 3D surface regression”, CVPR’18

slide-15
SLIDE 15

Hypothesis Selection

RANSAC

Pipeline: Image → Correspondence Prediction → Hypothesis Sampling → Scoring → Result

[Figure: hypotheses $\mathbf{h}_1, \dots, \mathbf{h}_4$ with scores $s(\mathbf{h}_1, \mathbf{y}), \dots, s(\mathbf{h}_4, \mathbf{y})$, the reprojection errors of $\mathbf{h}_2$, and the selected pose $\hat{\mathbf{h}}$]

Soft Inlier Counting [Bra18]:

$$s(\mathbf{h}, \mathbf{y}) = \sum_i \operatorname{sig}\big(\tau - \beta \lVert \mathbf{p}_i - C \mathbf{h} \mathbf{y}_i \rVert\big)$$

argmax Selection (non-differentiable hard decision):

$$\hat{\mathbf{h}} = \operatorname*{argmax}_{\mathbf{h}_j} s(\mathbf{h}_j, \mathbf{y})$$

Probabilistic Selection [Bra17] (differentiable hard decision):

$$\hat{\mathbf{h}} = \mathbf{h}_j, \quad \text{where } j \sim \frac{\exp(s(\mathbf{h}_j, \mathbf{y}))}{\sum_k \exp(s(\mathbf{h}_k, \mathbf{y}))}$$

[Bra17] Brachmann et al., “DSAC - Differentiable RANSAC for camera localization”, CVPR’17
[Bra18] Brachmann and Rother, “Learning less is more - 6D camera localization via 3D surface regression”, CVPR’18
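
A small NumPy sketch contrasting the two selection rules. It takes precomputed reprojection errors per hypothesis as input; the values of tau and beta and all function names are assumptions chosen for illustration.

```python
import numpy as np

def soft_inlier_count(errors, tau, beta):
    """s = sum_i sig(tau - beta * e_i) for reprojection errors e_i in pixels."""
    e = np.asarray(errors, dtype=float)
    return float(np.sum(1.0 / (1.0 + np.exp(-(tau - beta * e)))))

def selection_probabilities(scores):
    """Softmax over hypothesis scores, shifted by the maximum for numerical stability."""
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def select_argmax(scores):
    """Classic RANSAC selection: hard decision, no gradient with respect to the scores."""
    return int(np.argmax(scores))

def select_probabilistic(scores, rng):
    """DSAC-style selection: draw index j with probability softmax(scores)[j]."""
    return int(rng.choice(len(scores), p=selection_probabilities(scores)))

# Example: three hypotheses with four reprojection errors each (pixels).
errors = [[1, 2, 1, 30], [40, 50, 35, 60], [2, 1, 1, 1]]
scores = [soft_inlier_count(e, tau=10.0, beta=1.0) for e in errors]
probs = selection_probabilities(scores)
chosen = select_probabilistic(scores, np.random.default_rng(0))
```

The sampled index is still a hard decision, but its probability is a smooth function of the scores, which is what the expectation-based training objective exploits.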

slide-16
SLIDE 16

Differentiable RANSAC (DSAC)

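Hypothesis selection:

$$\hat{\mathbf{h}} = \mathbf{h}_j, \quad \text{where } j \sim \frac{\exp(s(\mathbf{h}_j, \mathbf{y}))}{\sum_k \exp(s(\mathbf{h}_k, \mathbf{y}))} = P(j; \mathbf{y})$$

Learning objective:

$$\mathcal{L}(\mathbf{y}) = \mathbb{E}_{j \sim P(j; \mathbf{y})} \big[\ell(\mathbf{h}_j, \mathbf{h}^*)\big]$$

Gradients:

$$\frac{\partial}{\partial \mathbf{y}} \mathcal{L}(\mathbf{y}) = \mathbb{E}_{j \sim P(j; \mathbf{y})} \Big[ \ell(\mathbf{h}_j, \mathbf{h}^*) \frac{\partial}{\partial \mathbf{y}} \log P(j; \mathbf{y}) + \frac{\partial}{\partial \mathbf{y}} \ell(\mathbf{h}_j, \mathbf{h}^*) \Big]$$

The first term is the derivative of the selection probability, the second the derivative of the task loss.

C++ code for camera re-localization online. PyTorch code for DSAC line fitting also online.

[Bra17] Brachmann et al., “DSAC - Differentiable RANSAC for camera localization”, CVPR’17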
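
The first gradient term can be illustrated with a NumPy sketch that treats the hypothesis scores as the free variables and holds the per-hypothesis losses fixed. That simplification, the example numbers, and the function names are assumptions; the published code differentiates further back to the scene coordinates.

```python
import numpy as np

def softmax(scores):
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def expected_loss(scores, losses):
    """L = E_{j ~ P}[loss_j] with P = softmax(scores)."""
    return float(softmax(scores) @ np.asarray(losses, dtype=float))

def expected_loss_grad(scores, losses):
    """E_j[loss_j * d log P(j) / d scores], using d log P(j) / d s_m = delta_jm - P_m."""
    P = softmax(scores)
    dlogP = np.eye(len(P)) - P        # entry [j, m] = delta_jm - P_m
    return (P * np.asarray(losses, dtype=float)) @ dlogP

scores = np.array([1.0, 2.0, 0.5])
losses = np.array([0.3, 0.1, 0.9])
grad = expected_loss_grad(scores, losses)
```

Raising the score of a hypothesis with below-average loss lowers the expected loss, so the corresponding gradient entry is negative.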

slide-17
SLIDE 17

Differentiable RANSAC (DSAC)

  • PoseNet: 149cm, 3.4°
  • Active Search: 19cm, 0.5°
  • DSAC++: 13cm, 0.4°

[PoseNet] “Geometric Loss Functions for Camera Pose Regression with Deep Learning”, Kendall and Cipolla, CVPR’17
[Active Search] “Efficient & effective prioritized matching for large-scale image-based localization”, Sattler et al., TPAMI’17
[DSAC] “DSAC - Differentiable RANSAC for Camera Localization”, Brachmann et al., CVPR’17
[DSAC++] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18

slide-18
SLIDE 18

Correspondence Prediction

Diagram: Input Image → Dense Correspondences (learnable parameters $\mathbf{w}$) → RANSAC / DSAC

slide-19
SLIDE 19

Neural Guided RANSAC (NG-RANSAC)

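Diagram: Input Image → Dense Correspondences and Sampling Weights (learnable parameters $\mathbf{w}$) → RANSAC / DSAC

Selecting a scene coordinate:

$$p(\mathbf{y}) = g(I; \mathbf{w})$$

Selecting a hypothesis:

$$p(\mathbf{h}) = \prod_{i=0}^{4} p(\mathbf{y}_i)$$

Selecting a hypotheses pool:

$$p(\mathcal{H}) = \prod_j p(\mathbf{h}_j)$$

Learning objective:

$$\mathbb{E}_{\mathcal{H} \sim p(\mathcal{H})} \big[\mathcal{L}(\mathbf{w})\big] = \mathbb{E}_{\mathcal{H} \sim p(\mathcal{H})}\, \mathbb{E}_{j \sim P(j \mid \mathcal{H}; \mathbf{w})} \big[\ell(\mathbf{h}_j, \mathbf{h}^*)\big]$$

The outer expectation is the neural guidance, the inner expectation is DSAC.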
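
Guided sampling of one minimal set can be sketched in NumPy as below. The sketch assumes sampling with replacement, so that the probability of a set is exactly the product of the individual probabilities, and a minimal set of four correspondences; both choices and the function name are assumptions for illustration.

```python
import numpy as np

def sample_minimal_set(weights, set_size, rng):
    """Draw correspondence indices according to predicted sampling weights.

    Returns the indices and log p(h), the sum of the log-probabilities of the draws.
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=set_size, replace=True, p=p)
    return idx, float(np.sum(np.log(p[idx])))

# Example: the first two correspondences get zero weight and are never sampled.
rng = np.random.default_rng(0)
weights = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
idx, log_p = sample_minimal_set(weights, 4, rng)
```

The returned log-probability is the quantity whose gradient drives the weight-predicting network when the expectation over hypothesis pools is estimated by sampling.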

slide-20
SLIDE 20

Neural Guided RANSAC (NG-RANSAC)

  Scene           PoseNet   ActiveSearch   DSAC++    NG-DSAC++
  Great Court     700cm     -              40.3cm    35.0cm
  Kings College   99cm      42cm           13.0cm    12.6cm
  Old Hospital    217cm     44cm           22.4cm    21.9cm
  Shop Facade     107cm     12cm           5.7cm     5.6cm
  St M. Church    149cm     19cm           9.9cm     9.8cm

[PoseNet] “Geometric Loss Functions for Camera Pose Regression with Deep Learning”, Kendall and Cipolla, CVPR’17
[ActiveSearch] “Efficient & effective prioritized matching for large-scale image-based localization”, Sattler et al., TPAMI’17
[DSAC++] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18
[NG-DSAC++] “Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses”, Brachmann and Rother, ICCV’19

slide-21
SLIDE 21

Object Classification

[Figure: Query Image and Environment Classes]

slide-22
SLIDE 22

Object Classification

Diagram components: Gating Network, Expert Networks, RANSAC Hypotheses $\mathcal{H}$, Pose Estimate $\hat{\mathbf{h}}$

[Jacobs’91] “Adaptive Mixtures of Local Experts”, Jacobs et al., Neural Computation, 1991
[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19

slide-23
SLIDE 23

Object Classification

7Scenes+12Scenes [ESAC]

Average Accuracy (5cm, 5°):

  • Classification + DSAC++: 47.5%
  • Oracle + DSAC++: 89.0%

[Figure: Classification Accuracy]

[DSAC++] Brachmann and Rother, “Learning less is more - 6D camera localization via 3D surface regression”, CVPR’18
[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19

slide-25
SLIDE 25

Object Classification

Diagram components: Gating Network, Expert Networks, RANSAC Hypotheses $\mathcal{H}$, Pose Estimate $\hat{\mathbf{h}}$

[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19

slide-26
SLIDE 26

Expert Sample Consensus

Diagram components: Gating Network, Expert Networks, RANSAC Hypotheses $\mathcal{H}$, Pose Estimate $\hat{\mathbf{h}}$

[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19

slide-27
SLIDE 27

Expert Sample Consensus

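Differentiable Objective Function:

$$\mathcal{L}(\mathbf{w}) = \mathbb{E}_{\mathcal{H} \sim P(\mathcal{H})}\, \mathbb{E}_{j \sim P(j \mid \mathcal{H})} \big[\ell(\mathbf{h}_j)\big]$$

with $P(\mathcal{H}) \propto g(I, \mathbf{w})$ for the gating network $g(I, \mathbf{w})$.

Diagram components: Gating Network, Expert Networks, RANSAC Hypotheses $\mathcal{H}$, Pose Estimate $\hat{\mathbf{h}}$

[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19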
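
One way to read $P(\mathcal{H}) \propto g(I, \mathbf{w})$ is that the gating output decides how many hypotheses each expert contributes to the pool. The NumPy sketch below illustrates that reading with a multinomial draw; this interpretation, the pool size, and the function name are assumptions, not a description of the released ESAC code.

```python
import numpy as np

def allocate_hypotheses(gating_probs, num_hypotheses, rng):
    """Draw the number of hypotheses per expert from the gating distribution."""
    p = np.asarray(gating_probs, dtype=float)
    p = p / p.sum()
    return rng.multinomial(num_hypotheses, p)

# Example: three experts, a pool of 256 hypotheses (hypothetical size).
rng = np.random.default_rng(0)
counts = allocate_hypotheses([0.7, 0.2, 0.1], 256, rng)
```

Experts with low gating probability still receive an occasional hypothesis, so a correct expert can win the consensus step even when the gating network is unsure.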

slide-28
SLIDE 28

Expert Sample Consensus

7Scenes+12Scenes [ESAC]

Average Accuracy (5cm, 5°):

  • Classification + DSAC++: 47.5%
  • Oracle + DSAC++: 89.0%
  • ESAC: 88.1%

[Figure: Classification Accuracy]

[DSAC++] Brachmann and Rother, “Learning less is more - 6D camera localization via 3D surface regression”, CVPR’18
[ESAC] “Expert Sample Consensus Applied to Camera Re-Localization”, Brachmann and Rother, ICCV’19

slide-29
SLIDE 29

Object Detection


slide-30
SLIDE 30

Conclusion

Conclusion:

  • Differentiable PnP → [Bra18]
  • Differentiable RANSAC → [DSAC]
  • Differentiable Correspondence Selection → [NG-RANSAC]
  • Differentiable Expert Selection → [ESAC]

[Bra18] Brachmann and Rother, “Learning less is more - 6D camera localization via 3D surface regression”, CVPR’18
[DSAC] Brachmann et al., “DSAC - Differentiable RANSAC for camera localization”, CVPR’17
[NG-RANSAC] Brachmann and Rother, “Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses”, ICCV’19
[ESAC] Brachmann and Rother, “Expert Sample Consensus Applied to Camera Re-Localization”, ICCV’19

slide-31
SLIDE 31

Conclusion

Code of many methods online:

  • DSAC for camera re-localization [Lua/Torch]: https://github.com/cvlab-dresden/DSAC
  • DSAC for Line Fitting [PyTorch]: https://github.com/vislearn/DSACLine
  • DSAC++ for Camera Re-Localization, incl. differentiable PnP [Lua/Torch]: https://github.com/vislearn/LessMore
  • DSAC*, improved DSAC++ incl. differentiable PnP and differentiable Kabsch [PyTorch]: Coming soon
  • ESAC, differentiable expert selection [PyTorch]: Coming soon (https://hci.iwr.uni-heidelberg.de/vislearn/research/scene-understanding/pose-estimation/#ICCV19)
  • NG-DSAC, differentiable correspondence selection [PyTorch]: Coming soon (https://hci.iwr.uni-heidelberg.de/vislearn/research/neural-guided-ransac/)

slide-32
SLIDE 32

The End


Thank You!