When Deep Learning Meets Visual Localization - Martin Humenberger



slide-1
SLIDE 1

When Deep Learning Meets Visual Localization

Martin Humenberger
NAVER LABS Europe, 3D Vision Group
http://europe.naverlabs.com

slide-2
SLIDE 2

Outline

1. 3D Vision @ NAVER LABS Europe
2. Visual Localization: Concept, Methods, Datasets
3. Local Feature Extraction (R2D2)
4. VSLAM in Dynamic Environments (Slamantic)

slide-3
SLIDE 3
slide-4
SLIDE 4

3D VISION at NAVER LABS Europe

slide-5
SLIDE 5

3D Vision - Research Interests

We want to overcome current limitations of traditional, mainly geometry-based, methods of 3D vision using data-driven machine learning techniques.

Main research topics:

a) Fundamental methods of 3D vision

  • Correspondence analysis
  • Depth estimation

b) Camera pose estimation

  • Visual localization
  • VSLAM/VO

c) 3D scene understanding

  • Semantic mapping
  • 3D reconstruction

d) Synthetic datasets and domain adaptation

  • Transfer between synthetic and real world
slide-6
SLIDE 6

Visual Localization

slide-7
SLIDE 7

Visual Localization - Concept

Château de Sceaux

GPS accuracy is sometimes not enough, e.g. for precise robot navigation or augmented reality.

Goal: Use an image to estimate the precise position of the camera within a given area (map).

slide-8
SLIDE 8

Visual Localization - Concept

This works indoors as well! There, it is particularly useful since GPS is not available.

slide-9
SLIDE 9

Application Examples

slide-10
SLIDE 10

Methods of Visual Localization

slide-11
SLIDE 11

Challenges of Visual Localization

Reference image (map) compared with query images that differ in:

  • viewpoint and scale
  • occlusion
  • illumination
  • viewpoint, occlusion, weather

slide-12
SLIDE 12

Structure Based Visual Localization

1) Take a picture (2D input image)
2) Feature detection & description
3) Descriptor matching against the 3D map to get 2D-3D correspondences
4) Camera pose estimation
5) Location
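
The descriptor matching step can be sketched in a few lines. This is a minimal illustration, not the method of any system named in this talk: it assumes each 3D map point stores one descriptor, and it swaps in a nearest-neighbour search with a ratio test (a common filtering heuristic) to decide which 2D-3D correspondences to keep. The function names and the `ratio` value are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_2d_3d(query_desc, map_desc, ratio=0.8):
    """Return (query_index, map_point_index) pairs that pass the ratio test.

    query_desc: descriptors of keypoints detected in the query image.
    map_desc:   one descriptor per 3D point of the map (an assumption).
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Rank all map descriptors by distance to this query descriptor.
        ranked = sorted((l2(q, m), mi) for mi, m in enumerate(map_desc))
        if len(ranked) < 2:
            continue
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Keep the match only if the best candidate clearly beats the runner-up.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches

query = [[1.0, 0.0], [0.5, 0.6]]
map_descriptors = [[0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
matches = match_2d_3d(query, map_descriptors)
```

The first query descriptor has one clearly closest map point and is kept; the second is roughly equidistant from all map points and is rejected as ambiguous. The surviving 2D-3D pairs would then be passed to a camera pose solver.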

slide-13
SLIDE 13

Image Retrieval Based Visual Localization

1) Input image (2D input image)
2) Image retrieval in the large 3D map
3) Descriptor matching to get 2D-3D correspondences
4) Camera pose estimation
5) Location
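
The retrieval step narrows a large map down to a few candidate images before any descriptor matching happens. The sketch below assumes each map image has a global descriptor vector and ranks map images by cosine similarity; the descriptor values, image ids, and the function `retrieve` are hypothetical and only illustrate the idea.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_global, map_globals, k=2):
    """Return the ids of the k map images most similar to the query."""
    ranked = sorted(
        map_globals.items(),
        key=lambda item: cosine(query_global, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]

# Hypothetical global descriptors for three map images.
map_globals = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.7, 0.7, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}
shortlist = retrieve([0.9, 0.3, 0.0], map_globals, k=2)
```

Only the 3D points seen in the shortlisted images would then take part in 2D-3D matching, which is where the speed gain for large-scale settings comes from. It also shows why the final quality depends on the retrieval step: a wrong shortlist cannot be repaired later.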

slide-14
SLIDE 14

Camera Pose Regression Based Visual Localization

No 3D Map but… CNN

image -> camera pose

slide-15
SLIDE 15

Camera Pose Regression Based Visual Localization

1) Input image (2D input image)
2) CNN to directly estimate the camera pose
3) Location
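
To judge how accurate a regressed pose is, it is usually compared with a ground-truth pose. The sketch below assumes a pose is given as a 3D position plus a quaternion in (w, x, y, z) order, which is an assumption of this example and not stated on the slides. It reports the translation error in the units of the position and the rotation error in degrees.

```python
import math

def unit(q):
    """Normalize a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]

def pose_error(pred_t, pred_q, gt_t, gt_q):
    """Return (translation_error, rotation_error_in_degrees)."""
    t_err = math.dist(pred_t, gt_t)
    p, g = unit(pred_q), unit(gt_q)
    # q and -q describe the same rotation, hence the absolute value.
    dot = min(1.0, abs(sum(a * b for a, b in zip(p, g))))
    r_err = math.degrees(2.0 * math.acos(dot))
    return t_err, r_err

# Prediction is 2 units off in z and rotated by 180 degrees about the x axis.
t_err, r_err = pose_error(
    [1.0, 2.0, 0.0], [1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 2.0], [0.0, 1.0, 0.0, 0.0],
)
```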

slide-16
SLIDE 16

Scene Coordinate Regression Based Visual Localization

  • Take a picture: 2D input image
  • Feature detection & description
  • CNN to regress dense 2D-3D correspondences (3D map)
  • Camera pose estimation
  • Location

slide-17
SLIDE 17

Overview of Methods

Structure-based methods: Active Search [1], OpenMVG [2]
  + Perform very well on most datasets -> high accuracy
  - Not suitable for very large environments (memory and processing time)

Image retrieval-based methods: HF-Net [3]
  + Improve speed and robustness for large-scale settings
  - Quality heavily relies on image retrieval

Camera pose regression methods: PoseNet [4]
  + Interesting approach because no 3D maps are needed and it is data driven (can be trained for certain challenges)
  - Low accuracy

Scene coordinate regression methods: DSAC++ [5]
  + Very accurate in small-scale settings
  - Does not yet work in large-scale environments

[1] T. Sattler et al., Improving Image-Based Localization by Active Correspondence Search, ECCV 2012
[2] P. Moulon, OpenMVG: http://github.com/openMVG/openMVG
[3] Sarlin et al., From Coarse to Fine: Robust Hierarchical Localization at Large Scale, CVPR 2019
[4] A. Kendall et al., PoseNet: http://mi.eng.cam.ac.uk/projects/relocalisation/, ICCV 2015
[5] E. Brachmann et al., Learning Less is More - 6D Camera Localization via 3D Surface Regression, CVPR 2018

slide-18
SLIDE 18

Mapping with M1X

slide-19
SLIDE 19

Mean Matching Accuracy (MMA)

NAVER LABS Mapping Robot M1X

slide-20
SLIDE 20

Mean Matching Accuracy (MMA)

slide-21
SLIDE 21

Mapping with Structure from Motion

slide-22
SLIDE 22

Structure from Motion

J. Schönberger, Robust Methods for Accurate and Efficient 3D Modeling from Unstructured Imagery, PhD, ETHZ

slide-23
SLIDE 23

https://demuc.de/colmap/

http://imagine.enpc.fr/~moulonp/openMVG/

Structure from Motion

slide-24
SLIDE 24

Datasets

slide-25
SLIDE 25

Alex Kendall, Matthew Grimes and Roberto Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. ICCV, 2015.

Datasets – Cambridge Landmarks – Outdoor Localization

  • 8,000 images from 6 scenes up to 100 x 500 m

Slide credit Alex Kendall, https://pdfs.semanticscholar.org/4fc6/7b4dc62e9c8eee4259c3878b71c64958c373.pdf

RGB, SfM

slide-26
SLIDE 26

Jamie Shotton et al. Scene coordinate regression forests for camera relocalization in RGB-D images. CVPR 2013

Datasets – Seven Scenes – Indoor Localization

  • 17,000 images across 7 small indoor scenes.

Slide credit Alex Kendall, https://pdfs.semanticscholar.org/4fc6/7b4dc62e9c8eee4259c3878b71c64958c373.pdf

RGB-D, pose, dense reconstruction

slide-27
SLIDE 27

Aachen Day-Night

Old inner city of Aachen, Germany

  • 4328 reference images
  • 922 query images (824 daytime, 98 nighttime)
  • All images are captured with hand-held cameras

Training image examples, test image examples (day-night), 3D reconstruction (SfM)

slide-28
SLIDE 28

Baidu IBL Dataset

  • Captured in a shopping mall using high-res cameras and a lidar scanner
  • RGB (training and testing), point clouds, poses

Sun et al., A Dataset for Benchmarking Image-Based Localization, CVPR17

slide-29
SLIDE 29

Virtual Gallery - Synthetic Dataset

slide-30
SLIDE 30

Virtual Gallery - Synthetic Dataset

Tailored to test specific challenges of visual localization, such as:

  • Different lighting conditions
  • Occlusions
  • Various camera parameters

Training:

  • Imitate a robot scanning the museum
  • 6 cameras (360º), 1 virtual lidar
  • 5 trajectories

Testing:

  • Imitate pictures taken by people
  • Cameras: random intrinsics, random orientation, random position
  • Different lighting conditions and occlusions

Download: https://europe.naverlabs.com/research/3d-vision/virtual-gallery-dataset/

slide-31
SLIDE 31

Visual Localization using Objects of Interest

slide-32
SLIDE 32

Objects of Interest (OOI) are distinctive areas within the environment which can be detected under various conditions.

Published at CVPR19

slide-33
SLIDE 33

1) Start with input image
2) Feed into OOI network
3) Use correspondences to compute the camera location

Map =

  • list of all OOIs
  • 3D locations of OOIs

Main advantage: Data-driven approach which can overcome common VL challenges.

Published at CVPR19

slide-34
SLIDE 34

Localization Results - Baidu Dataset

  • Structure-based methods perform best.
  • Learning-based methods (PoseNet, DSAC++) do not work on this large dataset.
  • Our approach is the first learning-based method which can be applied here.

Paper: https://europe.naverlabs.com/research/publications/visual-localization-by-learning-objects-of-interest-dense-match-regression/

slide-35
SLIDE 35

Local Feature Extraction

R2D2 - Repeatable and Reliable Detector and Descriptor

slide-36
SLIDE 36

Motivation

  • Structure-based methods perform well and the critical part is feature extraction and matching.
  • A robust feature detector enables robust visual localization
  • ...and improves many other applications such as object detection, VSLAM and SfM.
slide-37
SLIDE 37

Overview

Csurka et al., From handcrafted to deep local features, arXiv 2018

slide-38
SLIDE 38

Introduction

  • Classical methods: Detect-then-describe

1) Start with input image
2) Detect keypoints (keypoint detector, extract patches)
3) Describe keypoints (patch descriptor)

slide-39
SLIDE 39

Introduction

  • Classical methods: Detect-then-describe
  • Our approach: Detect-and-describe

1) Start with input image
2) Feed into R2D2 network
3) Detect keypoints & describe them at once: keypoints (NMS), descriptor for each keypoint

slide-40
SLIDE 40

Approach

  • Repeatability: image locations that are invariant to usual image transformations (e.g. corners)
  • Reliability: image locations that are good (discriminative and robust) for matching purposes

-> All cases are possible: reliability and repeatability are independent

[Figure: four example image regions (1 to 4), each labeled "Repeatable?" and "Reliable?" with Yes or No]

Our approach

  • Detect-and-describe (dense) to predict repeatability and reliability separately
  • Novel loss to estimate the reliability (or “matchability”)
  • Novel self-supervised loss to learn repeatability without introducing any biases
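
Given dense repeatability and reliability maps, keypoints still have to be picked from them. The following is a minimal sketch of one plausible selection rule, not a statement of how the released R2D2 code works: it keeps local maxima of the repeatability map (3x3 non-maximum suppression), discards locations where either map is below a threshold, and ranks the rest by the product of the two values. The window size, both thresholds, and the product score are assumptions made for this example.

```python
def select_keypoints(repeatability, reliability, rep_thr=0.5, rel_thr=0.5, top_k=2):
    """Return up to top_k (x, y) keypoints, best first.

    Both inputs are 2D lists of equal size with values in [0, 1].
    """
    h, w = len(repeatability), len(repeatability[0])
    candidates = []
    for y in range(h):
        for x in range(w):
            r = repeatability[y][x]
            # Values of the (up to 8) neighbours in the 3x3 window.
            neighbours = [
                repeatability[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            ]
            is_local_max = all(r > n for n in neighbours)
            if is_local_max and r >= rep_thr and reliability[y][x] >= rel_thr:
                candidates.append((r * reliability[y][x], (x, y)))
    candidates.sort(reverse=True)
    return [xy for _, xy in candidates[:top_k]]

repeatability = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.8],
    [0.0, 0.1, 0.2, 0.3],
]
reliability = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.6, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.9],
    [0.5, 0.5, 0.5, 0.5],
]
keypoints = select_keypoints(repeatability, reliability)
```

In this toy input the location with the highest repeatability is ranked second, because the other local maximum is much more reliable. That is the practical consequence of treating the two properties as independent.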

slide-41
SLIDE 41

Results

Image with top-scored keypoints, repeatability map, reliability map

slide-42
SLIDE 42

Example of Feature Matching using R2D2

The colored crosses indicate matched keypoints. As can be seen, our method even works under very challenging conditions such as day-night image pairs and large viewpoint changes.

slide-43
SLIDE 43

Results

HPatches

116 sequences of 6 images

  • 57 containing large changes in illumination
  • 59 containing large changes in viewpoint

slide-44
SLIDE 44

Results

MMA

  • R2D2 outperforms the state of the art on HPatches.
  • The metric used is Mean Matching Accuracy (MMA).
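
As a rough illustration of the metric: for one image pair, matching accuracy is the fraction of matches whose reprojection error under the ground-truth homography stays below a pixel threshold, and MMA averages this fraction over all image pairs. The sketch below assumes that definition and a 3-pixel threshold; the data and function names are made up for the example.

```python
import math

def project(H, pt):
    """Map a point from the first image into the second with homography H."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def matching_accuracy(matches, H, threshold):
    """Fraction of (pt_in_image1, pt_in_image2) matches within the threshold."""
    if not matches:
        return 0.0
    correct = sum(1 for a, b in matches if math.dist(project(H, a), b) <= threshold)
    return correct / len(matches)

def mean_matching_accuracy(pairs, threshold=3.0):
    """Average matching accuracy over a list of (matches, homography) pairs."""
    return sum(matching_accuracy(m, H, threshold) for m, H in pairs) / len(pairs)

shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 10 px to the right
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pairs = [
    # One match is 1 px off (correct), the other 15 px off (wrong).
    ([((0.0, 0.0), (10.0, 1.0)), ((5.0, 5.0), (30.0, 5.0))], shift),
    # A single exact match.
    ([((2.0, 2.0), (2.0, 2.0))], identity),
]
mma = mean_matching_accuracy(pairs)
```

Here the first pair scores 0.5 and the second 1.0, so the MMA at 3 pixels is 0.75. Benchmarks typically report the value for a range of thresholds.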
slide-45
SLIDE 45

Detailed Results on the Aachen Day-Night Dataset

  • Accuracy (higher is better) -> R2D2: outperforms all other approaches, including recent ones
  • Number of keypoints (less is better) -> R2D2: equal or less than other approaches
  • Feature dimension (less is better) -> R2D2: much smaller than other top-ranking approaches (up to 8x smaller)
  • Model size (memory, less is better) -> R2D2: much smaller than other top-ranking approaches (up to 15x smaller)

Code and models will be released!

Figure labels: R2D2 accuracy, Classic approach, Magic Leap, Google, Benchmark creators

slide-46
SLIDE 46

VSLAM in Dynamic Environments

slide-47
SLIDE 47

VSLAM - Visual Simultaneous Localization and Mapping

Example: ORB-SLAM2

Raúl Mur-Artal and Juan D. Tardós, ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

slide-48
SLIDE 48

VSLAM in Dynamic Environments

Challenge:

  • Structure-based VSLAM assumes that the world is static.
  • Dynamic areas are treated as outliers.
  • This does not work if the dynamic areas are dominant.
slide-49
SLIDE 49

SLAMANTIC (ICCV19 workshop paper)

  • We propose to use semantic information (in addition to geometry) to handle dynamic areas in the scene.
  • Our approach estimates a confidence value which is used to select keypoints for the mapping part.
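
The idea of a confidence value that gates which keypoints enter the map can be pictured with a toy example. This is not the SLAMANTIC formulation: the per-class prior table, the rule "confidence = 1 - prior", and the threshold are all hypothetical, chosen only to show semantic labels being turned into a keep-or-skip decision for mapping.

```python
# Hypothetical prior: how likely a semantic class is to move (not from the paper).
DYNAMIC_PRIOR = {
    "building": 0.0,
    "road": 0.0,
    "parked_car": 0.5,
    "car": 0.9,
    "person": 1.0,
}

def keypoint_confidence(label):
    """Confidence that a keypoint with this label lies on static structure."""
    # Unknown classes get a neutral prior of 0.5.
    return 1.0 - DYNAMIC_PRIOR.get(label, 0.5)

def select_for_mapping(keypoints, min_confidence=0.6):
    """Keep only keypoints confident enough to be added to the map."""
    return [kp for kp in keypoints if keypoint_confidence(kp["label"]) >= min_confidence]

keypoints = [
    {"id": 0, "label": "building"},
    {"id": 1, "label": "person"},
    {"id": 2, "label": "car"},
    {"id": 3, "label": "road"},
]
selected_ids = [kp["id"] for kp in select_for_mapping(keypoints)]
```

Keypoints on people and cars are left out of the map even when they dominate the image, which is the case where treating them as ordinary geometric outliers breaks down.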

slide-50
SLIDE 50

SLAMANTIC - Leveraging Semantics to Improve VSLAM in Dynamic Environments

Code is available online!

slide-51
SLIDE 51

Conclusion

slide-52
SLIDE 52

Conclusion

1. Visual localization is an enabling technology for many applications, e.g., in robotics.
2. It is very challenging due to the ever-changing world.
3. There is very good progress in the field, but it is far from being solved.
4. Data-driven methods might help make it more robust in the future.

slide-53
SLIDE 53

Thank You

slide-54
SLIDE 54

Resources

R2D2: https://github.com/naver/r2d2
SLAMANTIC: https://github.com/mthz/slamantic
VKITTI: https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds/
Virtual Gallery: https://europe.naverlabs.com/research/3d-vision/virtual-gallery-dataset/
Local Features Survey: https://arxiv.org/abs/1807.10254
COLMAP: https://colmap.github.io/
OpenMVG: https://github.com/openMVG/openMVG
Visual Localization Benchmark: http://visuallocalization.net
Visual Localization Tutorial: https://sites.google.com/view/lsvpr2019/home
Baidu IBL dataset: https://sites.google.com/site/xunsunhomepage/