When Deep Learning meets Visual Localization
Martin Humenberger, NAVER LABS Europe, 3D Vision Group
http://europe.naverlabs.com
Outline
1. 3D Vision @ NAVER LABS Europe
2. Visual Localization: Concept, Methods, Datasets
3. Local Feature Extraction (R2D2)
4. VSLAM in Dynamic Environments (Slamantic)
3D VISION at NAVER LABS Europe
3D Vision Research Interests
We want to overcome current limitations of traditional, mainly geometry-based, methods of 3D vision using data-driven machine learning techniques.
Main research topics:
a) Fundamental methods of 3D vision
• Correspondence analysis
• Depth estimation
b) Camera pose estimation
• Visual localization
• VSLAM / VO
c) 3D scene understanding
• Semantic mapping
• 3D reconstruction
d) Synthetic datasets and domain adaptation
• Transfer between synthetic and real world
Visual Localization
Visual Localization - Concept
GPS accuracy is sometimes not enough, e.g., for precise robot navigation or augmented reality (example: Château de Sceaux).
Goal: Use an image to estimate the precise position of the camera within a given area (map).
Visual Localization - Concept
This works indoors as well! There, it is particularly useful since GPS is not available.
Application Examples
Methods of Visual Localization
Challenges of Visual Localization
Compared to the reference image (map): illumination, viewpoint and scale, occlusion, weather.
Structure-Based Visual Localization
Inputs: 3D map, 2D input image.
Pipeline: take a picture (input image) → feature detection & description → descriptor matching to get 2D-3D correspondences → camera pose estimation → location.
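To make the pipeline concrete, here is a minimal sketch using OpenCV. It assumes the 3D map is available as arrays map_points3d (N x 3 point coordinates) and map_descriptors (N x 128, float32) exported from an SfM reconstruction, and that the camera intrinsics K are known; the function and variable names are illustrative, not part of any specific library.

```python
import cv2
import numpy as np

def localize_structure_based(image_bgr, map_descriptors, map_points3d, K):
    """Sketch of structure-based localization: features -> 2D-3D matches -> PnP."""
    # 1) Feature detection & description on the query image
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = cv2.SIFT_create().detectAndCompute(gray, None)

    # 2) Descriptor matching against the map to get 2D-3D correspondences
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(descriptors, map_descriptors, k=2)
    good = [m for m, n in matches if m.distance < 0.8 * n.distance]  # ratio test
    pts2d = np.float32([keypoints[m.queryIdx].pt for m in good])
    pts3d = np.float32([map_points3d[m.trainIdx] for m in good])

    # 3) Camera pose estimation with PnP + RANSAC
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec) if ok else (None, None)
```

In practice, the map descriptors would be organized in an approximate nearest-neighbor index rather than matched exhaustively, which is exactly the memory/speed limitation discussed below.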
Image Retrieval Based Visual Localization
Inputs: large 3D map, 2D input image.
Pipeline: input image → image retrieval → descriptor matching to get 2D-3D correspondences → camera pose estimation → location.
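The retrieval step itself can be sketched as a nearest-neighbor search over global image descriptors (e.g., produced by an image-retrieval CNN). The names query_global_desc and db_global_descs are illustrative, and the descriptors are assumed to be L2-normalized.

```python
import numpy as np

def retrieve_top_k(query_global_desc, db_global_descs, k=10):
    """Return indices of the k database images most similar to the query.

    Local 2D-3D matching and pose estimation are then run only against the map
    points observed in these retrieved images, which keeps large maps tractable.
    """
    sims = db_global_descs @ query_global_desc  # cosine similarity for normalized vectors
    return np.argsort(-sims)[:k]
```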
Camera Pose Regression Based Visual Localization
No 3D map, but a training set of (image, camera pose) pairs used to train a CNN.
Camera Pose Regression Based Visual Localization
Input: 2D input image.
Pipeline: input image → CNN directly estimates the camera pose → location.
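A minimal PoseNet-style sketch in PyTorch (not the original PoseNet architecture, which is based on GoogLeNet): a CNN backbone with two regression heads, one for translation and one for a unit quaternion. Assumes a recent torchvision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PoseRegressor(nn.Module):
    """CNN that maps an image directly to a camera pose (translation + quaternion)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()        # keep the 512-d global feature
        self.backbone = backbone
        self.fc_trans = nn.Linear(512, 3)  # x, y, z
        self.fc_rot = nn.Linear(512, 4)    # quaternion

    def forward(self, images):
        feat = self.backbone(images)
        t = self.fc_trans(feat)
        q = F.normalize(self.fc_rot(feat), dim=-1)  # enforce unit quaternion
        return t, q
```

Training minimizes a weighted sum of translation and rotation errors over (image, pose) pairs, which is what makes the approach purely data-driven.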
Scene Coordinate Regression Based Visual Localization
Inputs: 3D map, 2D input image.
Pipeline: take a picture (input image) → CNN regresses dense 2D-3D correspondences → camera pose estimation → location.
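A sketch of the idea (not the DSAC++ implementation): a fully-convolutional network regresses a 3D scene coordinate for every output pixel, which yields dense 2D-3D correspondences; the pose is then recovered with a robust PnP solver.

```python
import torch
import torch.nn as nn

class SceneCoordNet(nn.Module):
    """Fully-convolutional net regressing a 3D scene coordinate per output pixel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 3, 1),  # 3 channels = (X, Y, Z) in map coordinates
        )

    def forward(self, image):
        return self.net(image)  # B x 3 x H/8 x W/8 scene coordinates

# The dense 2D-3D correspondences (output pixel -> predicted 3D point) are then
# fed to a robust PnP solver, e.g. cv2.solvePnPRansac, to estimate the camera pose.
```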
Overview of Methods
Structure-based methods (e.g., Active Search [1], OpenMVG [2])
+ Perform very well on most datasets -> high accuracy
- Not suitable for very large environments (memory and processing time)
Image retrieval-based methods (e.g., HF-Net [3])
+ Improve speed and robustness for large-scale settings
- Quality heavily relies on image retrieval
Camera pose regression methods (e.g., PoseNet [4])
+ Interesting approach because no 3D maps are needed and it is data-driven (can be trained for certain challenges)
- Low accuracy
Scene coordinate regression methods (e.g., DSAC++ [5])
+ Very accurate in small-scale settings
- Does not yet work in large-scale environments
[1] T. Sattler et al., Improving Image-Based Localization by Active Correspondence Search, ECCV 2012
[2] P. Moulon, OpenMVG: http://github.com/openMVG/openMVG
[3] Sarlin et al., From Coarse to Fine: Robust Hierarchical Localization at Large Scale, CVPR 2019
[4] A. Kendall et al., PoseNet: http://mi.eng.cam.ac.uk/projects/relocalisation/, ICCV 2015
[5] E. Brachmann et al., Learning Less is More - 6D Camera Localization via 3D Surface Regression, CVPR 2018
Mapping with M1X
NAVER LABS Mapping Robot M1X
Mapping with Structure from Motion
Structure from Motion
J. Schönberger, Robust Methods for Accurate and Efficient 3D Modeling from Unstructured Imagery, PhD thesis, ETH Zurich
Structure from Motion
Open-source tools: OpenMVG (http://imagine.enpc.fr/~moulonp/openMVG/) and COLMAP (https://demuc.de/colmap/)
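One possible way to build such a map is to drive the COLMAP command-line pipeline from a small Python script; this sketch assumes the colmap binary is installed and on the PATH, and the directory names are illustrative.

```python
import os
import subprocess

def run_colmap_sfm(image_dir, workspace):
    """Standard COLMAP pipeline: feature extraction -> matching -> incremental mapping."""
    os.makedirs(f"{workspace}/sparse", exist_ok=True)
    db = f"{workspace}/database.db"
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir,
                    "--output_path", f"{workspace}/sparse"], check=True)
```

For large image collections, a vocabulary-tree or sequential matcher is typically used instead of exhaustive matching.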
Datasets
Datasets – Cambridge Landmarks – Outdoor Localization
• 8,000 images from 6 scenes, up to 100 x 500 m. RGB, SfM.
Alex Kendall, Matthew Grimes and Roberto Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. ICCV, 2015.
Slide credit Alex Kendall, https://pdfs.semanticscholar.org/4fc6/7b4dc62e9c8eee4259c3878b71c64958c373.pdf
Datasets – Seven Scenes – Indoor Localization
• 17,000 images across 7 small indoor scenes. RGB-D, pose, dense reconstruction.
Jamie Shotton et al. Scene coordinate regression forests for camera relocalization in RGB-D images. CVPR 2013.
Slide credit Alex Kendall, https://pdfs.semanticscholar.org/4fc6/7b4dc62e9c8eee4259c3878b71c64958c373.pdf
Aachen Day-Night
Old inner city of Aachen, Germany
- 4328 reference images
- 922 query images (824 daytime, 98 nighttime)
- All images are captured with hand-held cameras
Figures: 3D reconstruction (SfM), training image examples, test image examples (day / night)
Baidu IBL Dataset
- Captured in a shopping mall using high-resolution cameras and a LiDAR scanner
- RGB (training and testing), point clouds, poses
Sun et al., A Dataset for Benchmarking Image-Based Localization, CVPR 2017
Virtual Gallery Synthetic Dataset
Virtual Gallery Synthetic Dataset
Tailored to test specific challenges of visual localization, such as:
• Different lighting conditions
• Occlusions
• Various camera parameters
Training:
• Imitates a robot scanning the museum
• 6 cameras (360º), 1 virtual LiDAR
• 5 trajectories
Testing:
• Imitates pictures taken by people
• Cameras: random intrinsics, random orientation, random position
• Different lighting conditions and occlusions
Download: https://europe.naverlabs.com/research/3d-vision/virtual-gallery-dataset/
Visual Localization using Objects of Interest
Published at CVPR 2019
Objects of Interest (OOIs) are distinctive areas within the environment which can be detected under various conditions.
Published at CVPR 2019
Map = list of all OOIs and their 3D locations.
Main advantage: data-driven approach which can overcome common VL challenges.
Pipeline: 1) Start with input image. 2) Feed it into the OOI network. 3) Use the resulting correspondences to compute the camera location.
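As a rough sketch of step 3, assuming each OOI is (approximately) planar and stored in the map with its 3D origin and in-plane axes, a dense match (u, v) into the OOI's reference image can be lifted to a 3D map point. The data layout and names here are hypothetical and not the paper's implementation.

```python
import numpy as np

def ooi_matches_to_3d(ooi_map, ooi_id, uv_in_ooi):
    """Convert dense matches (u, v in the OOI's reference image, normalized to [0, 1])
    into 3D map points, assuming a planar OOI with known origin and axes."""
    ooi = ooi_map[ooi_id]  # e.g. {"origin": (3,), "x_axis": (3,), "y_axis": (3,)}
    u, v = uv_in_ooi[:, 0:1], uv_in_ooi[:, 1:2]
    return ooi["origin"] + u * ooi["x_axis"] + v * ooi["y_axis"]

# The resulting 2D-3D correspondences (query pixel -> 3D point) are passed to a
# robust PnP solver (e.g. cv2.solvePnPRansac) to compute the camera location.
```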
Localization Results – Baidu Dataset
• Structure-based methods perform best.
• Learning-based methods (PoseNet, DSAC++) do not work on this large dataset.
• Our approach is the first learning-based method which can be applied here.
Paper: https://europe.naverlabs.com/research/publications/visual-localization-by-learning-objects-of-interest-dense-match-regression/
Local Feature Extraction
R2D2: Repeatable and Reliable Detector and Descriptor
Motivation
• Structure-based methods perform well, and the critical part is feature extraction and matching.
• A robust feature detector enables robust visual localization...
• ...and improves many other applications such as object detection, VSLAM and SfM.
Overview
Csurka et al., From handcrafted to deep local features, arXiv 2018
Introduction
Classical methods: detect-then-describe.
Pipeline: 1) Start with input image. 2) Detect keypoints (keypoint detector). 3) Extract patches and describe the keypoints (patch descriptor).
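A minimal example of the classical two-stage pipeline, here using OpenCV's SIFT as one possible detector/descriptor pair (any detect-then-describe combination would fit the same pattern):

```python
import cv2

def detect_then_describe(gray_image):
    """Classical two-stage pipeline: keypoint detector first, patch descriptor second."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray_image, None)                    # 2) detect keypoints
    keypoints, descriptors = sift.compute(gray_image, keypoints)  # 3) describe them
    return keypoints, descriptors
```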
Introduction
Classical methods: detect-then-describe. Our approach: detect-and-describe.
Pipeline: 1) Start with input image. 2) Feed it into the R2D2 network. 3) Detect keypoints (NMS) and describe them at once (one descriptor for each keypoint).
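For contrast, a toy detect-and-describe network in PyTorch: a single forward pass produces a dense descriptor map and a keypoint score map, and keypoints are the local maxima of the scores. This is only a minimal illustration of the idea, not the R2D2 architecture (which predicts separate repeatability and reliability maps).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectAndDescribeNet(nn.Module):
    """One forward pass yields dense descriptors and a keypoint score map."""
    def __init__(self, dim=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, padding=1), nn.ReLU(),
        )
        self.desc_head = nn.Conv2d(dim, dim, 1)   # per-pixel descriptor
        self.score_head = nn.Conv2d(dim, 1, 1)    # per-pixel keypoint score

    def forward(self, image):
        feat = self.trunk(image)
        desc = F.normalize(self.desc_head(feat), dim=1)
        score = torch.sigmoid(self.score_head(feat))
        # Keypoints = local maxima of the score map (simple 3x3 NMS, arbitrary threshold)
        nms = F.max_pool2d(score, 3, stride=1, padding=1)
        keypoint_mask = (score == nms) & (score > 0.7)
        return desc, score, keypoint_mask
```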