Feature-based Place Recognition, Akihiko Torii (Tokyo Tech), CVPR 2017

1 Feature-based Place Recognition Akihiko Torii Tokyo Tech CVPR 2017 tutorial on Large-Scale Visual Place Recognition and Image-Based Localization Alex Kendall, Torsten Sattler, Giorgos Tolias, Akihiko Torii

2 Introduction • Challenges in large-scale place recognition • Local feature-based image representation • Instance-level recognition to place recognition • Datasets and evaluation protocol

3 Introduction

4 Visual Place Recognition Where? Search box

5 The approach
• Represent the world by a set of geotagged images
• Given a query image, find the best matching image
• Transfer the geotag of the best matching image
Query: https://www.google.co.jp/maps/@35.6066354,139.6861582,3a,45.2y,256.68h,96.58t
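The three steps above can be sketched as a nearest-neighbour search followed by a geotag transfer. This is a minimal illustration rather than the tutorial's implementation: the 4-D vectors, the geotags, and the names `localize` and `l2_normalize` are all hypothetical, and cosine similarity is assumed as the matching score.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def localize(query_vec, database):
    """Return (geotag, score) of the database image most similar to the query.

    `database` is a list of (vector, (lat, lon)) pairs. Similarity is the
    dot product of L2-normalized vectors, i.e. cosine similarity (assumed).
    """
    q = l2_normalize(query_vec)
    best_geotag, best_score = None, -math.inf
    for vec, geotag in database:
        score = sum(a * b for a, b in zip(q, l2_normalize(vec)))
        if score > best_score:
            best_geotag, best_score = geotag, score
    return best_geotag, best_score

# Toy database: made-up image representations with made-up geotags.
database = [
    ([1, 0, 2, 1], (35.6066, 139.6862)),
    ([0, 3, 0, 1], (35.6595, 139.7005)),
]
geotag, score = localize([2, 0, 3, 2], database)
```

In a real system the linear scan would be replaced by an index, but the geotag transfer itself stays this simple.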

6 Why is this interesting?
• Mapping/organizing any photos on the globe
• Recognition & geometry

7 (Visual) place recognition [Knopp10, Torii13, Maddern14, Johns14, Sunderhauf15]
Location recognition [Cao13, Arandjelovic14, Sattler15]
Landmark identification [Chen11]
Geo-localization [Hays08, Cummins08, Zamir10, Zamir16, Kim17]

8 Challenges in large-scale place recognition

9 Sources of geotagged images
Photo community sites (Flickr, Instagram, …)
+ Never stop growing
- Noisy images/tags, concentrated on landmarks
StreetView images (Google StreetView, Mapillary, …)
+ Accurate, covering almost all the streets
- Not (or cannot be) updated frequently
Generate perspective cutouts [Gronat11, Chen11, Torii13]

10 Sources of geotagged images
• San Francisco Landmarks dataset [Chen11]: 1.06M images for 6 x 6 km²
Spatial and temporal densities may increase by collecting all the data from autonomous driving. However, it is impossible to monitor all the streets all the time.
Figures from: [Chen-CVPR11]

11 Temporal sparseness induces …
• Lighting changes (day-night)
• Structure changes (poster)
[Figure: database image and query images along a time axis, 2014/07 and 2014/10/08]

12 Spatial sparseness induces …
• Viewpoint changes
• Occlusions
[Figure: axes Space and Time]

13 It is actually the mixture of …
Lighting, structures, viewpoints, occlusions
[Figure: axes Space and Time]

14 Why is this difficult?
Temporal gap
- Lighting, weather, season, structures, moving objects
Spatial gap
- Viewpoints, self-occlusions
Large scale
- Inter/intra repetitions, saturations of features

15 Local feature-based image representation

16 Visual instance recognition
Design an “image representation” extractor f(I)
[Figure: geotagged database images and the query are mapped by f( ) into the image representation space; GPS is transferred to the query]

17 Review: Visual instance recognition
Image I -> Extract local features -> Aggregate -> f(I)
Compact yet discriminative image representation f(I), e.g. BoW, VLAD, FV

18-19 Review: Bag of Words (BoW) [Sivic03]
• Local feature detection & description (DoG+SIFT)
• 0/1 assignment of desc. i to cluster k
• Sum over all N descriptors in the image: B = [ 1, 0, 2, 1, … ]
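A minimal sketch of the BoW construction described on these slides: each descriptor is hard-assigned (0/1) to its nearest cluster center, and the assignments are summed over the image. The 2-D descriptors and the four centers are toy values chosen for illustration; real SIFT descriptors are 128-D and the vocabulary is learned by clustering.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(desc, centers):
    """Index k of the cluster center closest to the descriptor (0/1 assignment)."""
    return min(range(len(centers)), key=lambda k: sq_dist(desc, centers[k]))

def bow_histogram(descriptors, centers):
    """Sum the 0/1 assignments over all descriptors of one image."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest_word(d, centers)] += 1
    return hist

# Toy vocabulary of 4 visual words and 4 descriptors from one image.
centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.1, 0.9], [0.0, 1.1], [0.9, 1.0]]
B = bow_histogram(descriptors, centers)  # [1, 0, 2, 1]
```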

20-21 Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b]
• 0/1 assignment of desc. i to cluster k
• Residual vector: difference between desc. i and the center of its assigned cluster k
• Sum of the residual vectors over all N descriptors in the image, one sum v_k per cluster: V = [ v_1, v_2, …, v_K ]
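The VLAD aggregation can be sketched in the same toy setting: residuals to the assigned center are summed per cluster and the per-cluster sums are concatenated. The final L2 normalization and all numeric values are assumptions made for this illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vlad(descriptors, centers):
    """Concatenate per-cluster sums of residuals, then L2-normalize."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # 0/1 assignment: pick the nearest cluster center.
        k = min(range(len(centers)), key=lambda c: sq_dist(d, centers[c]))
        for j in range(dim):
            sums[k][j] += d[j] - centers[k][j]  # residual vector
    flat = [x for row in sums for x in row]     # length = #clusters x dim
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat

centers = [[0.0, 0.0], [1.0, 1.0]]
# The two descriptors assigned to the second center have opposite residuals,
# so their contributions cancel in the sum.
descriptors = [[0.2, 0.0], [1.0, 1.5], [1.0, 0.5]]
V = vlad(descriptors, centers)
```

The output length is fixed by the vocabulary size and the descriptor dimension, so adding more local features does not grow the representation.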

22 BoW vs. VLAD (FV)
BoW: B = [ 1, 0, 2, 1, … ]
• Dim. = #clusters (e.g. 1.6M)
+ Can be a sparse histogram (using a large vocab.)
+ Can provide matches
VLAD (FV): V = concatenation of per-cluster residual sums
• Dim. = #clusters x Dim. of feature (e.g. 256K x 128K)
+ Performs well with a small vocab.
+ Can be compressed by PCA with a small loss in performance
+ No extra memory requirement to encode more features

23 Matching BoW histograms
Q = [ 1, 0, 2, 1, … ]
B = [ 1, 0, 2, 1, … ]
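One way to match a query histogram Q against many database histograms B is to store the histograms sparsely and score them through an inverted index. The index and the cosine score are assumptions of this sketch (the slide only shows the two histograms), and the image ids are hypothetical.

```python
import math
from collections import defaultdict

def normalize(hist):
    """L2-normalize a sparse histogram given as {visual_word: count}."""
    norm = math.sqrt(sum(c * c for c in hist.values()))
    return {w: c / norm for w, c in hist.items()}

def build_inverted_index(db_hists):
    """Map each visual word to the (image_id, weight) pairs that contain it."""
    index = defaultdict(list)
    for image_id, hist in db_hists.items():
        for word, weight in normalize(hist).items():
            index[word].append((image_id, weight))
    return index

def score_query(query_hist, index):
    """Accumulate cosine similarity, touching only words present in the query."""
    scores = defaultdict(float)
    for word, q_weight in normalize(query_hist).items():
        for image_id, weight in index.get(word, []):
            scores[image_id] += q_weight * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Sparse form of B = [1, 0, 2, 1] and a second, mostly different image.
db_hists = {"db_a": {0: 1, 2: 2, 3: 1}, "db_b": {1: 3, 3: 1}}
index = build_inverted_index(db_hists)
ranking = score_query({0: 1, 2: 2, 3: 1}, index)  # Q = [1, 0, 2, 1]
```

With a large vocabulary most histogram entries are zero, which is why this kind of scoring only visits a small part of the database per query.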


27 Sparse to dense features
Image I -> Extract local features (DoG+SIFT) -> Aggregate -> f(I)

28 Sparse to dense features
Image I -> Extract local features (DSIFT, PHOW) -> Aggregate -> f(I)
+ No memory overhead (with VLAD)
+ No bursts
+/- Less invariant to viewpoint changes
See [Lazebnik06, Bosch07, Iscen15, Torii15]

29 Sparse to dense features to CNN
Image I -> Extract local features (DSIFT, PHOW) -> Aggregate -> f(I)
[Figure labels: Image, CNN layers, Pooling layer]
[Figure: NetVLAD layer on top of a convolutional neural network. The W x H x D map is interpreted as N x D local descriptors x. Soft-assignment: 1x1xDxK conv (w, b) followed by soft-max. VLAD core (c), then intra-normalization and L2 normalization give the (KxD)x1 VLAD vector V.]
For details, please be patient and wait for the next session! See also an excellent survey paper [Zheng17]!
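The stages named in the NetVLAD figure (soft-assignment by a linear map plus soft-max, VLAD core, intra-normalization, L2 normalization) can be sketched without a deep-learning framework. This is a rough illustration only: in NetVLAD the parameters w, b and c are learned, whereas here they are fixed by hand from two toy centers, and the scale `alpha` is an assumed value.

```python
import math

def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def soft_vlad(descriptors, centers, weights, biases):
    """Soft-assignment VLAD: (K x D) values flattened into one vector."""
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Soft-assignment: linear scores (w, b) followed by soft-max.
        scores = [sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
                  for k in range(K)]
        a = softmax(scores)
        # VLAD core: weighted residuals to every center c_k.
        for k in range(K):
            for j in range(D):
                V[k][j] += a[k] * (x[j] - centers[k][j])
    V = [l2_normalize(row) for row in V]              # intra-normalization
    return l2_normalize([v for row in V for v in row])  # L2 normalization

alpha = 5.0  # assumed sharpness of the soft-assignment
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [[2 * alpha * c for c in ck] for ck in centers]
biases = [-alpha * sum(c * c for c in ck) for ck in centers]
descriptors = [[0.1, 0.2], [0.9, 1.2], [0.2, 0.0]]
V = soft_vlad(descriptors, centers, weights, biases)
```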

30 Instance-level recognition to place recognition

31 Designing (sparse) BoW tailored for place recognition tasks (Please forget about VLAD for 5 min)

32 Using advanced techniques
• Burstiness weighting [Jegou09]
• Soft/multiple assignment [Philbin08, Chen11]
• Hamming embedding [Jegou10, Arandjelovic14, Sattler16]
• Query expansion [Chum11, Arandjelovic12]
• Spatial/Hough pyramid, WGC [Lazebnik06, Jegou10, Tolias14]
Figure from [Zheng16]

33 Using advanced techniques
The challenges in large-scale place recognition
• Temporal
- Lighting, weather, season, structures, moving objects
• Spatial
- Viewpoints, self-occlusions
• Large scale
- Saturations (repetitions)

35 Burstiness weighting [Jegou09]
Suppress saturation of BoW by repetitive patterns
• The score is dominated by visual words (VWs) on repeated structures
• But removing them loses too much information
[Figure: query and top 3 ranked images]
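A toy example of why down-weighting bursts helps. Square-rooting the counts is used here as a simple stand-in for a burstiness weighting and is not necessarily the weighting of [Jegou09]; the histograms are made up so that word 0 plays the role of a repeated structure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def damp_bursts(hist):
    """Reduce the influence of repeated visual words (square root, assumed)."""
    return [math.sqrt(c) for c in hist]

query      = [20, 1, 1, 1]  # word 0 fires 20 times on a repeated pattern
correct    = [5, 1, 1, 1]   # same place, fewer repetitions visible
distractor = [25, 0, 0, 0]  # different place showing only the repeated pattern

raw_correct = cosine(query, correct)
raw_distractor = cosine(query, distractor)
damped_correct = cosine(damp_bursts(query), damp_bursts(correct))
damped_distractor = cosine(damp_bursts(query), damp_bursts(distractor))
```

With raw counts the distractor outranks the correct image; after damping the order flips, while the repeated word still contributes to the score instead of being removed.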

36 Soft/multiple assignment [Philbin08, Chen11]
Retrieve matches lost by quantization
[Figure: query image and database image (correct), with soft weights]
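A minimal sketch of soft assignment: instead of voting for one visual word, a descriptor spreads its vote over its r nearest words, so a descriptor near a cell boundary can still match its counterpart on the other side. The Gaussian weighting, `r`, `sigma`, and the toy centers are assumptions for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_assign(desc, centers, r=2, sigma=0.5):
    """Return {word: weight} over the r nearest words; weights sum to 1."""
    nearest = sorted((sq_dist(desc, c), k) for k, c in enumerate(centers))[:r]
    weights = {k: math.exp(-d2 / (2 * sigma ** 2)) for d2, k in nearest}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

centers = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
# This descriptor lies close to the boundary between words 0 and 1.
weights = soft_assign([0.45, 0.0], centers)
```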

37 Adaptive assignment [Torii13]
1. Explicitly detect repeated structures
2. Design an adaptive soft-assignment procedure
- Repetitions provide a natural soft-assignment
3. Truncate high weights to limit influence of repeated VWs

38 Hamming embedding [Jegou10]
• Subdivide each cell into finer blocks
• Give additional binary signature
[Figure: features of the query image and the database image (correct) in one cell, with 2-bit signatures such as 10, 00, 11, 01]
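The idea can be sketched as follows: two features match only if they fall into the same visual word and their binary signatures are close in Hamming distance. This simplified version thresholds the raw descriptor dimensions against hypothetical per-cell thresholds; the projection step and the learned thresholds of the actual method are omitted.

```python
def signature(desc, thresholds):
    """One bit per dimension: 1 if the value exceeds the cell's threshold."""
    return tuple(int(x > t) for x, t in zip(desc, thresholds))

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return sum(x != y for x, y in zip(a, b))

def he_match(query, db, max_dist=0):
    """Features are (word_id, signature); both tests must pass to match."""
    same_word = query[0] == db[0]
    return same_word and hamming(query[1], db[1]) <= max_dist

thresholds = [0.5, 0.5]  # hypothetical thresholds for visual word 7
q = (7, signature([0.9, 0.1], thresholds))     # signature 10
db_1 = (7, signature([0.8, 0.2], thresholds))  # signature 10, same block
db_2 = (7, signature([0.2, 0.9], thresholds))  # signature 01, other block
db_3 = (8, signature([0.9, 0.1], thresholds))  # different visual word

matches = [he_match(q, d) for d in (db_1, db_2, db_3)]
```

Plain BoW would accept both db_1 and db_2 because they share the visual word; the signature test rejects db_2.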

39 DisLocation [Arandjelovic14]
Distinctiveness = inverse of local density = distance to the k-th nearest descriptor (HE signature) = σ
See also DisLoc + geometric burstiness [Sattler16] and selective match kernel [Tolias16]
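The quantity σ on this slide, the distance to the k-th nearest descriptor, can be illustrated on toy binary signatures. How σ then enters the match score is not covered here; the signatures and `k` are made-up values.

```python
def hamming(a, b):
    """Number of differing bits between two signatures."""
    return sum(x != y for x, y in zip(a, b))

def kth_neighbour_distance(sig, others, k):
    """Hamming distance from `sig` to its k-th nearest signature in `others`."""
    return sorted(hamming(sig, o) for o in others)[k - 1]

database = [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0), (1, 1, 1, 1)]

# A descriptor in a crowded region: several signatures are 1 bit away.
sigma_common = kth_neighbour_distance((0, 0, 0, 0), database, k=3)
# A descriptor in a sparse region: its neighbours are farther away.
sigma_rare = kth_neighbour_distance((1, 1, 1, 0), database, k=3)
```

A larger σ means a sparser neighbourhood, which the slide reads as a more distinctive descriptor.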

40 Using GPS tags as weak priors

41 Spatially far images should not match
[Figure labels: 200m, positive image, many negative data points]
• [Schindler07]: What are informative features?
• [Zamir10]: Ratio test with location constraint.
• [Knopp10, Gronat13, Cao13, Sattler16, …]

42 Detecting “confusing” image regions [Knopp10] Key idea: Spatially far images should not match. Find the most similar images that are spatially far
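The first step, finding the most similar images that are spatially far, can be sketched by filtering a retrieval ranking with the geotags. The 200 m radius is borrowed from the neighbouring slides as an assumed threshold, and the ranking, scores, and image ids are hypothetical.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def confusing_images(query_geotag, ranked, min_dist_m=200.0, top_n=2):
    """Keep the best-ranked images that lie farther than min_dist_m away.

    `ranked` holds (image_id, score, geotag) sorted by decreasing score.
    """
    far = [(image_id, score) for image_id, score, geotag in ranked
           if haversine_m(query_geotag, geotag) > min_dist_m]
    return far[:top_n]

query_geotag = (35.6066, 139.6862)
ranked = [
    ("same_street", 0.91, (35.6067, 139.6862)),
    ("lookalike_facade", 0.85, (35.6595, 139.7005)),
    ("other_city_block", 0.40, (35.7000, 139.8000)),
]
confusers = confusing_images(query_geotag, ranked, top_n=1)
```

Features that match such far-away images are the candidates for the "confusing" regions.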

43 Detecting “confusing” image regions [Knopp10]
Find image areas with a high density of local matches
[Figure: matches with confused images, confusion score, confusing regions]

44 Learning per-place linear SVM [Gronat-CVPR13]
Key idea: Spatially far images should not match (200 m).
Objective function: [equation not preserved], where h is the squared hinge loss.
See also: [Cao13]. Similar to Exemplar SVM by [Malisiewicz11].
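The objective itself is missing from this transcript. As a hedged sketch only, a per-place classifier in the exemplar-SVM style, trained with one positive image and many spatially far negatives, is commonly written as below. The symbols w, b, x^{+}, the negative set \mathcal{N}, and the weights C_1, C_2 are assumptions of this sketch and may differ from the slide's notation.

```latex
\Omega(w, b) = \lVert w \rVert^{2}
  + C_1 \, h\!\left( w^{\top} x^{+} + b \right)
  + C_2 \sum_{x \in \mathcal{N}} h\!\left( -w^{\top} x - b \right),
\qquad h(t) = \max(0,\, 1 - t)^{2}
```

Here h penalizes the positive image when its score falls below the margin and each far-away negative when its score rises above the opposite margin.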

45 NetVLAD [Arandjelovic16]
GPS only provides weak supervision
- Given a query, GPS gives us:
• Definite negatives: geographically far from the query

46 Major changes in appearance

47 Changes across time, weather, season
More studied in the robot vision community, e.g.
- Generating illumination invariant images [Maddern14]
- Learning from repeated recordings [Neubert15]
Figures from [Maddern14] and [Neubert15]

48 Place recognition under large changes in appearance & illumination [Torii15]
• A very challenging dataset that contains major changes in illumination as well as structural changes.
[Figure: database image and query images at day, sunset, and night]
