feature based place recognition

Feature-based Place Recognition, Akihiko Torii (Tokyo Tech), CVPR 2017



  1. Feature-based Place Recognition. Akihiko Torii, Tokyo Tech. CVPR 2017 tutorial on Large-Scale Visual Place Recognition and Image-Based Localization (Alex Kendall, Torsten Sattler, Giorgos Tolias, Akihiko Torii)

  2. Feature-based Place Recognition • Introduction • Challenges in large-scale place recognition • Local feature based image representation • Instance-level recognition to place recognition • Datasets and evaluation protocol

  3. Introduction

  4. Visual Place Recognition: Where? (Figure: search box.)

  5. The approach • Represent the world by a set of geotagged images • Given a query image, find the best matching image • Transfer the geotag of the best matching image. Query: https://www.google.co.jp/maps/@35.6066354,139.6861582,3a,45.2y,256.68h,96.58t
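The three steps of the approach can be sketched as a nearest-neighbour lookup. This is only a minimal illustration: the function names, the use of cosine similarity, and the toy 3-D representations and geotags are assumptions, not something specified on the slide.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def localize(query_vec, database):
    """Return the geotag of the database image most similar to the query.

    `database` is a list of (representation, (lat, lon)) pairs.
    """
    _, best_geotag = max(
        database, key=lambda item: cosine_similarity(query_vec, item[0])
    )
    return best_geotag

# Toy database: hypothetical 3-D representations with illustrative geotags.
database = [
    ([1.0, 0.0, 0.0], (35.6066, 139.6862)),
    ([0.0, 1.0, 0.0], (48.8584, 2.2945)),
    ([0.0, 0.0, 1.0], (37.7749, -122.4194)),
]
query = [0.1, 0.9, 0.2]

# The query is closest to the second image, so its geotag is transferred.
estimated_geotag = localize(query, database)
```

A linear scan like this is only practical for small databases; at city scale the lookup would normally go through an inverted file or an approximate nearest-neighbour index.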

  6. Why is this interesting? • Mapping/organizing any photos on the globe • Recognition & geometry

  7. (Visual) place recognition [Knopp10, Torii13, Maddern14, Johns14, Sunderhauf15]. Location recognition [Cao13, Arandjelovic14, Sattler15]. Landmark identification [Chen11]. Geo-localization [Hays08, Cummins08, Zamir10, Zamir16, Kim17].

  8. Challenges in large-scale place recognition

  9. Sources of geotagged images • Photo community sites (Flickr, Instagram, …): + Never stop growing - Noisy images/tags, concentrated on landmarks • StreetView images (Google StreetView, Mapillary, …): + Accurate, covering almost all streets - (Can)not be updated frequently. Generate perspective cutouts [Gronat11, Chen11, Torii13]

  10. Sources of geotagged images. San Francisco Landmarks dataset [Chen11] • 1.06M images for 6 × 6 km². Spatial and temporal densities may increase by collecting all the data from autonomous driving. However, it is impossible to monitor all streets all the time. Figures from: [Chen-CVPR11]

  11. Temporal sparseness induces … changes in lighting (day-night) and in structure (poster). (Figure: database image and query images along a time axis, 2014/07 and 2014/10/08.)

  12. Spatial sparseness induces … Viewpoints, Occlusions. (Figure axes: Space, Time.)

  13. It is actually the mixture of … Lightings, Structures, Viewpoints, Occlusions. (Figure axes: Space, Time.)

  14. Why is this difficult? Temporal gap - Lighting, weather, season, structures, moving objects. Spatial gap - Viewpoints, self-occlusions. Large scale - Inter/intra repetitions, saturation of features.

  15. Local feature based image representation

  16. Visual instance recognition. Design an “image representation” extractor f(I). (Figure: the query and every image in the geotagged image database are mapped by f(·) into the image representation space; the GPS tag is transferred from a database image to the query.)

  17. Review: Visual instance recognition. Image I → Extract local features → Aggregate → f(I). Compact yet discriminative image representation f(I), e.g. BoW, VLAD, FV

  18. Review: Bag of Words (BoW) [Sivic03] • Local feature detection & description (DoG+SIFT) • 0/1 assignment of descriptor i to cluster k

  19. Review: Bag of Words (BoW) [Sivic03] • Local feature detection & description (DoG+SIFT) • 0/1 assignment of descriptor i to cluster k • Sum over all N descriptors in the image: B = [ 1, 0, 2, 1, … ]
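As a minimal sketch of the BoW steps above (hard 0/1 assignment, then a sum over all descriptors), the following uses a toy 2-D vocabulary. The vocabulary, the descriptors, and the function names are illustrative assumptions; real systems use 128-D SIFT descriptors and vocabularies learned by k-means.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(vocabulary[k]))

def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall into each visual word."""
    hist = [0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1
    return hist

# Toy 2-D vocabulary with four visual words (hypothetical values).
vocabulary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Four toy descriptors: one near word 0, two near word 2, one near word 3.
descriptors = [[0.1, 0.1], [0.1, 0.9], [-0.1, 1.1], [0.9, 0.8]]

B = bow_histogram(descriptors, vocabulary)  # [1, 0, 2, 1]
```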

  20. Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b] • 0/1 assignment of descriptor i to cluster k • Residual vector

  21. Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b] • 0/1 assignment of descriptor i to cluster k • Residual vector • Sum over all N descriptors in the image: V = [ v1, v2, v3, v4, … ] (one summed residual vector per cluster)
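The VLAD construction can be sketched in the same style: each descriptor is hard-assigned to its nearest cluster centre, and the residuals (descriptor minus centre) are summed per cluster. The final L2 normalisation is a common practice that is assumed here rather than stated on the slide, and the toy centres and descriptors are hypothetical.

```python
import math

def vlad(descriptors, centers):
    """Concatenate per-cluster sums of residuals, then L2-normalise."""
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # 0/1 assignment: index of the nearest cluster centre.
        nearest = min(
            range(k),
            key=lambda j: sum((x - c) ** 2 for x, c in zip(desc, centers[j])),
        )
        # Accumulate the residual vector for that cluster.
        for t in range(d):
            residual_sums[nearest][t] += desc[t] - centers[nearest][t]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]

# Dimension is #clusters x descriptor dimension = 2 x 2 = 4.
V = vlad(descriptors, centers)
```

The output length depends only on the number of clusters and the descriptor dimension, not on how many descriptors the image contains.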

  22. BoW vs. VLAD (FV)
      BoW: B = [ 1, 0, 2, 1, … ]. Dim. = #clusters (e.g. 1.6M). + Can be a sparse histogram (using a large vocab.) + Can provide matches
      VLAD (FV): V = [ v1, v2, v3, v4, … ]. Dim. = #clusters x Dim. of feature (e.g. 256K x 128K). + Performs well with a small vocab. + Can be compressed by PCA with a small loss in performance + No extra memory requirement to encode more features

  23. Matching BoW histograms: Q = [ 1, 0, 2, 1, … ], B = [ 1, 0, 2, 1, … ]
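One way to compare two BoW histograms is a normalised dot product over the visual words they share. The slide text does not say which similarity is used, so the cosine measure and the sparse dict layout below are assumptions made for illustration.

```python
import math

def l2_normalize(hist):
    """L2-normalise a sparse histogram stored as {word_id: count}."""
    norm = math.sqrt(sum(v * v for v in hist.values()))
    return {w: v / norm for w, v in hist.items()} if norm > 0 else dict(hist)

def similarity(q, b):
    """Cosine similarity; only words present in both histograms contribute."""
    qn, bn = l2_normalize(q), l2_normalize(b)
    return sum(v * bn[w] for w, v in qn.items() if w in bn)

# Sparse form of Q = [1, 0, 2, 1]: zero entries are simply not stored.
Q = {0: 1, 2: 2, 3: 1}
B_same = {0: 1, 2: 2, 3: 1}   # identical histogram
B_other = {1: 3, 3: 1}        # shares only word 3 with Q

score_same = similarity(Q, B_same)
score_other = similarity(Q, B_other)
```

Storing only non-zero entries is what makes very large vocabularies practical: the cost of a comparison scales with the number of shared words, not with the vocabulary size.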



  27. Sparse to dense features. Image I → Extract local features (DoG+SIFT) → Aggregate → f(I)

  28. Sparse to dense features. Image I → Extract local features (DSIFT, PHOW) → Aggregate → f(I). + No memory overhead (with VLAD) + No bursts +/- Less invariant to viewpoint changes. See [Lazebnik06, Bosch07, Iscen15, Torii15]

  29. Sparse to dense features to CNN. Image I → Extract local features (DSIFT, PHOW) → Aggregate → f(I). (Figure: NetVLAD architecture. Image → convolutional neural network (CNN layers, pooling layer) → W×H×D map interpreted as N×D local descriptors x → NetVLAD layer: 1×1×D×K conv (w, b) → soft-max → soft-assignment → VLAD core (c) → intra-normalization → L2 normalization → (K×D)×1 VLAD vector V.) For details, please be patient and wait for the next session! See also an excellent survey paper [Zheng17]!

  30. Instance-level recognition to place recognition

  31. Designing (sparse) BoW tailored for place recognition tasks (Please forget about VLAD for 5 min)

  32. Using advanced techniques • Burstiness weighting [Jegou09] • Soft/multiple assignment [Philbin08, Chen11] • Hamming embedding [Jegou10, Arandjelovic14, Sattler16] • Query expansion [Chum11, Arandjelovic12] • Spatial/Hough pyramid, WGC [Lazebnik06, Jegou10, Tolias14]. Figure from [Zheng16]

  33. Using advanced techniques • Burstiness weighting [Jegou09] • Soft/multiple assignment [Philbin08, Chen11] • Hamming embedding [Jegou10, Arandjelovic14, Sattler16] • Query expansion [Chum11, Arandjelovic12] • Spatial/Hough pyramid, WGC [Lazebnik06, Jegou10, Tolias14]. The challenges in large-scale place recognition • Temporal - Lighting, weather, season, structures, moving objects • Spatial - Viewpoints, self-occlusions • Large scale - Saturations (repetitions)


  35. Burstiness weighting [Jegou09]. Suppress saturation of BoW by repetitive patterns. (Figure: query and top 3 ranked images.) The score is dominated by visual words (VWs) on repeated structures, but removing them loses too much information.
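To make the saturation problem concrete, the sketch below damps bursty counts with a square root instead of removing them. Square-root damping is one simple choice assumed here for illustration; the exact weighting scheme of [Jegou09] may differ, and the function names and toy histogram are hypothetical.

```python
import math

def damp_bursts(hist):
    """Replace each count by its square root so bursts grow sub-linearly."""
    return [math.sqrt(c) for c in hist]

def dominant_share(hist):
    """Fraction of the total mass held by the single largest bin."""
    total = sum(hist)
    return max(hist) / total if total > 0 else 0.0

# A visual word on a repeated structure (e.g. windows) fires 100 times.
raw = [1, 0, 4, 100]
damped = damp_bursts(raw)

share_before = dominant_share(raw)     # the bursty word dominates
share_after = dominant_share(damped)   # still present, but less dominant
```

The bursty word keeps a non-zero weight, which matches the point above that simply discarding repeated structures throws away too much information.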

  36. Soft/multiple assignment [Philbin08, Chen11]. Retrieve matches lost by quantization. (Figure: query image, correct database image, soft weights.)
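A minimal sketch of soft assignment: instead of voting for a single visual word, a descriptor votes for its `r` nearest words with weights that decay with distance. The Gaussian weighting and the parameters `r` and `sigma` are assumptions chosen for illustration, as are the 1-D toy values.

```python
import math

def soft_assign(descriptor, vocabulary, r=3, sigma=1.0):
    """Return {word_id: weight} for the r nearest words; weights sum to 1."""
    nearest = sorted(
        (sum((d - c) ** 2 for d, c in zip(descriptor, center)), k)
        for k, center in enumerate(vocabulary)
    )[:r]
    raw = [(k, math.exp(-sq_dist / (2 * sigma ** 2))) for sq_dist, k in nearest]
    total = sum(w for _, w in raw)
    return {k: w / total for k, w in raw}

# 1-D toy vocabulary; the descriptor lies between words 0 and 1.
vocabulary = [[0.0], [1.0], [5.0]]
weights = soft_assign([0.4], vocabulary, r=2, sigma=1.0)
```

A descriptor near a cell boundary now contributes to both neighbouring words, so a small perturbation that flips its hard assignment no longer destroys the match.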

  37. Adaptive assignment [Torii13] 1. Explicitly detect repeated structures 2. Design an adaptive soft-assignment procedure - Repetitions provide a natural soft-assignment 3. Truncate high weights to limit the influence of repeated VWs

  38. Hamming embedding [Jegou10] • Subdivide each cell into finer blocks • Give additional binary signature. (Figure: 2-bit signatures 00, 01, 10, 11 for the query image and the correct database image.)
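The idea of refining a cell with a binary signature can be sketched as follows. Two features match only if they share a visual word and their signatures are close in Hamming distance. For brevity this sketch thresholds the raw descriptor dimensions directly; the thresholds, the 2-bit signatures, and the function names are illustrative assumptions rather than the full procedure of [Jegou10].

```python
def binary_signature(descriptor, thresholds):
    """One bit per dimension: 1 if the value exceeds that dimension's threshold."""
    return tuple(1 if d > t else 0 for d, t in zip(descriptor, thresholds))

def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def he_match(word_q, sig_q, word_d, sig_d, max_dist):
    """Match only if the visual words agree AND the signatures are close."""
    return word_q == word_d and hamming(sig_q, sig_d) <= max_dist

# Hypothetical per-cell thresholds (e.g. medians learned offline).
thresholds = [0.5, 0.5]
sig_query = binary_signature([0.2, 0.9], thresholds)  # (0, 1)
sig_good = binary_signature([0.3, 0.8], thresholds)   # (0, 1)
sig_bad = binary_signature([0.7, 0.2], thresholds)    # (1, 0)
```

With `max_dist=0`, the second database feature is rejected even though it falls in the same visual word, which is the extra discrimination the signature is meant to provide.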

  39. DisLocation [Arandjelovic14]. Distinctiveness = inverse of local density = distance to the k-th nearest descriptor (HE signature) = σ. See also DisLoc + geometric burstiness [Sattler16] and selective match kernel [Tolias16]
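The definition of σ above (distance to the k-th nearest descriptor, measured on HE signatures) can be sketched directly. The toy 4-bit signatures and the function name `local_sigma` are assumptions for illustration; how σ is then used to reweight matches is not covered here.

```python
def hamming(a, b):
    """Number of positions where two binary signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def local_sigma(signature, neighbours, k):
    """Hamming distance from `signature` to its k-th nearest neighbour.

    A small value means a dense (less distinctive) region of descriptor
    space; a large value means a sparse (more distinctive) region.
    """
    if k < 1 or k > len(neighbours):
        raise ValueError("k must be between 1 and the number of neighbours")
    return sorted(hamming(signature, n) for n in neighbours)[k - 1]

signature = (0, 0, 0, 0)
neighbours = [(0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]

# Sorted distances are [1, 2, 4], so the 2nd nearest is at distance 2.
sigma = local_sigma(signature, neighbours, k=2)
```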

  40. Using GPS tags as weak priors

  41. Spatially far images should not match. (Figure: positive image, 200 m radius, many negative data points.) [Schindler07]: What are informative features? [Zamir10]: Ratio test with location constraint. [Knopp10, Gronat13, Cao13, Sattler16, …]
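Using GPS as a weak prior starts with deciding which database images are "spatially far" from a given image. The sketch below selects images farther than 200 m, the radius shown on the slide, using the haversine great-circle distance. The function names, the database layout, and the toy coordinates are assumptions.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def far_images(query_pos, database, min_dist_m=200.0):
    """Names of database images farther than `min_dist_m` from the query."""
    return [name for name, pos in database
            if haversine_m(query_pos, pos) > min_dist_m]

query_pos = (35.6066, 139.6862)
database = [
    ("near", (35.6067, 139.6863)),  # roughly 14 m away
    ("far", (35.6166, 139.6862)),   # roughly 1.1 km away
]

# Any visual match between the query and these images can be treated
# as a confusing (negative) match.
negatives = far_images(query_pos, database)
```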

  42. Detecting “confusing” image regions [Knopp10]. Key idea: Spatially far images should not match. Find the most similar images that are spatially far

  43. Detecting “confusing” image regions [Knopp10]. Find image areas with high density of local matches. (Figure: matches with confused images, confusion score, confusing regions.)

  44. Learning per-place linear SVM [Gronat-CVPR13]. Key idea: Spatially far images should not match (200 m). Objective function: […], where h is the squared hinge loss. Similar to Exemplar SVM by [Malisiewicz11]. See also: [Cao13]
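The objective itself did not survive in the slide text. As a rough illustration only, the sketch below evaluates a generic exemplar-SVM-style objective with one positive, many spatially far negatives, and a squared hinge loss h. The exact form, the weights `c_pos` and `c_neg`, and all names are assumptions and may not match the objective in [Gronat-CVPR13].

```python
def squared_hinge(y):
    """h(y) = max(0, 1 - y)^2."""
    return max(0.0, 1.0 - y) ** 2

def score(w, b, x):
    """Linear classifier score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def per_place_objective(w, b, positive, negatives, c_pos=1.0, c_neg=0.5):
    """Assumed form: ||w||^2 + c_pos * h(s(x+)) + c_neg * sum over x- of h(-s(x-))."""
    regulariser = sum(wi * wi for wi in w)
    positive_loss = squared_hinge(score(w, b, positive))
    negative_loss = sum(squared_hinge(-score(w, b, x)) for x in negatives)
    return regulariser + c_pos * positive_loss + c_neg * negative_loss

w, b = [1.0, 0.0], 0.0
positive = [2.0, 0.0]                     # the place's own image
negatives = [[-2.0, 0.0], [0.0, 0.0]]     # images farther than 200 m

# Positive and first negative are outside the margin (zero loss);
# the second negative sits on the decision boundary and is penalised.
objective = per_place_objective(w, b, positive, negatives)
```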

  45. NetVLAD [Arandjelovic16]. GPS only provides weak supervision. Given a query, GPS gives us: • Definite negatives: geographically far from the query

  46. Major changes in appearance

  47. Changes across time, weather, season. Figures from [Maddern14] and [Neubert15]. More studied in the robot vision community, e.g. - Generating illumination-invariant images [Maddern14] - Learning from repeated recordings [Neubert15]

  48. Place recognition under large changes in appearance & illumination [Torii15] • A very challenging dataset that contains major changes in illumination as well as structural changes. (Figure: database image; query images at day, sunset, night.)
