Feature-based Place Recognition


SLIDE 1

Feature-based Place Recognition

Akihiko Torii, Tokyo Tech

CVPR 2017 tutorial on Large-Scale Visual Place Recognition and Image-Based Localization
Alex Kendall, Torsten Sattler, Giorgos Tolias, Akihiko Torii

SLIDE 2
  • Introduction
  • Challenges in large-scale place recognition
  • Local feature based image representation
  • Instance-level recognition to place recognition
  • Datasets and evaluation protocol

SLIDE 3

Introduction

SLIDE 4

Visual Place Recognition

Where?

SLIDE 5

The approach
  • Represent the world by a set of geotagged images
  • Given a query image, find the best matching image
  • Transfer the geotag of the best matching image

Query: https://www.google.co.jp/maps/@35.6066354,139.6861582,3a,45.2y,256.68h,96.58t

SLIDE 6

Why is this interesting?
  • Mapping/organizing any photos on the globe
  • Recognition & geometry

SLIDE 7

(Visual) place recognition [Knopp10, Torii13, Maddern14, Johns14, Sunderhauf15]
Location recognition [Cao13, Arandjelovic14, Sattler15]
Landmark identification [Chen11]
Geo-localization [Hays08, Cummins08, Zamir10, Zamir16, Kim17]

SLIDE 8

Challenges in large-scale place recognition

SLIDE 9

Sources of geotagged images

Photo community sites (Flickr, Instagram, …)
  + Never stop growing
  − Noisy images/tags, concentrated on landmarks

Street-view images (Google Street View, Mapillary, …)
  + Accurate, covering almost all the streets
  − (Can)not be updated frequently

Generate perspective cutouts [Gronat11, Chen11, Torii13]

SLIDE 10

Sources of geotagged images

San Francisco Landmarks dataset [Chen11]
  • 1.06M images covering a 6 × 6 km² area

Figures from [Chen-CVPR11]

Spatial and temporal density may increase by collecting all the data from autonomous driving. However, it is impossible to monitor all the streets all the time.

SLIDE 11

Temporal sparseness induces …

[Figure: query images (2014/10/08) vs. a database image (2014/07); differences in lighting (day vs. night) and structure (posters).]

SLIDE 12

Spatial sparseness induces …

[Figure: differences across space and time: viewpoints and occlusions.]

SLIDE 13

It is actually a mixture of time and space:
lighting, structures, viewpoints, occlusions

SLIDE 14

Why is this difficult?

Temporal gap
  • Lighting, weather, season, structures, moving objects

Spatial gap
  • Viewpoints, self-occlusions

Large scale
  • Inter/intra repetitions, saturation of features

SLIDE 15

Local feature based image representation

SLIDE 16

Design an "image representation" extractor f(I)

Visual instance recognition: map the query and the geotagged image database into a common image representation space with f( ), find the nearest database image, and transfer its GPS tag.

[Figure: database images and the query embedded by f( ) in the representation space; the nearest neighbor provides the geotag.]

SLIDE 17

Review: Visual instance recognition

Image I → extract local features → aggregate → f(I)

Compact yet discriminative image representation f(I), e.g. BoW, VLAD, FV
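To make this concrete, here is a minimal Python sketch of the extract-then-aggregate extractor f(I), assuming OpenCV with SIFT available; `aggregate` is a placeholder for any of the encoders reviewed next (BoW, VLAD, FV):

```python
# Minimal sketch of f(I): extract local descriptors, then aggregate them
# into one global vector. `aggregate` is a placeholder (BoW, VLAD, FV, ...).
import cv2
import numpy as np

def f(image_path, aggregate):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(img, None)      # N x 128 local descriptors
    v = aggregate(desc)                             # encode into a single vector
    return v / (np.linalg.norm(v) + 1e-12)          # L2-normalize for cosine search
```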

SLIDE 18

Review: Bag of Words (BoW) [Sivic03]

1. Local feature detection & description (DoG + SIFT)
2. 0/1 assignment a_ik of descriptor i to cluster (visual word) k

SLIDE 19

Review: Bag of Words (BoW) [Sivic03]

1. Local feature detection & description (DoG + SIFT)
2. 0/1 assignment a_ik of descriptor i to cluster (visual word) k
3. Sum over all N descriptors in the image:

   B_k = Σ_{i=1..N} a_ik,   e.g.  B = [1, 0, 2, 1, …]
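A minimal numpy sketch of this encoder under the definitions above; the visual vocabulary `centers` (K cluster centers from k-means) is assumed precomputed:

```python
import numpy as np

def bow(descriptors, centers):
    """B[k] = sum_i a_ik with hard 0/1 assignment a_ik of each
    descriptor to its nearest cluster (visual word)."""
    # pairwise squared distances, shape N x K
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                       # nearest visual word per descriptor
    return np.bincount(words, minlength=len(centers)).astype(float)
```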

SLIDE 20

Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b]

  • 0/1 assignment a_ik of descriptor i to cluster k
  • Residual vector: x_i − c_k

SLIDE 21

Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b]

Sum the residual vectors over all N descriptors in the image:

   V_k = Σ_{i=1..N} a_ik (x_i − c_k),   V = [V_1, V_2, …, V_K]
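The same sketch extended to VLAD; again the vocabulary `centers` is assumed precomputed:

```python
import numpy as np

def vlad(descriptors, centers):
    """V_k = sum_i a_ik (x_i - c_k): per-word sums of residual vectors,
    concatenated and L2-normalized."""
    K, D = centers.shape
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    V = np.zeros((K, D))
    for k in range(K):
        V[k] = (descriptors[words == k] - centers[k]).sum(axis=0)
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```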

SLIDE 22

BoW vs. VLAD (FV)

BoW:  B = [1, 0, 2, 1, …]
  • Dim. = #clusters (e.g. 1.6M)
  + Can be a sparse histogram (using a large vocabulary)
  + Can provide matches

VLAD (FV):  V = [V_1, V_2, …, V_K]
  • Dim. = #clusters × feature dim. (e.g. 256 × 128 = 32K)
  + Performs well with a small vocabulary
  + Can be compressed by PCA with a small loss in performance
  + No extra memory requirement to encode more features

SLIDE 23

Matching BoW histograms

Database: B = [1, 0, 2, 1, …]
Query:    Q = [1, 0, 2, 1, …]
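A sketch of ranking database histograms against a query with idf-weighted cosine similarity (the usual tf-idf practice of [Sivic03]); all variable names are illustrative:

```python
import numpy as np

def rank_by_bow(Q, B_all, idf):
    """Rank database histograms (rows of B_all) against query histogram Q."""
    q = Q * idf                                     # idf-weight the query
    db = B_all * idf                                # ... and the database rows
    q = q / (np.linalg.norm(q) + 1e-12)
    db = db / (np.linalg.norm(db, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(db @ q))                    # database indices, best first
```

In practice the database side is stored in an inverted file, so only images sharing visual words with the query are ever scored.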


SLIDE 26

BoW vs. VLAD (FV): recap of the trade-offs from Slide 22.

SLIDE 27

Sparse to dense features

Image I → extract local features (DoG + SIFT) → aggregate → f(I)

SLIDE 28

Sparse to dense features

Image I → extract local features (DSIFT, PHOW) → aggregate → f(I)

  + No memory overhead (with VLAD)
  + No bursts
  +/− Less invariant to viewpoint changes

See [Lazebnik06, Bosch07, Iscen15, Torii15]

SLIDE 29

Sparse to dense features to CNN

Image I → extract local features (DSIFT, PHOW) → aggregate → f(I)

NetVLAD layer on top of a CNN: the W×H×D feature map from the CNN layers is interpreted as N = W×H local D-dim descriptors x; a 1×1×D×K conv (w, b) followed by a soft-max produces the soft-assignment s; the VLAD core (c) aggregates residuals into a (K×D)×1 VLAD vector V, followed by intra-normalization and L2 normalization.

For details, please be patient and wait for the next session!

See also an excellent survey paper [Zheng17]!
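As a preview, a minimal numpy sketch of the aggregation this layer computes at inference, under the description above; the learned parameters W, b (the 1×1 conv) and the cluster anchors `centers` are assumed given:

```python
import numpy as np

def netvlad_aggregate(X, centers, W, b):
    """What the NetVLAD layer computes at inference: soft-assignment via
    a 1x1 conv + soft-max, weighted residuals to K cluster anchors,
    intra-normalization, then a global L2 norm."""
    logits = X @ W + b                               # N x K (the 1x1xDxK conv)
    s = np.exp(logits - logits.max(axis=1, keepdims=True))
    s /= s.sum(axis=1, keepdims=True)                # soft-max assignment
    R = X[:, None, :] - centers[None, :, :]          # N x K x D residuals
    V = (s[:, :, None] * R).sum(axis=0)              # K x D VLAD core
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12   # intra-normalization
    v = V.ravel()                                    # (K*D)-dim vector
    return v / (np.linalg.norm(v) + 1e-12)           # final L2 normalization
```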

SLIDE 30

Instance-level recognition to place recognition

SLIDE 31

Designing (sparse) BoW tailored for place recognition tasks
(Please forget about VLAD for 5 min)

SLIDE 32

Using advanced techniques
  • Burstiness weighting [Jegou09]
  • Soft/multiple assignment [Philbin08, Chen11]
  • Hamming embedding [Jegou10, Arandjelovic14, Sattler16]
  • Query expansion [Chum11, Arandjelovic12]
  • Spatial/Hough pyramid, WGC [Lazebnik06, Jegou10, Tolias14]

Figure from [Zheng16]
SLIDE 33

Using advanced techniques (cf. Slide 32) to address the challenges in large-scale place recognition:
  • Temporal: lighting, weather, season, structures, moving objects
  • Spatial: viewpoints, self-occlusions
  • Large scale: saturations (repetitions)
SLIDE 34

SLIDE 35

Burstiness weighting [Jegou09]

Suppress the saturation of BoW by repetitive patterns: the score is dominated by visual words on repeated structures, but removing them loses too much information.

[Figure: query and top-3 ranked images.]
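One common simplification of the idea, sketched below: square-root the visual-word counts so bursty words are discounted rather than removed (the full method of [Jegou09] discounts individual votes):

```python
import numpy as np

def deburst(B):
    """Square-root the visual-word counts, then L2-normalize: bursty
    words are discounted but, unlike removal, still contribute."""
    Bs = np.sqrt(B)
    return Bs / (np.linalg.norm(Bs) + 1e-12)
```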

SLIDE 36

Soft/multiple assignment [Philbin08, Chen11]

Retrieve matches lost by quantization: assign each descriptor to several nearby visual words with soft weights.

[Figure: soft weights between a query image and a correct database image.]
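A minimal sketch of soft assignment for one descriptor, assuming a precomputed vocabulary `centers`; the number of neighbors m and the bandwidth sigma are illustrative choices:

```python
import numpy as np

def soft_assign(desc, centers, m=3, sigma=100.0):
    """Spread one descriptor over its m nearest visual words with
    Gaussian weights instead of a hard 0/1 assignment."""
    d2 = ((centers - desc) ** 2).sum(axis=1)        # squared distance to each word
    nn = np.argsort(d2)[:m]                         # m nearest visual words
    w = np.exp(-d2[nn] / (2 * sigma ** 2))
    return nn, w / w.sum()                          # word indices and their weights
```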

SLIDE 37

Adaptive assignment [Torii13]

1. Explicitly detect repeated structures
2. Design an adaptive soft-assignment procedure
   • Repetitions provide a natural soft-assignment
3. Truncate high weights to limit the influence of repeated visual words

SLIDE 38

Hamming embedding [Jegou10]
  • Subdivide each cell into finer blocks
  • Give an additional binary signature (e.g. 00, 01, 10, 11)

[Figure: query and correct database image; the matched features carry signatures 01 and 00.]
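A sketch of the signature test, assuming a fixed projection matrix P and per-word median thresholds `medians` learned offline; the 24-of-64-bits threshold is a commonly used setting:

```python
import numpy as np

def he_signature(desc, P, medians, word):
    """b-bit signature: project desc with a fixed matrix P (b x D) and
    threshold against the per-word component medians."""
    return (P @ desc > medians[word]).astype(np.uint8)

def he_match(sig_q, sig_db, max_dist=24):
    """Accept a visual-word match only if the signatures agree within a
    Hamming-distance threshold (24 of 64 bits is a common setting)."""
    return int(np.sum(sig_q != sig_db)) <= max_dist
```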

SLIDE 39

DisLocation [Arandjelovic14]

Distinctiveness = inverse of local density = distance σ to the k-th nearest descriptor (HE signature)

See also DisLoc + geometric burstiness [Sattler16] and the selective match kernel [Tolias16]
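A DisLoc-style sketch of the distinctiveness weight, estimating local density as the Hamming distance to the k-th nearest neighboring signature (names illustrative):

```python
import numpy as np

def distinctiveness(sig, neighbor_sigs, k=10):
    """Local density estimate: Hamming distance to the k-th nearest
    neighboring signature. Larger = sparser region = more distinctive."""
    d = np.sort([int(np.sum(sig != s)) for s in neighbor_sigs])
    return float(d[k - 1])
```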

SLIDE 40

Using GPS tags as weak priors

SLIDE 41

Spatially far images should not match

[Figure: the positive image lies near the query; many negative data points lie beyond 200 m.]

[Schindler07] – What are informative features?
[Zamir10] – Ratio test with location constraint.
[Knopp10, Gronat13, Cao13, Sattler16, …]

SLIDE 42

Detecting "confusing" image regions [Knopp10]

Key idea: spatially far images should not match.
Find the most similar images that are spatially far.

SLIDE 43

Detecting "confusing" image regions [Knopp10]

Find image areas with a high density of local matches.

[Figure: matches with confused images, confusion scores, and the resulting confusing regions.]

SLIDE 44

Learning per-place linear SVM [Gronat-CVPR13]

Key idea: spatially far images (beyond 200 m) should not match.

Objective function (per place j, following the exemplar-SVM form):

   min_{w_j, b_j}  ||w_j||² + C₁ h(w_jᵀ x_j + b_j) + C₂ Σ_{x ∈ N_j} h(−w_jᵀ x − b_j)

where h is the squared hinge loss, x_j is the single positive image of place j, and N_j are the spatially far negatives.

Similar to Exemplar SVM by [Malisiewicz11]. See also [Cao13].
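A sketch of per-place training in the spirit of [Gronat-CVPR13], using scikit-learn's LinearSVC (which minimizes a squared hinge loss by default); the feature vectors and the 200 m negative mining are assumed prepared upstream:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_place_classifier(x_pos, X_neg, C=1.0):
    """One linear SVM per place: a single positive BoW vector against
    many negatives mined from images farther than ~200 m away."""
    X = np.vstack([x_pos[None, :], X_neg])
    y = np.array([1] + [0] * len(X_neg))
    clf = LinearSVC(C=C, class_weight="balanced")   # offset the 1-vs-many imbalance
    clf.fit(X, y)
    return clf                                      # clf.decision_function ranks queries
```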

SLIDE 45

NetVLAD [Arandjelovic16]

GPS only provides weak supervision. Given a query, GPS gives us:
  • Definite negatives: images geographically far from the query
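A minimal sketch of mining definite negatives from GPS, assuming positions already projected to meters (e.g. UTM); the 200 m radius follows the earlier slides:

```python
import numpy as np

def definite_negatives(query_pos, db_pos, min_dist=200.0):
    """Indices of database images farther than `min_dist` meters from the
    query: definite negatives. Nearby images are only *potential*
    positives, since GPS says nothing about what the camera saw."""
    d = np.linalg.norm(db_pos - query_pos, axis=1)  # positions in meters (e.g. UTM)
    return np.flatnonzero(d > min_dist)
```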

SLIDE 46

Major changes in appearance

SLIDE 47

Changes across time, weather, season

More studied in the robot vision community, e.g.
  • Generating illumination-invariant images [Maddern14]
  • Learning from repeated recordings [Neubert15]

Figures from [Maddern14] and [Neubert15]
SLIDE 48

Place recognition under large changes in appearance & illumination [Torii15]
  • A very challenging dataset that contains major changes in illumination as well as structural changes

[Figure: query images at day, sunset, and night vs. the database image.]

SLIDE 49

Place recognition under large changes in appearance & illumination [Torii15]

Main observation:
  ✓ Dense SIFT can cope with large changes in appearance & illumination
  ! but only when there is no large viewpoint change

SLIDE 50

Experiment: matching dense/sparse descriptors across small/large viewpoint changes

Large viewpoint change, sparse: query image vs. street-view image, sparse SIFT (DoG), inlier ratio 0.05 (53/1149)


SLIDE 53

Experiment: matching dense/sparse descriptors across small/large viewpoint changes

Inlier ratios (the small-viewpoint-change case matches the query against a synthesized view):

                     Large vp change     Small vp change
  Sparse SIFT (DoG)  0.05 (53/1149)      0.12 (122/984)
  Dense SIFT         0.31 (1135/3708)    0.76 (5410/7138)

SLIDE 54

Place recognition under large changes in appearance & illumination [Torii15]

We generate synthesized images at new positions using the depth maps associated with street-view images.

[Figure: street-view panorama, associated depth map, individual scene planes, and examples of synthesized views (virtual view / street-view / query).]

SLIDE 55

Qualitative results: dense + synthesized views vs. sparse + street-view

We seek to find one or more images depicting the same place!

SLIDE 56

Qualitative results: dense + synthesized views vs. dense + street-view

We seek to find one or more images depicting the same place!

SLIDE 57

Datasets and evaluation protocol

SLIDE 58

Available datasets

  Dataset                       DB images    DB 3D points  #Query  Ground truth
  Pittsburgh [Torii13]          254K         –             24K     GPS
  Tokyo [Torii15]               76K (374K)   –             1,125   GPS
  Oxford                        5K (+100K)   –             55      Label
  Paris                         6.4K         –             55      Label
  San Francisco PCI [Chen11]    1.06M        –             803     Building ID (label)
  San Francisco SF-0 [Li12]     610K         30M           803     Building ID (label)
  San Francisco SF-1 [Li12]     790K         75M           803     Building ID (label)

SLIDE 59

Available datasets

  Dataset                          DB images  DB 3D points  #Query            Ground truth
  Arts Quad [Li10]                 6.5K       2M            348               Differential GPS
  Aachen [Sattler12]               3K         1.5M          369 (10.6K seq.)  Camera pose
  Baidu-IBL [Sun17]                682        67M           2,296             LiDAR reg.
  Landmarks [Li12]                 205K       38M           10K               SfM
  Dubrovnik [Li12]                 6K         1.9M          800               SfM
  Cambridge Landmarks [Kendall15]  6.8K       RGBD          4K                Camera poses (GPS)
  7 Scenes [Shotton13]             26K        RGBD          17K               Camera poses

SLIDE 60

Evaluation protocol: Recall

San Francisco PCIs [Chen11]: ground truth = building ID (label)

A query is correctly localized if at least one of the top-N retrieved database images has a matching building ID, and incorrect otherwise.

Recall@N = (number of correctly localized queries) / (total number of queries)

[Figure: a query image and the top-N ranked database images with building IDs 1129/1133, 1129, 24/389…, 670.]

SLIDE 62

Evaluation protocol: Recall

Pittsburgh [Torii13], Tokyo [Torii15]: ground truth = GPS

d = distance(Query_GPS, Database_GPS@N)

A query is correctly localized if d < a distance threshold (e.g. 25 m) for at least one of the top-N retrieved database images, and incorrect otherwise.

Recall@N = (number of correctly localized queries) / (total number of queries)

[Figure: query at (35.66, 139.65); top-N ranked database images at (35.66, 139.64), (35.70, 139.60), (35.50, 139.70), …]
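A sketch of this protocol, assuming GPS positions projected to meters and precomputed retrieval rankings (all names illustrative):

```python
import numpy as np

def recall_at_n(query_pos, db_pos, rankings, N=10, thresh=25.0):
    """Fraction of queries whose top-N retrieved database images contain
    at least one image within `thresh` meters of the query."""
    hits = 0
    for q, ranked in zip(query_pos, rankings):
        d = np.linalg.norm(db_pos[ranked[:N]] - q, axis=1)
        hits += bool((d < thresh).any())
    return hits / len(query_pos)
```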

SLIDE 64

Evaluation protocol: Localization rate

d = distance(Query_GPS, Database_GPS@1)

A query is correctly localized if d < the distance threshold (x-axis), and incorrect otherwise.

Localization rate = (number of queries localized within the threshold on the x-axis) / (total number of queries)

Figure from [Zamir10]
SLIDE 65

2D vs. 3D?

[Figure: a query image matched either against a large-scale 3D model or against geotagged images.]

SLIDE 66

Are Large-Scale 3D Models Really Necessary? [Sattler17]
  • Constructing & maintaining an image database is very easy
  • Small memory footprint via compact image descriptors (≤16KB per image)
  • Approximate pose: image retrieval and the known poses of database images
  • Accurate pose estimation via post-processing: local SfM + geo-registration

Pipeline: query image → retrieved geotagged images → local SfM → local 3D model → geo-registration

SLIDE 67

Reference Poses for San Francisco

http://www.ok.sc.e.titech.ac.jp/~torii/project/vlocalization/
(Results | Reference Poses | Benchmark Protocol)

Manual annotation (query vs. the manually selected most relevant DB image):
  a. 2D-2D correspondences (green)
  b. 2D-3D correspondences (red)

Pose estimation:
  1. Local SfM (query + DB images)
  2. Geo-registration using the GPS of the DB

"Reference poses" are consistent with the manual annotations.

SLIDE 68

Experiments
  • 2D retrieval-based: NetVLAD, Disloc + geom. burstiness, DenseVLAD
  • 3D-based: Hyperpoints, Active Search, Camera Pose Voting (CPV)
  • Variants for 2D: nearest neighbor (NN), spatial verification (SR), local SfM (SR-SfM)
  • Evaluation measure: percentage of query images with pose within X meters of the reference pose; distances measured in UTM coordinates in 2D (height undefined)
  • Results for all San Francisco reference poses:

[Plots: correctly localized queries [%] vs. distance threshold [meters] (5-30 m) for (1) Disloc with NN/SR/SR-SfM, (2) DenseVLAD and NetVLAD with NN/SR/SR-SfM, and (3) DenseVLAD (SR-SfM) and Disloc (SR-SfM) against the 3D methods Hyperpoints, CPV w/ GPS, and CPV w/o GPS.]
SLIDE 69

Concluding remarks
  • It is interesting to understand classic feature-based approaches, as many CNN-based representations are built on top of them, e.g. [Tolias16, Arandjelovic16, Radenovic16, Kim17]
  • Future work includes:
    • Efficient and effective image formats/projection models [Torii14, Iscen17, Wijmans17]
    • Large-scale indoor localization [Wang15, Zhu16, Wijmans17]
    • Using semantics (segmentations) [Arandjelovic14b, Armagan17]

SLIDE 70

References

[Arandjelovic12] R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In Proc. CVPR 2012.
[Arandjelovic14] R. Arandjelovic and A. Zisserman. DisLocation: Scalable descriptor distinctiveness for location recognition. In Proc. ACCV 2014.
[Bosch07] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In Proc. ICCV 2007.
[Cao13] S. Cao and N. Snavely. Graph-based discriminative learning for location recognition. In Proc. CVPR 2013.
[Chen11] D. M. Chen, G. Baatz, K. Koeser, S. S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk. City-scale landmark identification on mobile devices. In Proc. CVPR 2011.
[Chum11] O. Chum, A. Mikulik, M. Perdoch, and J. Matas. Total recall II: Query expansion revisited. In Proc. CVPR 2011.
[Cummins08] M. Cummins and P. Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research, 2008.
[Gronat13] P. Gronat, G. Obozinski, J. Sivic, and T. Pajdla. Learning and calibrating per-location classifiers for visual place recognition. In Proc. CVPR 2013.
[Hays08] J. Hays and A. Efros. im2gps: estimating geographic information from a single image. In Proc. CVPR 2008.
[Iscen15] A. Iscen, G. Tolias, P. H. Gosselin, and H. Jegou. A comparison of dense region detectors for image search and fine-grained classification. IEEE Transactions on Image Processing, 24(8):2369-2381, 2015.
[Iscen17] A. Iscen, G. Tolias, Y. S. Avrithis, T. Furon, and O. Chum. Panorama to panorama matching for location recognition. CoRR abs/1704.06591, 2017.
[Jegou09] H. Jegou, M. Douze, and C. Schmid. On the burstiness of visual elements. In Proc. CVPR 2009.
[Jegou10] H. Jegou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. IJCV, 87(3):316-336, 2010.

SLIDE 71

References

[Jegou10b] H. Jegou, M. Douze, C. Schmid, and P. Perez. Aggregating local descriptors into a compact image representation. In Proc. CVPR 2010.
[Johns14] E. Johns and G.-Z. Yang. Generative methods for long-term place recognition in dynamic scenes. International Journal of Computer Vision, 106(3):297-314, 2014.
[Kim17] H. Jin Kim, E. Dunn, and J.-M. Frahm. Learned contextual feature reweighting for image geo-localization. In Proc. CVPR 2017.
[Knopp10] J. Knopp, J. Sivic, and T. Pajdla. Avoiding confusing features in place recognition. In Proc. ECCV 2010.
[Lazebnik06] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR 2006.
[Maddern14] W. Maddern, A. Stewart, C. McManus, B. Upcroft, W. Churchill, and P. Newman. Illumination invariant imaging: Applications in robust vision-based localisation, mapping and classification for autonomous vehicles. In Proc. Visual Place Recognition in Changing Environments Workshop, ICRA 2014.
[Neubert15] P. Neubert, N. Sunderhauf, and P. Protzel. Superpixel-based appearance change prediction for long-term navigation across seasons. Robotics and Autonomous Systems, 69:15-27, 2015.
[Philbin08] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR 2008.
[Sattler15] T. Sattler, M. Havlena, F. Radenovic, K. Schindler, and M. Pollefeys. Hyperpoints and fine vocabularies for large-scale location recognition. In Proc. ICCV 2015.
[Sattler16] T. Sattler, M. Havlena, K. Schindler, and M. Pollefeys. Large-scale location recognition and the geometric burstiness problem. In Proc. CVPR 2016.
[Sattler17] T. Sattler, A. Torii, J. Sivic, M. Pollefeys, H. Taira, M. Okutomi, and T. Pajdla. Are large-scale 3D models really necessary for accurate visual localization? In Proc. CVPR 2017.
[Sivic03] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV 2003.

SLIDE 72

References

[Sunderhauf15] N. Sunderhauf, S. Shirazi, A. Jacobson, F. Dayoub, E. Pepperell, B. Upcroft, and M. Milford. Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. In Robotics: Science and Systems 2015.
[Tolias14] G. Tolias and H. Jegou. Visual query expansion with or without geometry: refining local descriptors by feature aggregation. Pattern Recognition, 2014.
[Torii13] A. Torii, J. Sivic, T. Pajdla, and M. Okutomi. Visual place recognition with repetitive structures. In Proc. CVPR 2013.
[Torii15] A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla. 24/7 place recognition by view synthesis. In Proc. CVPR 2015.
[Zamir10] A. Zamir and M. Shah. Accurate image localization based on Google Maps Street View. In Proc. ECCV 2010.
[Zamir16] A. R. Zamir, A. Hakeem, L. Van Gool, M. Shah, and R. Szeliski. Large-Scale Visual Geo-Localization. Springer, 2016.
[Zheng16] L. Zheng, Y. Yang, and Q. Tian. SIFT meets CNN: A decade survey of instance retrieval. CoRR abs/1608.01807, 2016.