Place recognition with instance search: from hand-crafted to learning-based methods (PowerPoint presentation)


slide-1
SLIDE 1

Place recognition with instance search

from hand-crafted to learning-based methods

Giorgos Tolias

Tutorial on “Large-Scale Visual Place Recognition and Image-Based Localization” Tolias Sattler Brachmann ICCV 2019, Seoul

slide-2
SLIDE 2

Outline

  • Place recognition with instance search
  • Benchmarks
  • Hand-crafted approaches
  • Learning-based approaches
  • Descriptor whitening
  • Privacy preserving search
slide-3
SLIDE 3

Place recognition

slide-4
SLIDE 4

Place recognition

slide-5
SLIDE 5

Place recognition

slide-6
SLIDE 6

Place recognition via instance search

  • Huge number of classes
  • each “named” location
  • each landmark
  • GPS coordinates
  • Low intra-class variability
  • Large-scale instance search
  • nearest neighbor classifiers
  • kernel density estimation
slide-7
SLIDE 7

representation of database images

Place recognition via instance search

query image → query representation → nearest neighbor search

slide-8
SLIDE 8

Benchmarks

slide-9
SLIDE 9

Geo-localization

  • Tokyo 24/7 [Torii et al., CVPR‘15]
  • Pittsburgh dataset [Torii et al., CVPR‘13]
  • San Francisco dataset [Chen et al., CVPR’11]

GPS-based ground truth

image from Tokyo 24/7

slide-10
SLIDE 10

Geo-localization

GPS-based ground truth IM2GPS [Hays & Efros, CVPR’08]

images from [Hays & Efros]

slide-11
SLIDE 11

Instance retrieval (buildings, landmarks)

  • Oxford buildings [Philbin et al., CVPR’07]
  • Paris [Philbin et al., CVPR’08]
  • Oxford/Paris revisited + 1M distractors

[Radenovic et al., CVPR’18] Manually constructed ground truth

http://cmp.felk.cvut.cz/revisitop/

slide-12
SLIDE 12

Landmark recognition and retrieval

Google Landmarks Dataset Crowd-sourced ground truth

https://github.com/cvdfoundation/google-landmark

  • Recognition training set: 4.1M images, 200k landmarks
  • Retrieval index set: 762k images (1⁄3 decrease), 101k landmarks
  • Test set: 118k images, about 1% of which depict landmarks

slide-13
SLIDE 13

Global descriptors & voting approaches

slide-14
SLIDE 14

Global descriptor

  • Instance search reduces to similarity search in d-dimensional space
  • Compatible with efficient nearest neighbor techniques
slide-15
SLIDE 15

Local descriptors

  • Pairwise similarity
  • Index local descriptors separately
slide-16
SLIDE 16

Match kernel – voting approach

Embedding function: Image similarity: Local similarity function:

slide-17
SLIDE 17

Voting approach & global descriptor

Linear matching of local descriptors

slide-18
SLIDE 18

Voting approach & global descriptor

Linear matching of local descriptors voting approach

slide-19
SLIDE 19

Voting approach & global descriptor

Linear matching of local descriptors voting approach

slide-20
SLIDE 20

Voting approach & global descriptor

Linear matching of local descriptors global descriptor voting approach

slide-21
SLIDE 21

From local to global

local descriptor set → process & aggregate → global descriptor → post-processing (whitening) → global descriptor

slide-22
SLIDE 22

Hand-crafted methods

slide-23
SLIDE 23

Locally invariant features

  • Local feature detectors

SIFT [ICCV’99] SURF [ECCV’06] MSER [BMVC’03] Hessian-Affine [IJCV’04], …

  • Local descriptors

SIFT [ICCV’99] SURF [ECCV’06], …

slide-24
SLIDE 24

Bag-of-Words (BoW)

  • Descriptor quantization
  • Clustering → visual codebook
  • Cluster → visual word
  • Matching only within cluster

[Sivic, ICCV’03] [Csurka, ECCVW’04]

slide-25
SLIDE 25

Bag-of-Words (BoW)

  • Descriptor quantization
  • Clustering → visual codebook
  • Cluster → visual word
  • Matching only within cluster

[Sivic, ICCV’03] [Csurka, ECCVW’04]

slide-26
SLIDE 26

Bag-of-Words (BoW)

not all pairs contribute

  • Descriptor quantization
  • Clustering → visual codebook
  • Cluster → visual word
  • Matching only within cluster

[Sivic, ICCV’03] [Csurka, ECCVW’04]
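The BoW pipeline can be sketched in a few lines of numpy (a toy illustration, not from the slides: the codebook is hand-picked here instead of being learned by clustering, and tf-idf weighting is omitted):

```python
import numpy as np

def quantize(descs, codebook):
    """Assign each local descriptor to its nearest centroid (visual word)."""
    d2 = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_vector(descs, codebook):
    """L2-normalized histogram of visual-word occurrences."""
    words = quantize(descs, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

# toy data: 2-D "descriptors", 3-word codebook
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
img_a = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1]])
img_b = np.array([[0.0, 0.1], [0.05, 0.9]])
similarity = float(bow_vector(img_a, codebook) @ bow_vector(img_b, codebook))
```

Image similarity is then just a dot product of the two sparse histograms, which only descriptors quantized to the same word can contribute to.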

slide-27
SLIDE 27

Bag-of-Words (BoW)

global descriptor voting approach i-th element

slide-28
SLIDE 28

Vector of locally aggregated descriptors VLAD

Aggregate residual vectors, instead of counts

[Jegou et al., CVPR’10]

slide-29
SLIDE 29

Dense features

[Zhao et al., BMVC’13]


Aggregating dense features → fixed dimensionality of the global descriptor. DenseVLAD [Torii et al., CVPR’15] [Tolias et al., ECCV’14]

slide-30
SLIDE 30

Global descriptors

global descriptor voting approach

  • Global descriptor preferred for small codebooks (8-512 centroids)
slide-31
SLIDE 31

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]
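How an inverted file avoids touching most of the database can be illustrated with a toy index (plain-Python sketch; real systems store quantized descriptors and weights in the posting lists, not just image ids):

```python
from collections import defaultdict

def build_inverted_file(db_words):
    """db_words maps image_id -> list of visual-word ids.
    The inverted file keeps, per word, the list of images containing it."""
    inv = defaultdict(list)
    for img, words in db_words.items():
        for w in sorted(set(words)):
            inv[w].append(img)
    return inv

def search(query_words, inv):
    """Only images sharing at least one visual word with the query are
    visited; here the score is simply the number of shared words."""
    scores = defaultdict(int)
    for w in set(query_words):
        for img in inv.get(w, ()):
            scores[img] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

inv = build_inverted_file({"a": [1, 2, 3], "b": [3, 4], "c": [5]})
ranked = search([2, 3], inv)
```

With a codebook on the order of 10⁶ words, each posting list is short, so query cost scales with the number of shared words rather than the database size.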

slide-32
SLIDE 32

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-33
SLIDE 33

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-34
SLIDE 34

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-35
SLIDE 35

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-36
SLIDE 36

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-37
SLIDE 37

Large codebooks and inverted files

[Inverted file: posting lists 1 … K] Codebook size on the order of 10⁶ [Philbin et al., CVPR’07]

slide-38
SLIDE 38

Selective Match Kernel

Non-linear local similarity Normalized residuals

[Tolias et al. IJCV’16]

slide-39
SLIDE 39

Aggregated Selective Match Kernel (ASMK)

Normalized aggregated residuals

[Tolias et al. IJCV’16]

(or binarized aggregated residuals for efficiency)

slide-40
SLIDE 40

Aggregated Selective Match Kernel (ASMK)

[Tolias et al. IJCV’16] same color = same visual word typical codebook size: 65k
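A simplified sketch of the idea (illustrative α and τ; the actual method aggregates residuals to the centroid and offers a binarized variant for efficiency):

```python
import numpy as np

def aggregate_per_word(descs, words):
    """ASMK-style aggregation: one L2-normalized aggregated vector per
    visual word (the published method aggregates residuals to the centroid)."""
    agg = {}
    for w in np.unique(words):
        v = descs[words == w].sum(axis=0)
        agg[int(w)] = v / (np.linalg.norm(v) + 1e-12)
    return agg

def asmk_similarity(agg_q, agg_db, alpha=3.0, tau=0.0):
    """Selective match kernel: per shared word, apply a threshold and a
    power law to the similarity, down-weighting weak matches."""
    s = 0.0
    for w in agg_q.keys() & agg_db.keys():
        u = float(agg_q[w] @ agg_db[w])
        if u > tau:
            s += u ** alpha
    return s

descs = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
words = np.array([0, 0, 1])
agg = aggregate_per_word(descs, words)
self_sim = asmk_similarity(agg, agg)     # each shared word contributes ~1
```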

slide-41
SLIDE 41

Spatial verification

tentative correspondences → inliers with fast spatial verification [Philbin et al., CVPR’07]
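A stripped-down sketch of RANSAC-style verification, using translation-only hypotheses for brevity (Philbin et al. estimate affine hypotheses from single region correspondences):

```python
import numpy as np

def verify(pts_q, pts_db, thresh=5.0):
    """Each tentative correspondence proposes a translation hypothesis;
    keep the hypothesis that explains the most correspondences (inliers)."""
    best = 0
    for i in range(len(pts_q)):
        t = pts_db[i] - pts_q[i]                       # hypothesis from pair i
        err = np.linalg.norm(pts_q + t - pts_db, axis=1)
        best = max(best, int((err < thresh).sum()))
    return best

q = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [50.0, 50.0]])
db = q + np.array([3.0, 4.0])          # three correspondences agree on [3, 4]
db[3] = [500.0, 500.0]                 # one outlier correspondence
inliers = verify(q, db)
```

The inlier count gives a geometry-aware re-ranking score that false matches from quantization rarely pass.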

slide-42
SLIDE 42

Learning-based methods

slide-43
SLIDE 43

Global descriptors with CNNs

  • Descriptor / network architecture
  • Loss function
  • Supervision / training data
slide-44
SLIDE 44

Global descriptors with CNNs

embedding & aggregation

fully convolutional network

descriptor:

slide-45
SLIDE 45

BoW with CNN features

[Mohedano et al. ICMR’16]

  • Used with pre-trained features and hard assignment
  • Soft assignment needed for training
slide-46
SLIDE 46

NetVLAD

soft-assignment

Codebook centroids are trainable parameters

[Arandjelovic et al., CVPR’16]
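The soft-assignment can be sketched as a softmax over scaled negative distances to the centroids (numpy sketch; in the actual layer the assignment is produced by a trainable convolution and everything is trained by backprop):

```python
import numpy as np

def netvlad(descs, centroids, alpha=10.0):
    """NetVLAD-style aggregation: residuals weighted by a softmax over
    scaled negative distances, so assignment is soft and differentiable."""
    d2 = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)   # (n, k)
    logits = -alpha * d2
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                                 # (n, k)
    resid = descs[:, None, :] - centroids[None, :, :]                 # (n, k, d)
    v = (a[..., None] * resid).sum(axis=0).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
descs = np.array([[0.1, 0.0], [0.9, 1.0]])
v = netvlad(descs, centroids)            # dimensionality = k * d
```

As alpha grows the softmax approaches the hard argmin, recovering classical VLAD.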

slide-47
SLIDE 47
  • Descriptor
  • Pair-wise similarity
  • Simple but works

→ discriminative power of CNN activations

Sum pooling – SPoC descriptor

[Babenko & Lempitsky, ICCV’15]

slide-48
SLIDE 48

Weighted sum pooling – CroW descriptor

α: weight based on the L2 norm of local descriptors; β: inverse document frequency weight (example of α shown) [Kalantidis et al., ECCV’16]

slide-49
SLIDE 49

Max pooling – MAC descriptor

Input image conv5 filter 1 conv5 filter 2 …. conv5 filter i …. conv5 filter K

[Razavian et al., MTA’16] [Tolias et al., ICLR’16]

slide-50
SLIDE 50

Max pooling – MAC descriptor

Input image conv5 filter 1 conv5 filter 2 …. conv5 filter i …. conv5 filter K

maximum activation

[Razavian et al., MTA’16] [Tolias et al., ICLR’16]

slide-51
SLIDE 51

Max pooling – MAC descriptor

[Razavian et al., MTA’16] [Tolias et al., ICLR’16] regions for top matching components different color per component pair 1 pair 3 pair 2

slide-52
SLIDE 52

Generalized mean pooling – GeM descriptor

q → ∞: max pooling (MAC); q = 1: average pooling (SPoC) [Radenovic et al., PAMI’19]
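A numpy sketch of GeM pooling, showing how the exponent interpolates between average and max pooling (toy feature map; q = 3 is a common choice):

```python
import numpy as np

def gem(fmap, q=3.0, eps=1e-6):
    """Generalized-mean pooling over spatial positions of a (C, H, W) map;
    assumes non-negative (post-ReLU) activations, clamped by eps."""
    x = np.clip(fmap, eps, None).reshape(fmap.shape[0], -1)
    return (x ** q).mean(axis=1) ** (1.0 / q)

rng = np.random.default_rng(0)
fmap = rng.uniform(0.1, 1.0, size=(4, 7, 7))   # post-ReLU-like activations
spoc = gem(fmap, q=1.0)                        # equals average pooling
mac = gem(fmap, q=200.0)                       # approaches max pooling
```

In the learned setting q can be a trainable parameter, since the operation is differentiable.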

slide-53
SLIDE 53

Hybrid – R-MAC descriptor

MAC descriptor

[Tolias et al., ICLR’16]

slide-54
SLIDE 54

Hybrid – R-MAC descriptor

MAC descriptor

[Tolias et al., ICLR’16]

slide-55
SLIDE 55

Hybrid – R-MAC descriptor

MAC descriptor

[Tolias et al., ICLR’16]

slide-56
SLIDE 56

Hybrid – R-MAC descriptor

MAC descriptor

[Tolias et al., ICLR’16]

slide-57
SLIDE 57

Hybrid – R-MAC descriptor

MAC descriptor

  • Normalize descriptors
  • Sum aggregate

[Tolias et al., ICLR’16]
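A rough numpy sketch of the regional aggregation (single region scale for brevity; the published method samples square regions at several scales):

```python
import numpy as np

def rmac(fmap, region=4, stride=2):
    """R-MAC sketch: MAC (spatial max) over overlapping square regions of a
    (C, H, W) map, L2-normalize each regional vector, sum, normalize again."""
    c, h, w = fmap.shape
    out = np.zeros(c)
    for i in range(0, h - region + 1, stride):
        for j in range(0, w - region + 1, stride):
            r = fmap[:, i:i + region, j:j + region].max(axis=(1, 2))
            out += r / (np.linalg.norm(r) + 1e-12)
    return out / (np.linalg.norm(out) + 1e-12)

rng = np.random.default_rng(1)
desc = rmac(rng.uniform(0.0, 1.0, size=(8, 10, 10)))
```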

slide-58
SLIDE 58

Performance comparison

[Bar chart comparing SPoC, CroW, MAC, GeM, and R-MAC with pre-trained vs fine-tuned ResNet101]

Precision@10 on R-Oxford+1M distractors

slide-59
SLIDE 59

Performance comparison

[Bar chart comparing SPoC, CroW, MAC, GeM, and R-MAC with pre-trained vs fine-tuned ResNet101]

Precision@10 on R-Oxford+1M distractors. Fine-tuning improvement for GeM: +26.6%

slide-60
SLIDE 60

Attention maps

Descriptor Image similarity [Kim et al., CVPR’17]

image from Kim et al.

Learned attention: 83.2 vs CroW attention: 80.1 (NetVLAD top-1 accuracy, San Francisco)
Learned attention: 75.2 vs no attention: 71.8 (NetVLAD top-1 accuracy, Tokyo 24/7)

slide-61
SLIDE 61

DELF

Descriptor Image similarity [Noh et al., ICCV’17]

image from Noh et al.

slide-62
SLIDE 62

DELF

[Noh et al., ICCV’17]

Training time:

  • Aggregate local activations into global descriptor
  • Global descriptor in the loss

Test time:

  • No aggregation into global descriptor
  • Index strongest local descriptors separately

image from Noh et al.

mAP on Oxford5k: global descriptor 77.9; voting + SP 83.8

(not in the paper, thanks to authors for sharing)

slide-63
SLIDE 63

Training loss

slide-64
SLIDE 64

Loss function for N-way classification

global descriptor → FC layer → soft-max → cross-entropy loss. Discrete labels → difficult to obtain at instance level

slide-65
SLIDE 65

Loss functions for metric learning

anchor negative positive

Contrastive loss Triplet loss

far enough as close as possible large enough

  • Sampling from discrete class labels
  • problem: large intra-class variability
  • Need automatic ways for pair-wise labels
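Both losses can be sketched directly from the definitions above (numpy sketch with illustrative margin values):

```python
import numpy as np

def contrastive_loss(x, y, is_match, margin=0.7):
    """Matching pairs: pull to zero distance. Non-matching pairs: push
    apart until they are at least `margin` away."""
    d = np.linalg.norm(x - y)
    return 0.5 * d ** 2 if is_match else 0.5 * max(0.0, margin - d) ** 2

def triplet_loss(anchor, pos, neg, margin=0.1):
    """Positive must be closer to the anchor than the negative, by `margin`."""
    dp = np.linalg.norm(anchor - pos)
    dn = np.linalg.norm(anchor - neg)
    return max(0.0, dp - dn + margin)

a = np.array([0.0, 0.0]); p = np.array([0.1, 0.0]); n = np.array([1.0, 0.0])
easy = triplet_loss(a, p, n)   # negative already far enough: zero loss
hard = triplet_loss(a, n, p)   # roles swapped: positive loss
```

The zero loss on the easy triplet is exactly why pair/triplet sampling matters: random pairs quickly stop producing gradients.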
slide-66
SLIDE 66

Average precision loss

The larger the batch, the better → no need to sample [Revaud et al., ICCV’19]

slide-67
SLIDE 67

Average Precision Loss

[Revaud et al., ICCV’19]

slide-68
SLIDE 68

Training data

slide-69
SLIDE 69

Training data from GPS: negatives

anchor candidate negatives

slide-70
SLIDE 70

Training data from GPS: positives

anchor candidate positives camera orientation (unknown)

slide-71
SLIDE 71

Training data from GPS: positives

anchor candidate positives camera orientation (unknown)

Descriptor distance to resolve → pick the closest [Arandjelovic et al., CVPR’16]
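The selection rule can be sketched as an argmin over descriptor distances (toy numpy sketch; `pick_positive` is a made-up helper name):

```python
import numpy as np

def pick_positive(anchor_desc, candidate_descs):
    """Among GPS-nearby candidates (camera orientation unknown), pick the
    one whose descriptor is closest to the anchor: most likely same view."""
    d = np.linalg.norm(candidate_descs - anchor_desc, axis=1)
    return int(d.argmin())

anchor = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
best = pick_positive(anchor, cands)
```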

slide-72
SLIDE 72

Training data from SfM

7.4M images → 713 training 3D models [Schonberger et al. CVPR’15] [Radenovic et al. CVPR’16]

slide-73
SLIDE 73

Training data from SfM

camera orientation known number of inliers known

7.4M images → 713 training 3D models [Schonberger et al. CVPR’15] [Radenovic et al. CVPR’16]

slide-74
SLIDE 74

Training data from SfM: hard negatives

anchor

Negative examples: images from different 3D models than the query Hard negatives: closest negative examples to the query [Radenovic et al. PAMI’19]

slide-75
SLIDE 75

Training data from SfM: hard negatives

anchor the most similar CNN descriptor

Negative examples: images from different 3D models than the query Hard negatives: closest negative examples to the query [Radenovic et al. PAMI’19]

slide-76
SLIDE 76

Training data from SfM: hard negatives

anchor the most similar CNN descriptor naive hard negatives top k by CNN

Negative examples: images from different 3D models than the query Hard negatives: closest negative examples to the query

increasing CNN descriptor distance to the query

[Radenovic et al. PAMI’19]

slide-77
SLIDE 77

Training data from SfM: hard negatives

anchor the most similar CNN descriptor naive hard negatives top k by CNN

Negative examples: images from different 3D models than the query Hard negatives: closest negative examples to the query

increasing CNN descriptor distance to the query

[Radenovic et al. PAMI’19]

slide-78
SLIDE 78

Training data from SfM: hard negatives

anchor the most similar CNN descriptor naive hard negatives top k by CNN diverse hard negatives top k: one per 3D model

Negative examples: images from different 3D models than the query Hard negatives: closest negative examples to the query

increasing CNN descriptor distance to the query

[Radenovic et al. PAMI’19]
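The diverse selection (top k nearest negatives, at most one per 3D model) can be sketched as follows (toy numpy sketch; names are made up):

```python
import numpy as np

def diverse_hard_negatives(anchor, negatives, model_ids, k=2):
    """Hard negatives: the closest negatives to the anchor in descriptor
    space, but at most one per 3D model, avoiding near-duplicate views."""
    order = np.argsort(np.linalg.norm(negatives - anchor, axis=1))
    chosen, seen = [], set()
    for i in order:
        if model_ids[i] not in seen:
            chosen.append(int(i))
            seen.add(model_ids[i])
        if len(chosen) == k:
            break
    return chosen

anchor = np.zeros(2)
negs = np.array([[0.1, 0.0], [0.2, 0.0], [5.0, 0.0], [0.3, 0.0]])
models = ["m1", "m1", "m2", "m3"]
picked = diverse_hard_negatives(anchor, negs, models, k=2)
```

Without the one-per-model constraint, the top k would be dominated by several views of the same landmark ("m1" twice in this toy example).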

slide-79
SLIDE 79

Training data from SfM: hard positives

anchor

Positive examples: images that share 3D points with the query Hard positives: positive examples not close enough to the query

[Radenovic et al. PAMI’19]

slide-80
SLIDE 80

Training data from SfM: hard positives

anchor top 1 by CNN

Positive examples: images that share 3D points with the query Hard positives: positive examples not close enough to the query

[Radenovic et al. PAMI’19]

slide-81
SLIDE 81

Training data from SfM: hard positives

anchor top 1 by CNN

Positive examples: images that share 3D points with the query Hard positives: positive examples not close enough to the query

[Radenovic et al. PAMI’19]

slide-82
SLIDE 82

Training data from SfM: hard positives

anchor top 1 by CNN top 1 by inliers

harder positives

Positive examples: images that share 3D points with the query Hard positives: positive examples not close enough to the query

[Radenovic et al. PAMI’19]

slide-83
SLIDE 83

Training data from SfM: hard positives

anchor top 1 by CNN top 1 by inliers random from top k by inliers

harder positives

Positive examples: images that share 3D points with the query Hard positives: positive examples not close enough to the query

[Radenovic et al. PAMI’19]

slide-84
SLIDE 84

Positive and negative training images

[Radenovic et al. PAMI’19]

slide-85
SLIDE 85

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6

[Radenovic et al. PAMI’19]

slide-86
SLIDE 86

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6
top 1 CNN + top k CNN: 56.2 / 63.1

[Radenovic et al. PAMI’19]

slide-87
SLIDE 87

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6
top 1 CNN + top k CNN: 56.2 / 63.1

[Radenovic et al. PAMI’19]

slide-88
SLIDE 88

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6
top 1 CNN + top k CNN: 56.2 / 63.1
top 1 CNN + top 1 / model CNN: 56.7 / 63.9

[Radenovic et al. PAMI’19]

slide-89
SLIDE 89

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6
top 1 CNN + top k CNN: 56.2 / 63.1
top 1 CNN + top 1 / model CNN: 56.7 / 63.9
top 1 inliers + top 1 / model CNN: 59.7 / 67.1

[Radenovic et al. PAMI’19]

slide-90
SLIDE 90

Positive and negative training images

Oxford 5k / Paris 6k

Off-the-shelf: 44.2 / 51.6
top 1 CNN + top k CNN: 56.2 / 63.1
top 1 CNN + top 1 / model CNN: 56.7 / 63.9
top 1 inliers + top 1 / model CNN: 59.7 / 67.1
random(top k inliers) + top 1 / model CNN: 60.2 / 67.5

[Radenovic et al. PAMI’19]

slide-91
SLIDE 91

Class labels + cleaning

[Gordo et al. IJCV’18]

Use classical computer vision to collect training data → Bag-of-Words and spatial verification

slide-92
SLIDE 92

Class labels + cleaning

[Gordo et al. IJCV’18]

slide-93
SLIDE 93

Class labels + cleaning

[Gordo et al. IJCV’18]

classification loss vs ranking loss

slide-94
SLIDE 94

PlaNet

Adaptive partitioning into k = 26,263 cells; N-way classification training [Weyand et al., ICCV’17]. Very compact model (377 MB)! But is it better than instance search?

slide-95
SLIDE 95

Revisiting IM2GPS

[Vo et al., CVPR’17]

  • A. Classification with globe partitioning
  • best at coarse level, bad at fine level
  • very compact model

Evaluation at different scales on the IM2GPS dataset. Fine scale: street (1 km), city (25 km). Coarse scale: region (250 km), country (750 km), continent (7500 km)

slide-96
SLIDE 96

Revisiting IM2GPS

[Vo et al., CVPR’17]

  • A. Classification with globe partitioning
  • best at coarse level, bad at fine level
  • very compact model
  • B. Descriptors from A used for instance search
  • improves for fine level
  • all descriptors in memory

Evaluation at different scales on the IM2GPS dataset. Fine scale: street (1 km), city (25 km). Coarse scale: region (250 km), country (750 km), continent (7500 km)

slide-97
SLIDE 97

Revisiting IM2GPS

[Vo et al., CVPR’17]

  • A. Classification with globe partitioning
  • best at coarse level, bad at fine level
  • very compact model
  • B. Descriptors from A used for instance search
  • improves for fine level
  • all descriptors in memory
  • C. Fine-tuning A with ranking loss, use for instance search
  • no improvements
  • high intra-class variability / not challenging pairs

Evaluation at different scales on the IM2GPS dataset. Fine scale: street (1 km), city (25 km). Coarse scale: region (250 km), country (750 km), continent (7500 km)

slide-98
SLIDE 98

Revisiting IM2GPS

[Vo et al., CVPR’17]

  • A. Classification with globe partitioning
  • best at coarse level, bad at fine level
  • very compact model
  • B. Descriptors from A used for instance search
  • improves for fine level
  • all descriptors in memory
  • C. Fine-tuning A with ranking loss, use for instance search
  • no improvements
  • high intra-class variability / not challenging pairs
  • D. Global descriptor (MAC) trained with SfM data [Radenovic et al.]
  • the best for fine level
  • all descriptors in memory

Evaluation at different scales on the IM2GPS dataset. Fine scale: street (1 km), city (25 km). Coarse scale: region (250 km), country (750 km), continent (7500 km)

slide-99
SLIDE 99

Google landmark recognition challenge

Combining global GeM with local DELF-ASMK

slide-100
SLIDE 100

GeM-based recognition

slide-101
SLIDE 101

GeM common mistakes

slide-102
SLIDE 102

GeM common mistakes

slide-103
SLIDE 103

Recognition with ASMK-DELF + spatial verification

slide-104
SLIDE 104

Spatial verification – common mistakes

slide-105
SLIDE 105

Spatial verification – common mistakes

slide-106
SLIDE 106

Combined classifier

GeM: 23.3; ASMK-DELF+SP: 21.1; Combined: 29.3

slide-107
SLIDE 107

Global descriptor vs voting approach

DELF with ASMK*+SP

  • Prec@10: 81.1

GeM fine-tuned with SfM, D = 512

  • Prec@10: 68.6

R-Oxford+1M distractors [Radenovic et al., CVPR’18]

slide-108
SLIDE 108

Global descriptor vs voting approach

DELF with ASMK*+SP

  • Prec@10: 81.1
  • DB mem. (GB): 10.3

GeM fine-tuned with SfM, D = 512

  • Prec@10: 68.6
  • DB mem. (GB): 1.92

R-Oxford+1M distractors [Radenovic et al., CVPR’18]

slide-109
SLIDE 109

Global descriptor vs voting approach

DELF with ASMK*+SP

  • Prec@10: 81.1
  • DB mem. (GB): 10.3

GeM fine-tuned with SfM, D = 512

  • Prec@10: 68.6
  • DB mem. (GB): 1.92

R-Oxford+1M distractors [Radenovic et al., CVPR’18]

slide-110
SLIDE 110

Global descriptor vs voting approach

DELF with ASMK*+SP

  • Prec@10: 81.1
  • DB mem. (GB): 10.3
  • Query time, extraction + search (sec): 0.42 (GPU) + 0.98 (CPU)

GeM fine-tuned with SfM, D = 512

  • Prec@10: 68.6
  • DB mem. (GB): 1.92
  • Query time, extraction + search (sec): 0.23 (GPU) + 0.56 (CPU)

R-Oxford+1M distractors [Radenovic et al., CVPR’18]

slide-111
SLIDE 111

Global descriptors & quantization

Fine-tuned R-MAC, ResNet101 [Gordo et al. IJCV’18]. Uncompressed (8192 bytes): 82.8. Compression with Product Quantization (PQ) [Jegou et al., PAMI’10]
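A toy numpy sketch of PQ: split each descriptor into m blocks and quantize each block with its own small codebook (here k = 16 centroids; real systems use k = 256 so each block fits in one byte):

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Tiny Lloyd's k-means, enough for a toy PQ codebook."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(0)
    return C

def pq_train(X, m, k):
    """One small codebook per sub-vector block."""
    return [kmeans(B, k) for B in np.split(X, m, axis=1)]

def pq_encode(X, codebooks):
    """Each vector is stored as m centroid ids."""
    blocks = np.split(X, len(codebooks), axis=1)
    return np.stack([((B[:, None] - C[None]) ** 2).sum(-1).argmin(1)
                     for B, C in zip(blocks, codebooks)], axis=1)

def pq_decode(codes, codebooks):
    """Approximate reconstruction from the stored centroid ids."""
    return np.hstack([C[codes[:, j]] for j, C in enumerate(codebooks)])

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
books = pq_train(X, m=4, k=16)
codes = pq_encode(X, books)
Xhat = pq_decode(codes, books)
```

At search time, distances to the query are computed from per-block lookup tables rather than by decoding, which is what makes PQ fast as well as compact.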

slide-112
SLIDE 112

Descriptor whitening

slide-113
SLIDE 113

Post-processing with whitening

local descriptor set → process & aggregate → global descriptor → post-processing (whitening) → global descriptor

slide-114
SLIDE 114

Post-processing with whitening

local descriptor set → process & aggregate → global descriptor → post-processing (whitening) → global descriptor (pre-trained)

slide-115
SLIDE 115

Post-processing with whitening

local descriptor set → process & aggregate → global descriptor → post-processing (whitening) → global descriptor (learned end-to-end)

slide-116
SLIDE 116

Descriptor processing with PCA

eigen-vectors as columns global descriptor mean vector

slide-117
SLIDE 117

PCA and power-law normalization

jointly down-weight co-occurring features

[Perronnin et al., CVPR’10]

slide-118
SLIDE 118

PCA whitening

[Jegou & Chum, ECCV’12]

jointly down-weight co-occurring features

slide-119
SLIDE 119

PCA whitening with shrinkage

jointly down-weight co-occurring features

[Mukundan et al., IJCV’19]
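A numpy sketch of PCA whitening with a shrinkage exponent on the eigenvalues (shrinkage = 1 is full whitening, 0 is a pure rotation; the exact parameterization in the paper may differ):

```python
import numpy as np

def whitening_transform(X, shrinkage=1.0, eps=1e-9):
    """PCA whitening: rotate to the eigenbasis of the covariance and divide
    by eigenvalue**(shrinkage/2); values in (0, 1) whiten less aggressively."""
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov((X - mu).T))
    return mu, vecs / (vals + eps) ** (shrinkage / 2.0)

def apply_whitening(X, mu, P, l2=True):
    """Project descriptors and (optionally) re-normalize to unit L2 norm."""
    Y = (X - mu) @ P
    return Y / np.linalg.norm(Y, axis=1, keepdims=True) if l2 else Y

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated data
mu, P = whitening_transform(X)
Z = apply_whitening(X, mu, P, l2=False)                   # decorrelated
```

Dividing by the eigenvalues is what jointly down-weights co-occurring (correlated) feature dimensions.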

slide-120
SLIDE 120

Supervised whitening

Covariance computed from matching and non-matching pairs.

  • Obtain the whitening matrix from the covariance of matching pairs
  • Obtain the rotation by PCA on the whitened covariance of non-matching pairs

[Mikolajczyk & Matas, ICCV’07]

slide-121
SLIDE 121

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6

slide-122
SLIDE 122

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1

slide-123
SLIDE 123

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1

slide-124
SLIDE 124

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7

slide-125
SLIDE 125

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7

slide-126
SLIDE 126

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5

slide-127
SLIDE 127

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5

slide-128
SLIDE 128

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9

slide-129
SLIDE 129

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9

slide-130
SLIDE 130

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9
SPoC: 24.5 (raw), 42.7 (supervised whitening)

slide-131
SLIDE 131

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9
SPoC: 24.5 (raw), 42.7 (supervised whitening)
GeM: 31.6 (raw), 50.1 (supervised whitening)

slide-132
SLIDE 132

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9
SPoC: 24.5 (raw), 42.7 (supervised whitening)
GeM: 31.6 (raw), 50.1 (supervised whitening)

slide-133
SLIDE 133

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9
SPoC: 24.5 (raw), 42.7 (supervised whitening)
GeM: 31.6 (raw), 50.1 (supervised whitening)
Fine-tuned ResNet101, GeM: 52.9 (raw), 64.1 (supervised whitening)

slide-134
SLIDE 134

Results – descriptor post-processing

mAP on R-Oxford, pre-trained ResNet101: MAC 31.6, PCA+power-law 40.1, PCA whitening 41.7, PCA wh.+shrink 43.5, supervised whitening 46.9
SPoC: 24.5 (raw), 42.7 (supervised whitening)
GeM: 31.6 (raw), 50.1 (supervised whitening)
Fine-tuned ResNet101, GeM: 52.9 (raw), 64.1 (supervised whitening)
Some pooling operations benefit more than others; additional insight about whitening in [Mukundan et al. IJCV’19]

slide-135
SLIDE 135

Post-processing with whitening

local descriptor set → process & aggregate → global descriptor → post-processing (whitening) → global descriptor (learned end-to-end)

slide-136
SLIDE 136

Post-processing with whitening

local descriptor set → process & aggregate → global descriptor → FC layer (whitening) → global descriptor (learned end-to-end)

https://github.com/filipradenovic/cnnimageretrieval-pytorch

slide-137
SLIDE 137

Summary

  • Instance search for place recognition: better than classification-based approaches
  • Classification loss can handle large intra-class variability; ranking loss needs selection of clean & challenging pairs
  • Global descriptors vs voting approaches: speed/compactness vs better accuracy
  • Surprising effectiveness of descriptor whitening

slide-138
SLIDE 138

Privacy preserving search

Tolias, Radenovic, Chum: Targeted mismatch adversarial attacks. ICCV 2019

slide-139
SLIDE 139

Private content – personal data

Access to place recognition / retrieval systems → sharing of private content

slide-140
SLIDE 140

Concealed queries

Original queries

slide-141
SLIDE 141

Concealed queries

Original queries Concealed queries

slide-142
SLIDE 142

Adversarial attacks

plane + perturbation = plane → non-targeted misclassification

slide-143
SLIDE 143

Adversarial attacks

plane + perturbation = kangaroo → targeted misclassification

slide-144
SLIDE 144

Adversarial attacks

image + perturbation → dissimilar descriptors (non-targeted mismatch)

slide-145
SLIDE 145

Adversarial attacks

carrier (initial query) + perturbation = concealed query, with descriptor similar to the target (targeted mismatch attack)

slide-146
SLIDE 146

Typical descriptor for retrieval

slide-147
SLIDE 147

Targeted mismatch attacks

Known: FCN, Pooling & Normalization (PN), Re-sampling. Unknown: Post-processing

slide-148
SLIDE 148

Targeted mismatch attacks

Known: FCN, Re-sampling. Unknown: Pooling & Normalization (PN), Post-processing

slide-149
SLIDE 149

Targeted mismatch attacks

Known: FCN, Re-sampling. Unknown: Pooling & Normalization (PN), Post-processing

slide-150
SLIDE 150

Targeted mismatch attacks

Known: FCN. Unknown: Re-sampling, Pooling & Normalization (PN), Post-processing. Multi-scale attack; key ingredient: blurring before re-sampling

slide-151
SLIDE 151

target vs carrier similarity: 0.782

slide-152
SLIDE 152

target vs carrier: 0.782; GeM, λ=0: 1.000

slide-153
SLIDE 153

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999

slide-154
SLIDE 154

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999

slide-155
SLIDE 155

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997

slide-156
SLIDE 156

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997

slide-157
SLIDE 157

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997

slide-158
SLIDE 158

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997

slide-159
SLIDE 159

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997; hist, λ=0: 1.000

slide-160
SLIDE 160

target vs carrier: 0.782; GeM, λ=0: 1.000; tens, λ=0: 0.999; tens, λ=1: 0.997; hist, λ=0: 1.000

slide-161
SLIDE 161

Unknown resolution and multi-scale attack

[plot: performance vs similarity to target]

slide-162
SLIDE 162

Unknown resolution and multi-scale attack

Multi-scale attack with dense resolution sampling [plot: performance vs similarity to target]

slide-163
SLIDE 163

Unseen network

slide-164
SLIDE 164

Targeted mismatch adversarial attacks

Online code: https://github.com/gtolias/tma