SLIDE 1

Instance-level recognition

1) Local invariant features
2) Matching and recognition with local features
3) Efficient visual search
4) Very large scale indexing
SLIDE 2

Visual search

SLIDE 3

Image search system for large datasets

[Diagram: query image → image search system over a large image dataset (one million images or more) → ranked image list]

• Issues for very large databases:
  – reduce the query time
  – reduce the storage requirements
  – with minimal loss in retrieval accuracy
SLIDE 4

Two strategies

1. Efficient approximate nearest-neighbour search on local feature descriptors

2. Quantize descriptors into a “visual vocabulary” and use efficient techniques from text retrieval (bag-of-words representation)
SLIDE 5

Strategy 1: Efficient approximate NN search

Images → local features → invariant descriptor vectors

1. Compute local features in each image independently
2. Describe each feature by a descriptor vector
3. Find nearest-neighbour vectors between query and database
4. Rank matched images by number of (tentatively) corresponding regions
5. Verify top-ranked images based on spatial consistency
SLIDE 6

Voting algorithm

[Figure: each local feature of the query votes for the database images I_1, …, I_n that contain a similar feature]
SLIDE 7

Voting algorithm

[Figure: votes are accumulated per database image; the image with the highest vote count is taken as the corresponding model image]
SLIDE 8

Finding nearest neighbour vectors

Establish correspondences between query image and images in the database by nearest neighbour matching on SIFT vectors

[Figure: 128-D descriptor space, with descriptors from the model image and from the image database]

Solve the following problem for all feature vectors x_j in the query image:

    NN(x_j) = argmin_i ||x_i − x_j||

where the x_i are the features from all the database images.
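As an illustration, a minimal NumPy sketch of this exhaustive search (the array shapes and names are assumptions for the example):

```python
import numpy as np

def nearest_neighbours(query_desc, db_desc):
    """Exhaustive NN search: for each query descriptor x_j, return the
    index of the closest database descriptor x_i (Euclidean distance).

    query_desc: (M, 128) SIFT descriptors of the query image
    db_desc:    (K, 128) descriptors pooled from all database images
    """
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, evaluated for all pairs at once
    d2 = (np.sum(query_desc ** 2, axis=1)[:, None]
          - 2.0 * query_desc @ db_desc.T
          + np.sum(db_desc ** 2, axis=1)[None, :])
    return np.argmin(d2, axis=1)
```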
SLIDE 9

Quick look at the complexity of the NN-search

N … number of images
M … regions per image (~1000)
D … dimension of the descriptor (~128)

Exhaustive linear search: O(M × NM × D)

Example: matching two images (N = 1), each having 1000 SIFT descriptors
• nearest-neighbour search: 0.4 s (2 GHz CPU, implementation in C)
• memory footprint: 1000 × 128 bytes = 128 kB / image

# of images N                     CPU time      Memory req.
1,000                             ~7 min        ~100 MB
10,000                            ~1 h 7 min    ~1 GB
10^7                              ~115 days     ~1 TB
10^10 (all images on Facebook)    ~300 years    ~1 PB
SLIDE 10

Nearest-neighbor matching

Solve the following problem for all feature vectors x_j in the query image:

    NN(x_j) = argmin_i ||x_i − x_j||

where the x_i are features in the database images. Nearest-neighbour matching is the major computational bottleneck.

• Linear search performs dn operations for n features in the database and d dimensions
• No exact methods are faster than linear search for d > 10
• Approximate methods can be much faster, but at the cost of missing some correct matches
SLIDE 11

[Figure: a k-d tree over 2-D points, showing the splitting lines l1–l10 and the resulting binary tree]

K-d tree

• A k-d tree is a binary tree data structure for organizing a set of points
• Each internal node is associated with an axis-aligned hyperplane splitting its associated points into two sub-trees
• Dimensions with high variance are chosen first
• The position of the splitting hyperplane is chosen as the mean/median of the projected points – balanced tree
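For illustration, a k-d tree query using SciPy’s cKDTree on toy data (the sizes are made up; as noted above, for d ≈ 128 exact tree search degenerates toward a linear scan, so eps > 0 makes the search approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.standard_normal((100_000, 128))     # toy database descriptors
queries = rng.standard_normal((1_000, 128))  # toy query descriptors

tree = cKDTree(db)                 # built once, offline
# eps > 0 allows an approximate answer: the reported neighbour is within
# (1 + eps) times the true nearest-neighbour distance
dist, idx = tree.query(queries, k=1, eps=0.5)
```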
SLIDE 12

Large scale object/scene recognition

• Each image is described by approximately 1000 descriptors
  – 10^9 descriptors to index for one million images!
• Database representation in RAM:
  – size of the descriptors: 1 TB – search and memory intractable

[Diagram: query → image search system over an image dataset of > 1 million images → ranked image list]
SLIDE 13

Bag-of-features [Sivic&Zisserman’03]

[Pipeline: query image → set of SIFT descriptors (Harris-Hessian-Laplace regions + SIFT) → bag-of-features processing with tf-idf weighting → sparse frequency vector over centroids (visual words) → inverted-file querying → ranked image short-list → geometric verification → re-ranked list]

• “visual words”:
  – 1 “word” (index) per local descriptor
  – only image ids in the inverted file → 8 GB – fits in RAM!

[Chum & al. 2007]
SLIDE 14

Indexing text with inverted files

Need to map feature descriptors to “visual words”

Inverted file: term → list of hits (occurrences in documents)

  People:    [d1: hit hit hit], [d4: hit hit] …
  Common:    [d1: hit hit], [d3: hit], [d4: hit hit hit] …
  Sculpture: [d2: hit], [d3: hit hit hit] …

(from a document collection d1–d4)
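The same structure in a few lines of Python, mirroring the toy example above (with visual words, the string terms would be integer cluster ids):

```python
from collections import defaultdict

docs = {  # document id -> list of terms
    "d1": ["people"] * 3 + ["common"] * 2,
    "d2": ["sculpture"],
    "d3": ["common"] + ["sculpture"] * 3,
    "d4": ["people"] * 2 + ["common"] * 3,
}

inverted = defaultdict(lambda: defaultdict(int))  # term -> {doc: hit count}
for doc_id, terms in docs.items():
    for term in terms:
        inverted[term][doc_id] += 1

print(dict(inverted["common"]))  # {'d1': 2, 'd3': 1, 'd4': 3}
```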
SLIDE 15

Build a visual vocabulary [Sivic and Zisserman, ICCV 2003]

Vector-quantize the descriptors:
• compute SIFT features from a subset of images
• K-means clustering (need to choose K)

[Figure: 128-D descriptor space, before and after clustering]
SLIDE 16

K-means clustering

Minimizes the sum of squared Euclidean distances between points x_i and their nearest cluster centers.

Algorithm (a code sketch follows below):
• randomly initialize K cluster centers
• iterate until convergence:
  – assign each data point to the nearest center
  – recompute each cluster center as the mean of all points assigned to it

K-means finds a local minimum that depends on the initialization, so initialization is important: run several times and select the best solution.
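A minimal NumPy sketch of Lloyd’s iterations as described above (a fixed iteration count stands in for a convergence test):

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """x: (n, d) data points. Returns (k, d) centers and point labels."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each center becomes the mean of its assigned points
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels
```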
SLIDE 17

Visual words

Example: each group of patches belongs to the same visual word.

[Figure: clusters in the 128-D descriptor space; from Sivic & Zisserman, ICCV 2003]
SLIDE 18

Samples of visual words (clusters on SIFT descriptors):

SLIDE 19

Samples of visual words (clusters on SIFT descriptors):

SLIDE 20

Sivic and Zisserman, ICCV 2003

Visual words: quantize descriptor space

Nearest-neighbour matching is expensive to do for all frames.

[Figure: 128-D descriptor space with descriptors from Image 1 and Image 2]
SLIDE 21

Sivic and Zisserman, ICCV 2003

Visual words: quantize descriptor space

Nearest-neighbour matching is expensive to do for all frames; instead, vector-quantize the descriptors.

[Figure: the 128-D descriptor space of Image 1 and Image 2, partitioned into cells; descriptors in the same cell receive the same visual-word label, e.g. 42 or 5]
SLIDE 22

Sivic and Zisserman, ICCV 2003

Visual words: quantize descriptor space

[Figure: the quantized 128-D descriptor space as before, with the descriptors of a new image added]
SLIDE 23

Sivic and Zisserman, ICCV 2003

Visual words: quantize descriptor space

[Figure: the descriptors of the new image are assigned the visual-word labels of the cells they fall into, e.g. 42]
SLIDE 24

Vector quantize the descriptor space (SIFT)

[Figure: image patches quantized to the same visual word (e.g. words 5 and 42)]
SLIDE 25

Representation: bag of (visual) words

An image is represented as a collection of visual words. Visual words are ‘iconic’ image patches or fragments:
• represent their frequency of occurrence
• but not their position
SLIDE 26

Offline: Assign visual words and compute histograms for each image

Detect patches → normalize each patch → compute its SIFT descriptor → find the nearest cluster center (e.g. visual words 5, 42) → represent the image as a sparse histogram of visual-word occurrences (e.g. 2 1 1 …)
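A sketch of the assignment and histogram steps, assuming cluster centers from the k-means stage:

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Assign each descriptor to its nearest visual word and count the
    occurrences; the result is sparse for large vocabularies."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))
```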
SLIDE 27

Offline: create an index

[Figure: inverted index mapping word number → posting list. Image credit: A. Zisserman; slide credit: K. Grauman, B. Leibe]

• For fast search, store a “posting list” for the dataset
• This maps visual-word occurrences to the images they occur in (i.e., like a book index)
SLIDE 28

At run time

[Figure: querying the posting lists. Image credit: A. Zisserman; slide credit: K. Grauman, B. Leibe]

• User specifies a query region
• Generate a short-list of images using the visual words in the region:
  1. accumulate all visual words within the query region
  2. use the “book index” to find other images with these words
  3. compute similarity for images sharing at least one word (see the sketch below)
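A sketch of this short-listing step via posting lists (unweighted counting; tf-idf weights would replace the +1):

```python
from collections import Counter

def short_list(query_words, postings):
    """postings: visual word -> iterable of image ids containing it.
    Scores every image that shares at least one word with the query."""
    scores = Counter()
    for w in set(query_words):
        for image_id in postings.get(w, ()):
            scores[image_id] += 1        # or += a tf-idf weight for w
    return scores.most_common()          # ranked short-list
```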
SLIDE 29

At run time

[Figure: word number → posting list. Image credit: A. Zisserman; slide credit: K. Grauman, B. Leibe]

• Score each image by the (weighted) number of common visual words (tentative correspondences)
• Worst-case complexity is linear in the number of images N
• In practice, it is linear in the length of the posting lists (<< N)
SLIDE 30

Another interpretation: Bags of visual words

Summarize the entire image by the distribution (histogram) of its visual-word occurrences.

[Figure: image → histogram vector d over visual words t. Slide: Grauman & Leibe; image: L. Fei-Fei; Hofmann 2001]

Analogous to the bag-of-words representation commonly used for text documents.
SLIDE 31

Another interpretation: the bag-of-visual-words model

For a vocabulary of size K, each image is represented by a K-vector v_d = (t_1, …, t_K), where t_i is the number of occurrences of visual word i. Images are ranked by the normalized scalar product between the query vector v_q and all vectors v_d in the database:

    sim(v_q, v_d) = (v_q · v_d) / (‖v_q‖ ‖v_d‖)

The scalar product can be computed efficiently using the inverted file.
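A dense-vector sketch of this ranking (a real system computes the same score sparsely through the inverted file):

```python
import numpy as np

def rank_by_similarity(v_q, V_db):
    """v_q: (K,) query term-frequency vector; V_db: (N, K) database
    vectors. Returns database indices sorted by the normalized
    scalar product, best first."""
    v_q = v_q / np.linalg.norm(v_q)
    V = V_db / np.linalg.norm(V_db, axis=1, keepdims=True)
    return np.argsort(-(V @ v_q))
```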
SLIDE 32

Bag-of-features [Sivic&Zisserman’03]

[Pipeline: query image → set of SIFT descriptors (Harris-Hessian-Laplace regions + SIFT) → bag-of-features processing with tf-idf weighting → sparse frequency vector over centroids (visual words) → inverted-file querying → ranked image short-list → geometric verification → re-ranked list [Chum & al. 2007]]

[Figure: example results ranked 1–5]
SLIDE 33

Geometric verification

Use the position and shape of the underlying features to improve retrieval quality. Both images have many matches – which one is correct?
SLIDE 34

Geometric verification

• Many matches are incorrect – remove outliers
• Estimate a geometric transformation
• Robust strategies (a sketch follows below):
  – RANSAC
  – Hough transform
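A sketch of RANSAC-based verification using OpenCV’s findHomography; the choice of a homography as the transformation and the 5-pixel threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def spatial_verify(pts_query, pts_db, reproj_thresh=5.0):
    """pts_query, pts_db: (n, 2) float32 arrays of tentatively matched
    keypoint coordinates (n >= 4). Returns the estimated homography and
    the inlier count, which is then used to re-rank the short-list."""
    H, inlier_mask = cv2.findHomography(pts_query, pts_db,
                                        cv2.RANSAC, reproj_thresh)
    n_inliers = int(inlier_mask.sum()) if inlier_mask is not None else 0
    return H, n_inliers
```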
SLIDE 35

Geometric verification

We can measure the spatial consistency between the query and each result to improve retrieval quality and re-rank.

[Figure: many spatially consistent matches – correct result; few spatially consistent matches – incorrect result]
SLIDE 36

Geometric verification

Gives localization of the object

SLIDE 37

Geometric verification – example

• 1. Query
• 2. Initial retrieval set (bag-of-words model)
• 3. Spatial verification (re-rank on the number of inliers)
SLIDE 38

Evaluation dataset: Oxford buildings

All Souls, Ashmolean, Balliol, Bodleian, Tom Tower, Cornmarket, Bridge of Sighs, Keble, Magdalen, University Museum, Radcliffe Camera

• Ground truth obtained for 11 landmarks
• Evaluate performance by mean Average Precision
SLIDE 39

Measuring retrieval performance: Precision - Recall

[Plot: precision vs. recall]

[Venn diagram: relevant images and returned images within the set of all images]

• Precision: % of returned images that are relevant
• Recall: % of relevant images that are returned
SLIDE 40

Average Precision

[Plot: precision–recall curve with the corresponding AP]

• A good AP score requires both high recall and high precision
• Application-independent

Performance is measured by mean Average Precision (mAP) over 55 queries on the 100K or 1.1M image datasets.
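A sketch of the AP computation from one ranked result list (it assumes every relevant image appears somewhere in the list):

```python
import numpy as np

def average_precision(relevance):
    """relevance: 1/0 relevance flags of the ranked results, best first."""
    rel = np.asarray(relevance, dtype=float)
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

# mAP = mean of average_precision over all queries
```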
SLIDE 41
SLIDE 42

Query images

[Plots: precision–recall curves for example query images]

• high precision at low recall (like Google)
• variation in performance over queries
• does not retrieve all instances
SLIDE 43

Why aren’t all objects retrieved?

Obtaining visual words is like a sensor measuring the image: “noise” in the measurement process means that some visual words are missing or incorrect, e.g. due to
• missed detections
• changes beyond the built-in invariance
• quantization effects

Consequence: a visual word present in the query may be missing from a relevant database image.

[Pipeline: query image → Hessian-Affine regions + SIFT descriptors [Lowe04, Mikolajczyk07] → clustered and quantized to visual words [Sivic03, Philbin07] → sparse frequency vector]

Two remedies:
• 1. Query expansion
• 2. Better quantization
SLIDE 44

Query Expansion in text

In text:
• reissue the top n responses as queries
• blind relevance feedback
• danger of topic drift

In vision:
• reissue spatially verified image regions as queries
SLIDE 45

Automatic query expansion

Visual-word representations of two images of the same object may differ (due to e.g. detection/quantization noise), resulting in missed returns. Initial returns may be used to add new relevant visual words to the query. A strong spatial model prevents ‘drift’ by discarding false positives.

[Chum, Philbin, Sivic, Isard, Zisserman, ICCV’07; Chum, Mikulik, Perdoch, Matas, CVPR’11]
SLIDE 46

Visual query expansion - overview

• 1. Original query
• 2. Initial retrieval set
• 3. Spatial verification
• 4. New enhanced query
• 5. Additional retrieved images
SLIDE 47

Query Expansion

[Figure: query image, an originally retrieved image, and an image originally not retrieved]
SLIDE 48

Query Expansion

SLIDE 49

Query Expansion

SLIDE 50

Query Expansion

SLIDE 51

Query Expansion

The new expanded query is formed as the average of the visual-word vectors of the spatially verified returns (see the sketch below):
• only inliers are considered
• regions are back-projected to the original query image

[Figure: query image; spatially verified retrievals with matching regions overlaid; new expanded query]
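A sketch of forming the expanded query (plain averaging of term-frequency vectors; including the original query vector in the average, and restricting words to back-projected inlier regions, are details this sketch glosses over):

```python
import numpy as np

def expand_query(v_q, verified_tf_vectors):
    """v_q: (K,) original query vector; verified_tf_vectors: list of (K,)
    vectors of the spatially verified returns. Returns the averaged query."""
    stack = np.vstack([v_q] + list(verified_tf_vectors))
    return stack.mean(axis=0)
```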
SLIDE 52

Query Expansion

[Figure: query image; originally retrieved images; images retrieved only after expansion]
SLIDE 53

[Figure: query image with original results and expanded (improved) results; precision–recall curves for both]
SLIDE 54

Quantization errors

Typically, quantization has a significant impact on the final performance of the system [Sivic03, Nister06, Philbin07]. Quantization errors split features that should be grouped together and confuse features that should be separated.

[Figure: Voronoi cells of the quantizer]
SLIDE 55

Visual words – approximate NN search

• Map descriptors to words by quantizing the feature space
  – quantize via k-means clustering to obtain visual words
  – assign descriptors to the closest visual word
• Bag-of-features as approximate nearest-neighbour search: compared with descriptor matching by k-nearest neighbours, the bag-of-features matching function is

    f(x, y) = δ_{q(x), q(y)}

where q(x) is a quantizer, i.e., the assignment to a visual word, and δ_{a,b} is the Kronecker delta (δ_{a,b} = 1 iff a = b).
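With precomputed word assignments, the matching function is a one-liner:

```python
def bof_match(qx, qy):
    """Kronecker-delta matching: two descriptors match iff they are
    quantized to the same visual word, i.e. f(x, y) = delta(q(x), q(y))."""
    return 1 if qx == qy else 0
```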
SLIDE 56

Approximate nearest neighbor search evaluation

• ANN algorithms usually return a short-list of nearest neighbours
  – this short-list is supposed to contain the NN with high probability
  – exact search may be performed to re-order this short-list
• Proposed quality evaluation of ANN search: trade-off between
  – NN recall = probability that the NN is in this list, against
  – NN precision = proportion of vectors in the short-list
    (the lower this proportion, the more information we have about the vector, and the lower the complexity of an exact search on the short-list)
• ANN search algorithms usually have parameters to handle this trade-off
SLIDE 57

ANN evaluation of bag-of-features

• ANN algorithms return a list of potential neighbours
• NN recall = probability that the NN is in this list
• NN precision = proportion of vectors in the short-list
• In BOF, this trade-off is managed by the number of clusters k

[Plot: NN recall (0.1–0.7) vs. rate of points retrieved (10^-7–10^-1) for BOW with k = 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 30000, 50000]
SLIDE 58

20K visual words: false matches
SLIDE 59

200K visual words: good matches missed
SLIDE 60

Problem with bag-of-features

• The matching performed by BOF is weak
  – for a “small” visual dictionary: too many false matches
  – for a “large” visual dictionary: many true matches are missed
• No good trade-off between “small” and “large”!
  – either the Voronoi cells are too big
  – or the cells can’t absorb the descriptor noise
  → the intrinsic approximate nearest-neighbour search of BOF is not sufficient
• Possible solutions:
  – soft assignment [Philbin et al. CVPR’08]
  – additional short codes [Jegou et al. ECCV’08]
SLIDE 61

Beyond bags-of-visual-words

• Soft-assign each descriptor to multiple cluster centers [Philbin et al. 2008, Van Gemert et al. 2008]

[Figure: hard assignment maps a descriptor to a single word (B: 1.0); soft assignment distributes its weight over several words (A: 0.1, B: 0.5, C: 0.4)]
SLIDE 62

Beyond bag-of-visual-words

Hamming embedding [Jegou et al. 2008]
• standard quantization using bag-of-visual-words
• additional localization in the Voronoi cell by a binary signature
SLIDE 63

Hamming Embedding

Representation of a descriptor x:
– vector-quantized to q(x), as in standard BOF
– plus a short binary vector b(x) for an additional localization in the Voronoi cell

Two descriptors x and y match iff

    q(x) = q(y)  and  h(b(x), b(y)) ≤ h_t

where h(a, b) is the Hamming distance and h_t a fixed threshold.
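With signatures stored as 64-bit integers, the test is a popcount (the default threshold here is illustrative; the evaluation below uses h_t around 16–32):

```python
def he_match(qx, bx, qy, by, ht=24):
    """x and y match iff they share the visual word AND the Hamming
    distance between their binary signatures (ints) is at most ht."""
    return qx == qy and bin(bx ^ by).count("1") <= ht
```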
SLIDE 64

Hamming Embedding

• Nearest neighbours for the Hamming distance ≈ those for the Euclidean distance
  → a metric in the embedded space reduces dimensionality-curse effects
• Efficiency
  – Hamming distance = very few operations
  – fewer random memory accesses: 3× faster than BOF with the same dictionary size!
SLIDE 65

Hamming Embedding

• Off-line (given a quantizer):
  – draw an orthogonal projection matrix P of size d_b × d → this defines d_b random projection directions
  – for each Voronoi cell and projection direction, compute the median value over a training set
• On-line: compute the binary signature b(x) of a given descriptor (a code sketch follows below):
  – project x onto the projection directions as z(x) = (z_1, …, z_{d_b})
  – b_i(x) = 1 if z_i(x) is above the learned median value, otherwise 0

[H. Jegou et al., Improving bag of features for large scale image search, ECCV’08, IJCV’10]
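A minimal sketch of both stages (the per-cell medians are assumed to have been estimated from training descriptors assigned to that cell):

```python
import numpy as np

d, d_b = 128, 64                                  # descriptor dim, signature bits
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
P = Q[:, :d_b]                                    # d_b projection directions

def signature(x, cell_medians):
    """cell_medians: (d_b,) median projections learned for the Voronoi
    cell that x is quantized to. Returns the binary signature b(x)."""
    z = x @ P                                     # z(x) = (z_1, ..., z_db)
    return (z > cell_medians).astype(np.uint8)
```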
SLIDE 66

Hamming neighborhood

[Plot: rate of NN retrieved (recall) vs. rate of cell points retrieved, for binary signatures of 8, 16, 32, 64, and 128 bits]

There is a trade-off between memory usage and accuracy: more bits yield higher accuracy. In practice, 64 bits (8 bytes) are used.
SLIDE 67

ANN evaluation of Hamming Embedding

[Plot: NN recall (0.1–0.7) vs. rate of points retrieved (10^-8–10^-1) for HE+BOW (Hamming thresholds h_t = 16, 18, 20, 22, 24, 28, 32) and plain BOW (k = 100 … 50000)]

Compared to BOW: at least 10 times fewer points in the short-list for the same level of NN recall. Hamming embedding provides a much better trade-off between recall and ambiguity removal.
SLIDE 68

Matching points - 20k word vocabulary

[Figure: 201 matches vs. 240 matches – many matches with the non-corresponding image!]
SLIDE 69

Matching points - 200k word vocabulary

[Figure: 69 matches vs. 35 matches – still many matches with the non-corresponding image]
SLIDE 70

Matching points - 20k word vocabulary + HE

[Figure: 83 matches vs. 8 matches – 10× more matches with the corresponding image!]
SLIDE 71

INRIA holidays dataset

• Evaluation on the INRIA Holidays dataset, 1491 images
  – 500 query images + 991 annotated true positives
  – most images are holiday photos of friends and family
• 1 million & 10 million distractor images from Flickr
• Vocabulary construction on a different Flickr set
• Evaluation metric: mean average precision (in [0,1], bigger = better)
  – average over the precision/recall curve
SLIDE 72

Holiday dataset – example queries

SLIDE 73

Dataset : Venice Channel

[Figure: query image and the database images Base 1–Base 4]
SLIDE 74

Dataset : San Marco square

[Figure: query image and the database images Base 1–Base 9]
SLIDE 75

Example distractors - Flickr

SLIDE 76

Experimental evaluation

• Evaluation on our Holidays dataset: 500 query images, 1 million distractor images
• Metric: mean average precision (in [0,1], bigger = better)

Average query time (4 CPU cores):
  compute descriptors    880 ms
  quantization           600 ms
  search – baseline      620 ms
  search – WGC          2110 ms
  search – HE            200 ms
  search – HE+WGC        650 ms

[Plot: mAP vs. database size (1,000 to 1,000,000) for baseline, WGC, HE, WGC+HE, and +re-ranking]
SLIDE 77

Results – Venice Channel

[Figure: query image; results include Base 1 and Base 4 plus Flickr distractors]
SLIDE 78

Image retrieval - products

• Search for places and particular objects
  – for example, on a smart phone (courtesy Google)
SLIDE 79

Google image search

SLIDE 80

Towards large-scale image search

• BOF + inverted file can handle up to ~10 million images
  – with a limited number of descriptors per image → RAM: 40 GB; search: 2 seconds
• Web-scale = billions of images
  – with 100 M images per machine → search: 20 seconds, RAM: 400 GB – not tractable
• Solution: represent each image by one compressed vector
SLIDE 81

Very large scale image search

[Pipeline: query image → set of SIFT descriptors (Hessian-Affine regions + SIFT [Mikolajczyk & Schmid 04, Lowe 04]) → bag-of-features processing with tf-idf weighting → description vector over centroids (visual words) → vector compression → vector search → ranked image short-list → geometric verification [Lowe 04, Chum & al 2007] → re-ranked list]

• Each image is represented by one vector (bag-of-features, VLAD, Fisher, GIST)
• Vector compression reduces the storage requirements and the search time
SLIDE 82

Aggregating local descriptors

• Set of n local descriptors → 1 vector
• Popular approach: bag of features, often with SIFT features
• Recently improved aggregation schemes:
  – Fisher vector [Perronnin & Dance ’07]
  – VLAD descriptor [Jegou, Douze, Schmid, Perez ’10]
  – Supervector [Zhou et al. ’10]
  – Sparse coding [Wang et al. ’10, Boureau et al. ’10]
• Used in very large-scale retrieval and classification
SLIDE 83

Global scene context – GIST descriptor

The “gist” of a scene: Oliva & Torralba (2001)
• 5 frequency bands and 6 orientations for each image location
• tiling of the image for the description
• global representation
SLIDE 84

Aggregating local descriptors

Most popular approach: BoF representation [Sivic & Zisserman 03]
• sparse vector
• high-dimensional → significant dimensionality reduction introduces loss

Vector of locally aggregated descriptors (VLAD) [Jegou et al. 10]
• non-sparse vector
• fast to compute
• excellent results with a small vector dimensionality

Fisher vector [Perronnin & Dance 07]
• probabilistic version of VLAD
• initially used for image classification
• comparable performance to VLAD for image retrieval
SLIDE 85

VLAD: vector of locally aggregated descriptors

Determine a vector quantizer (k-means)
• output: k centroids (visual words) c_1, …, c_i, …, c_k, where each centroid c_i has dimension d

For a given image
• assign each descriptor x to the closest center c_i
• accumulate (sum) the residuals per cell: v_i := v_i + (x − c_i)

VLAD (dimension D = k × d): the vector is square-root + L2-normalized. A code sketch follows below.

Alternative: Fisher vector.

[Jegou, Douze, Schmid, Perez, CVPR’10]
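A NumPy sketch of the construction just described:

```python
import numpy as np

def vlad(descriptors, centers):
    """descriptors: (n, d); centers: (k, d) k-means centroids.
    Returns the D = k*d dimensional, square-root + L2-normalized VLAD."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nn = d2.argmin(axis=1)                       # nearest center per descriptor
    v = np.zeros_like(centers, dtype=float)
    for i, x in zip(nn, descriptors):
        v[i] += x - centers[i]                   # accumulate residuals per cell
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))          # square-root normalization
    return v / (np.linalg.norm(v) + 1e-12)       # L2 normalization
```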
SLIDE 86

VLADs for corresponding images

[Figure: per-centroid SIFT-like representation of the VLAD components v_1, v_2, v_3, … (+ components: blue, − components: red); corresponding images show good coincidence of energy and orientations]
SLIDE 87

Fisher vector

• Use a Gaussian mixture model (GMM) as the vocabulary
• Statistical measure of the descriptors of the image w.r.t. the GMM: the derivative of the likelihood w.r.t. the GMM parameters (weight, mean, variance (diagonal))
• A translated cluster → a large derivative for this component

[Perronnin & Dance 07]
SLIDE 88

Fisher vector

For image retrieval, in our experiments:
• only the deviation w.r.t. the mean is used; dimension: K × D (K = number of Gaussians, D = dimension of the descriptor)
• the variance does not improve results for a comparable vector length
SLIDE 89

VLAD/Fisher/BOF performance and dimensionality reduction

We compare Fisher, VLAD, and BoF on the INRIA Holidays dataset (mAP %). The dimension is reduced to D’ dimensions with PCA.

Observations:
• Fisher and VLAD are better than BoF for a given descriptor size
• choose a small D if the output dimension D’ is small
• the performance of GIST (960 dimensions, mAP 36.5) is not competitive

[Jegou, Perronnin, Douze, Sanchez, Perez, Schmid, PAMI’12]
SLIDE 90

Compact image representation

Aim: improving the trade-off between
• search speed
• memory usage
• search quality

Approach: joint optimization of three stages
• local descriptor aggregation → image representation: VLAD / Fisher
• dimension reduction → PCA
• indexing algorithm → PQ codes, (non-)exhaustive search
SLIDE 91

Product quantization for nearest-neighbour search

The vector is split into m subvectors: y = (y_1, …, y_m). The subvectors are quantized separately by quantizers q_1, …, q_m, where each q_j is learned by k-means with a limited number of centroids.

Example: a 128-dim vector y is split into 8 subvectors of dimension 16, and each subvector is quantized with 256 centroids → 8 bits
⇒ 8 subvectors × 8 bits = 64-bit quantization index
⇒ very large implicit codebook: 256^8 ≈ 1.8 × 10^19

[Diagram: y = (y_1, …, y_8) → (q_1(y_1), …, q_8(y_8)), each q_j with 256 centroids]

A code sketch of the encoding step follows below.

[Jegou, Douze, Schmid, PAMI’11]
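A sketch of the encoding step (the codebooks would be learned offline with k-means, one per subspace):

```python
import numpy as np

def pq_encode(y, codebooks):
    """y: (d,) vector; codebooks: (m, 256, d_sub) with d = m * d_sub.
    Quantizes each subvector with its own codebook -> one byte each,
    e.g. m = 8, d_sub = 16 gives a 64-bit code per vector."""
    m, _, d_sub = codebooks.shape
    codes = np.empty(m, dtype=np.uint8)
    for j in range(m):
        sub = y[j * d_sub:(j + 1) * d_sub]
        codes[j] = ((codebooks[j] - sub) ** 2).sum(axis=1).argmin()
    return codes
```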
SLIDE 92

Deep image retrieval [Gordo et al. 2016]

• Learns to represent images for retrieval
  – a deep network that focuses on retrieval
• Requires training data
  – introduces an automatic cleaning procedure based on geometric constraints
• State-of-the-art results
• Details in the student presentation