SLIDE 1

The Inverted Multi-Index

Presented by: Denis Efremov
Source: https://en.ppt-online.org/92412

SLIDE 2

Introduction

  • Main goal: nearest-neighbor (NN) search in a high-dimensional space
  • NN search is expensive – curse of dimensionality
  • Can trade accuracy for search time and memory usage
  • Solution: use indexing
  • Indexing – storing and organizing the content of the N-dimensional space into K clusters

SLIDE 3

Vector quantization

[Figure: a quantizer assigns each vector to its nearest centroid; the set of centroids forms the codebook (K = 16)]

  • Used in the inverted index for indexing
  • K-means clustering of the dataset (see the sketch below)
    + Length of the cell lists is balanced
    – Coarse sampling density
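
A minimal sketch of this indexing step, assuming a plain numpy k-means; function names such as `build_inverted_index` are illustrative, not taken from the paper's code:

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Plain k-means; X is an (N, D) float array, K <= N."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # assign every vector to its nearest centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(K):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(0)
    return centroids, assign

def build_inverted_index(X, K=16):
    centroids, assign = kmeans(X, K)
    # one posting list of vector ids per codebook word (cell)
    cells = [np.flatnonzero(assign == k) for k in range(K)]
    return centroids, cells
```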

SLIDE 4

Querying the inverted index

  • Have to consider several words (cells) for the best accuracy
  • Want to use as big a codebook as possible
  • Want to spend as little time as possible matching the query to the codebook
  → conflict (see the query sketch below)

[Figure: query point and the nearest codebook cells that have to be visited]
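
A hedged sketch of the query procedure, reusing `build_inverted_index` from the previous sketch; `w` (number of visited words) and `T` (short-list length) are illustrative parameters:

```python
def query_inverted_index(q, centroids, cells, X, w=3, T=10):
    # match the query to the codebook: K distance computations
    d = ((centroids - q) ** 2).sum(1)
    nearest = d.argsort()[:w]                      # visit the w nearest words
    candidates = np.concatenate([cells[k] for k in nearest])
    # rank the short candidate list by exact distance to the query
    exact = ((X[candidates] - q) ** 2).sum(1)
    return candidates[exact.argsort()[:T]]
```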

SLIDE 5

Product quantization

  • Used in the inverted multi-index for indexing
  • Also used for reranking in both cases (indexing and multi-indexing)
  • K = 16² cells from two codebooks of 16 words each (sketched below)
    + For the same K, a much finer subdivision is achieved
    – Very non-uniform entry size distribution
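
A minimal sketch of second-order product quantization as used here: the vector is split into two halves, each with its own K-word codebook, so the pair of codewords addresses one of K² cells. It reuses `kmeans` from the earlier sketch; names are illustrative:

```python
def train_pq_two_halves(X, K=16):
    D = X.shape[1]
    C1, _ = kmeans(X[:, :D // 2], K)   # codebook for the first half
    C2, _ = kmeans(X[:, D // 2:], K)   # codebook for the second half
    return C1, C2

def pq_cell(x, C1, C2):
    D = len(x)
    i = ((C1 - x[:D // 2]) ** 2).sum(1).argmin()
    j = ((C2 - x[D // 2:]) ** 2).sum(1).argmin()
    return i, j   # the pair (i, j) addresses one of K * K cells
```
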
SLIDE 6

Querying the inverted multi-index – Step 1

                                       inverted index    inverted multi-index
  number of entries                    K                 K²
  operations to match to codebooks     2K + O(1)         2K + O(1)
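
A sketch of Step 1, assuming the two-half setup above: each query half is matched to its codebook (2K distance computations in total) and the two distance sequences are sorted; no K²-sized structure is ever scanned:

```python
def multi_index_step1(q, C1, C2):
    D = len(q)
    r = ((C1 - q[:D // 2]) ** 2).sum(1)   # distances of first half to C1
    s = ((C2 - q[D // 2:]) ** 2).sum(1)   # distances of second half to C2
    r_order, s_order = r.argsort(), s.argsort()
    return r_order, r[r_order], s_order, s[s_order]
```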

SLIDE 7

Querying the inverted multi-index – Step 2

Step 2: the multi-sequence algorithm

[Figure: animation frames of a 6×6 grid of cell distances r(i) + s(j), built
from the two sorted distance sequences; the algorithm visits cells in order
of increasing sum, expanding a frontier from the top-left corner]
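
A hedged sketch of the multi-sequence algorithm: a priority queue pops grid cells (i, j) in order of increasing r(i) + s(j), exactly the frontier traversal the figure above illustrates. Here `cells` is assumed to be a dict mapping codeword pairs to posting lists, and the inputs come from `multi_index_step1`:

```python
import heapq

def multi_sequence(r_order, r_sorted, s_order, s_sorted, cells, T):
    out = []
    heap = [(r_sorted[0] + s_sorted[0], 0, 0)]   # start at the smallest sum
    seen = {(0, 0)}
    while heap and len(out) < T:
        dist, i, j = heapq.heappop(heap)
        # collect the entries of the multi-index cell with these codewords
        out.extend(cells.get((r_order[i], s_order[j]), ()))
        # push the right and lower grid neighbors of the popped cell; the
        # paper's variant pushes a cell only once both predecessors were
        # popped (a smaller queue), but the visiting order is the same
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(r_sorted) and nj < len(s_sorted) and (ni, nj) not in seen:
                heapq.heappush(heap, (r_sorted[ni] + s_sorted[nj], ni, nj))
                seen.add((ni, nj))
    return out
```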

SLIDE 8

Index vs Multi-index

SLIDE 9

Performance comparison

"How fast can we catch the nearest neighbor to the query?"

Recall on a dataset of 1 billion visual descriptors (K = 2¹⁴):

[Plot: recall curves for index vs multi-index, annotated "100x"]

Time increase: 1.4 msec -> 2.2 msec on a single core (with Basic Linear Algebra Subprograms (BLAS) instructions)

SLIDE 10

Performance comparison

Recall on a dataset of 1 billion 128D visual descriptors:

SLIDE 11

Time complexity

For the same K, the index gets a slight advantage because of BLAS instructions

SLIDE 12

Why 2 halves?

A fourth-order multi-index is faster, but not as accurate

SLIDE 13

Multi-Index + Reranking

  • After querying we have a list of candidate vectors without distances;
    to reorder the list we have to use reranking
  • Use m bytes to encode the original vector using product quantization –
    faster (efficient caching is possible for distance computation)
  • Use m bytes to encode the remainder between the original vector and the
    centroid – more accurate

Asymmetric Distance Computation (ADC) – sketched below
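
A minimal sketch of asymmetric distance computation: the query stays uncompressed, and per-subspace lookup tables reduce the distance to each m-byte code to a sum of m table entries (names are illustrative):

```python
def adc_tables(q, codebooks):
    # codebooks: list of (256, d_sub) arrays, one per byte of the code
    tables, off = [], 0
    for C in codebooks:
        d_sub = C.shape[1]
        tables.append(((C - q[off:off + d_sub]) ** 2).sum(1))
        off += d_sub
    return tables   # tables[m][k] = squared distance of q's m-th chunk to word k

def adc_distance(code, tables):
    # code: one codeword index per subvector (m bytes per database vector)
    return sum(t[k] for t, k in zip(tables, code))
```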

SLIDE 14

Multi-D-ADC vs IVFADC

IVFADC – state of the art [Jégou et al.]

SLIDE 15

Retrieval examples

[Figure: example queries with retrieved images, comparing exact NN on
uncompressed GIST descriptors against Multi-D-ADC with 16-byte codes]

SLIDE 16

Multi-Index and PCA (128 -> 32 dimensions)

  • Naïve – Principal Component Analysis (PCA) before PQ
  • Smart – PQ before a separate PCA (see the sketch below)
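
A hedged sketch contrasting the two orderings. It assumes the "smart" variant means running a separate PCA inside each PQ subspace, which is one reading of this slide rather than a confirmed description; all names are illustrative and numpy comes from the first sketch:

```python
def pca_fit(X, d):
    mu = X.mean(0)
    # principal directions from an SVD of the centered data
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d]

def naive_reduce(X, d=32):
    # one global PCA (128 -> 32), after which PQ would be trained
    mu, W = pca_fit(X, d)
    return (X - mu) @ W.T

def smart_reduce(X, m=4, d=32):
    # split into the m PQ subvectors first, then a separate PCA per subspace
    # (assumes X.shape[1] and d are both divisible by m)
    out = []
    for P in np.split(X, m, axis=1):
        mu, W = pca_fit(P, d // m)
        out.append((P - mu) @ W.T)
    return np.concatenate(out, axis=1)
```
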
SLIDE 17

Conclusions

  • A new data structure for indexing visual descriptors
  • Significant accuracy boost over the inverted index at the cost of a small memory overhead
  • Code available at https://github.com/ethz-asl/maplab/tree/master/algorithms/loopclosure/inverted-multi-index

SLIDE 18

Improvement of Product Quantization [Kalantidis, Avrithis CVPR 2014]

  • K-means:
    + Minimal distortion
    – Intractable look-up
  • Product Quantization:
    + Huge codebook
    + Tractable
    – Sensitive to projection (possible correlations)

SLIDE 19

Improvement of Product Quantization [Kalantidis, Avrithis CVPR 2014]

  • Optimized Product Quantization:
    + Huge codebook
    + Tractable
    + High-dim. subspace optimized w.r.t. R
    – Unoptimized for local clusters (the same non-uniform distribution)

SLIDE 20

Improvement of Product Quantization [Kalantidis, Avrithis CVPR 2014]

  • Locally Optimized Product Quantization:
    + Huge codebook
    + Tractable
    + High-dim. subspace optimized w.r.t. R
    + Locally optimized (are they?)