1/26
The Inverted Multi-Index
Presented by: Denis Efremov
Source: https://en.ppt-online.org/92412
2/26
Introduction
- Main goal: nearest-neighbor (NN) search in high-dimensional spaces
- NN search is expensive – curse of dimensionality
- Accuracy can be traded for search time and memory usage
- Use indexing
- Indexing – storing and organizing the content of an N-dimensional space into K clusters
3/26
Vector quantization
(figure: quantizer, centroids, and codebook; K = 16)
- Used in the inverted index for indexing
- K-means clustering of the dataset
+ Length of the cell lists is balanced
– Coarse sampling density
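A minimal NumPy sketch of this indexing step, with illustrative names (plain k-means rather than an optimized implementation): learn K centroids and store each vector's id in the posting list of its nearest centroid.

```python
# Vector quantization for an inverted index: k-means codebook,
# then one posting list of vector ids per centroid (cell).
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest centroid
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # move each centroid to the mean of its cluster
        for j in range(k):
            if (labels == j).any():
                centroids[j] = data[labels == j].mean(0)
    return centroids, labels

def build_inverted_index(data, centroids):
    labels = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    return {j: np.flatnonzero(labels == j).tolist() for j in range(len(centroids))}

data = np.random.default_rng(1).normal(size=(1000, 8)).astype(np.float32)
centroids, _ = kmeans(data, k=16)
index = build_inverted_index(data, centroids)
```

Every database vector lands in exactly one of the K = 16 cells; the pro/con above is visible here: list lengths are roughly balanced, but 16 cells sample an 8-D space only coarsely.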
4/26
Querying the inverted index
- Have to consider several words for the best accuracy
- Want to use as big a codebook as possible
- Want to spend as little time as possible matching to codebooks
=> conflict
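The conflict can be seen in a multi-probe query sketch (illustrative names): matching the query against the codebook costs time proportional to K, yet we still visit several nearest cells to keep accuracy up.

```python
# Multi-probe querying of an inverted index: match the query against
# all K centroids once, then gather the posting lists of the w nearest
# cells as the candidate set.
import numpy as np

def query_inverted_index(q, centroids, index, w=4):
    dists = ((centroids - q) ** 2).sum(1)   # matching cost grows with K
    candidates = []
    for cell in np.argsort(dists)[:w]:      # probe the w nearest cells
        candidates.extend(index[cell])
    return candidates

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4)).astype(np.float32)
centroids = data[:16].copy()                # stand-in codebook
labels = ((data[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
index = {j: np.flatnonzero(labels == j).tolist() for j in range(16)}
cands = query_inverted_index(data[0], centroids, index, w=4)
```

Growing K makes each cell tighter (better candidates) but makes the `dists` computation slower; that is exactly the trade-off the multi-index attacks.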
5/26
Product quantization
- Used in the inverted multi-index for indexing
- Also used for reranking in both cases (indexing and multi-indexing)
+ For the same K, a much finer subdivision is achieved
– Very non-uniform entry size distribution
- K = 16²
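A product-quantization sketch (illustrative names): split each vector into two halves and quantize each half against its own small codebook, so 16 sub-centroids per half index 16² cells overall.

```python
# Product quantization with two sub-quantizers: the code (i, j)
# names one of K^2 cells while each codebook holds only K entries.
import numpy as np

def pq_encode(x, codebooks):
    d = x.shape[0] // 2
    # nearest sub-centroid for each half of the vector
    return tuple(((cb - h) ** 2).sum(1).argmin()
                 for cb, h in zip(codebooks, (x[:d], x[d:])))

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 8)).astype(np.float32)
# stand-in codebooks; in practice each comes from k-means on that half
codebooks = [data[:16, :4].copy(), data[:16, 4:].copy()]
code = pq_encode(data[0], codebooks)
```

The con on the slide also follows: nothing forces the K² cells to be equally populated, since the two halves of real data are usually correlated.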
6/26
Querying the inverted multi-index – Step 1

                                    inverted index   inverted multi-index
number of entries                   K                K²
operations to match to codebooks    2K+O(1)          2K+O(1)
7/26
Querying the inverted multi-index – Step 2
(figure, shown over successive animation frames: the 6×6 table of summed half-distances, with rows and columns indexed 1–6, which the multi-sequence algorithm traverses in order of increasing sum)

   0.6   0.8   4.1   6.1   8.1   9.1
   2.5   2.7   6     8    10    11
   3.5   3.7   7     9    11    12
   6.5   6.7  10    12    14    15
   7.5   7.7  11    13    15    16
  11.5  11.7  15    17    19    20
Step 2: the multi-sequence algorithm
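The traversal above can be sketched with a priority queue (variable names are mine): given the two sorted lists of half-distances, emit cell index pairs (i, j) in order of increasing summed distance, pushing each popped pair's right and down neighbors.

```python
# Multi-sequence algorithm sketch: lazily enumerate cells (i, j) of the
# multi-index in order of increasing r[i] + s[j], without ever building
# the full K x K table.
import heapq

def multi_sequence(r, s):
    # r, s: distances of the two query halves to their codebooks,
    # sorted in ascending order
    heap = [(r[0] + s[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        d, i, j = heapq.heappop(heap)
        yield (i, j), d
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(r) and nj < len(s) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (r[ni] + s[nj], ni, nj))

r = [0.1, 2.0, 3.0]
s = [0.5, 0.7, 4.0]
order = [d for _, d in multi_sequence(r, s)]
```

In practice the generator is stopped as soon as the visited cells hold enough candidate vectors, so only a small corner of the table is ever touched.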
8/26
Index vs Multi-index
9/26
Performance comparison
"How fast can we catch the nearest neighbor to the query?"
Recall on a dataset of 1 billion visual descriptors, K = 2¹⁴ (the plot marks a ~100x difference)
Time increase: 1.4 msec -> 2.2 msec on a single core (with Basic Linear Algebra Subprograms (BLAS) routines)
10/26
Performance comparison
Recall on a dataset of 1 billion 128-D visual descriptors:
11/26
Time complexity
For the same K, the inverted index gets a slight advantage because of BLAS routines
12/26
Why 2 halves?
A fourth-order multi-index (four parts instead of two) is faster, but not as accurate
13/26
Multi-Index + Reranking
- use m bytes to encode the original vector using product quantization – faster (efficient caching possible for distance computation)
- use m bytes to encode the remainder between the original vector and its centroid – more accurate
- After querying we have a list of vectors without distances; to reorder the list we use reranking with Asymmetric Distance Computation (ADC)
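A sketch of asymmetric distance computation (illustrative names): the query stays uncompressed while database vectors are m-part PQ codes; precompute a table of squared distances from each query part to every sub-centroid, then score any code with m table lookups.

```python
# Asymmetric Distance Computation (ADC): per-query lookup tables make
# reranking a long candidate list cheap, since each database vector
# costs only m additions.
import numpy as np

def adc_tables(q, codebooks):
    # one row of per-sub-centroid distances per query part
    parts = np.split(q, len(codebooks))
    return [((cb - p) ** 2).sum(1) for cb, p in zip(codebooks, parts)]

def adc_distance(code, tables):
    return sum(t[c] for t, c in zip(tables, code))

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]  # m = 2 parts
q = rng.normal(size=8)
tables = adc_tables(q, codebooks)
d = adc_distance((3, 5), tables)
```

The distance is "asymmetric" because only the database side is quantized; the query is used exactly, which is what makes reranking with these codes accurate.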
14/26
Multi-D-ADC vs IVFADC
State-of-the-art [Jegou et al.]
15/26
Retrieval examples
(figure: retrieval examples comparing exact NN on uncompressed GIST with Multi-D-ADC using 16-byte codes)
16/26
Multi-Index and PCA (128->32 dimensions)
- Naïve – Principal Component Analysis (PCA) before PQ
- Smart – split for PQ first, then a separate PCA per subspace
17/26
Conclusions
- A new data structure for indexing visual descriptors
- Significant accuracy boost over the inverted index at the cost of a small memory overhead
- Code available at https://github.com/ethz-asl/maplab/tree/master/algorithms/loopclosure/inverted-multi-index
18/26
Improvement of Product Quantization
[Kalantidis, Avrithis CVPR 2014]
- K-means:
  + minimal distortion
  – intractable look-up
- Product Quantization:
  + huge codebook
  + tractable
  – sensitive to projection (possible correlations)
19/26
Improvement of Product Quantization
[Kalantidis, Avrithis CVPR 2014]
- Optimized Product Quantization:
  + huge codebook
  + tractable
  + optimized w.r.t. a rotation R of the high-dim. subspace
  – unoptimized for local clusters (the same non-uniform distribution)
20/26
Improvement of Product Quantization
[Kalantidis, Avrithis CVPR 2014]
- Locally Optimized Product Quantization: