Geometric VLAD for Large Scale Image Search
Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadesh, Robinson Piramuthu
ICML 2014 workshop on New Learning Frameworks and Models for Big Data

Our Goal
1) Robust to various imaging conditions
2) Small memory footprint
3) Speed (<1s per query)
Photometric Invariance
Geometric Invariance
BoW pipeline: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → BoW Encoding (vocabulary size = 200k) → Inverted Indices
Slide evolved from Fei-Fei Li’s
• Weak matching scheme
• Hard to find vocabulary-size trade-offs
• Large inverted index size
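The BoW encoding step above can be sketched in a few lines of numpy. This is a toy illustration with made-up sizes (the slides use a 200k-word vocabulary), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 64))       # toy vocabulary of 16 visual words (slides: 200k)
descriptors = rng.standard_normal((100, 64))   # toy local descriptors for one image

# assign each descriptor to its nearest visual word (Euclidean distance)
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
words = d2.argmin(axis=1)

# BoW encoding: histogram of visual-word occurrences
bow = np.bincount(words, minlength=len(codebook)).astype(float)
```

With a real vocabulary this histogram is what gets stored in the inverted index, which is why a large vocabulary inflates index size.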
VLAD pipeline: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → Vector Encoding → Vector Compression (size = 128) → Nearest Neighbor Search
VLAD encoding. For a given image:
• assign each descriptor x to the closest center c_i
• accumulate (sum) residuals per cell: v_i := v_i + (x - c_i)
The residual (x - c_i) adds useful information beyond the hard assignment.
VLAD dimension D = k × d, with typical k = 64.
A 128-dimensional VLAD has better retrieval performance than a 65k-dimensional BoW!
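The VLAD accumulation rule above can be sketched directly. A toy numpy sketch with small sizes for readability (the slides use k = 64); not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 8, 64                                   # toy sizes; the slides use k = 64
centers = rng.standard_normal((k, d))          # codebook centers c_i
descriptors = rng.standard_normal((200, d))    # local descriptors x of one image

# assign each descriptor to its closest center
assign = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

# accumulate residuals per cell: v_i := v_i + (x - c_i)
V = np.zeros((k, d))
for x, i in zip(descriptors, assign):
    V[i] += x - centers[i]

vlad = V.ravel()                               # dimension D = k * d
```

Unlike BoW, each cell stores a d-dimensional residual sum rather than a single count, which is where the extra information comes from.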
VLAD: v_i := v_i + (x - c_i)
VLAD fails to capture geometric information: two different keypoint configurations around a center c_i, each contributing four residuals of magnitude r, both aggregate to r + r + r + r = 4r.
gVLAD:
• take 2 angle bins: [-30°, 120°) and [120°, 330°)
• accumulate v_i := v_i + (x - c_i) separately per angle bin
Angle binning distinguishes the two geometric configurations:
• Configuration 1: Bin 1 = r + r, Bin 2 = r + r, so gVLAD = (2r, 2r)
• Configuration 2: Bin 1 = r + r + r + r, Bin 2 = 0, so gVLAD = (4r, 0)
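The per-angle-bin accumulation can be sketched by extending the VLAD loop with a bin index. A toy numpy sketch, using uniform angle bins for simplicity (the slides illustrate bins [-30°, 120°) and [120°, 330°)); sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, n_bins = 4, 16, 2                        # toy sizes; the slides illustrate 2 angle bins
centers = rng.standard_normal((k, d))
descriptors = rng.standard_normal((50, d))
angles = rng.uniform(-np.pi, np.pi, size=50)   # dominant orientation of each keypoint

# quantize each keypoint's angle into a bin (toy: uniform bins over the circle)
bins = (((angles + np.pi) / (2 * np.pi)) * n_bins).astype(int) % n_bins

assign = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

# one VLAD accumulator per angle bin: each residual goes to its (bin, cell) slot
V = np.zeros((n_bins, k, d))
for x, i, b in zip(descriptors, assign, bins):
    V[b, i] += x - centers[i]

gvlad = V.ravel()                              # dimension = n_bins * k * d
```

Summing the accumulators across bins recovers plain VLAD, so gVLAD strictly adds geometric information at the cost of a larger vector.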
Retrieval performance using only the angle histogram:

Features          mAP
Angle Bin (8)     0.15
Angle Bin (18)    0.24
Angle Bin (36)    0.26
Angle Bin (72)    0.27
GIST (544)        0.35
BoW (20,000)      0.45

Even a 72-D angle histogram alone performs surprisingly well!
Datasets:
• Oxford 5K: 5,062 images / 55 queries
• Paris 6K: 6,412 images / 60 queries
• Holidays: 1,491 images / 500 queries (plus a Rotated Holidays variant)
• Large-scale distractors: Flickr 100K, Flickr 1M
Vocabulary: k-means clustering on SURF descriptors with k = 256, trained on the Paris dataset.
[Figure: example queries from the Holidays and Oxford datasets]
[Plots: mAP vs. angle-bin offset (feature-descriptor angle offset, 20 to 100). Rotated Holidays, 4 bins: mAP ranges roughly 0.81 to 0.855. Oxford, 4 bins: mAP ranges roughly 0.55 to 0.63.]
[Figure: angle histogram of 8,233,763 SURF keypoints from Holidays, modeled with a von Mises distribution]
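The von Mises distribution is the circular analogue of a Gaussian, which makes it a natural model for keypoint orientations. A minimal numpy sketch of its density (the parameters below are illustrative, not fitted to the SURF data):

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle: exp(kappa*cos(theta - mu)) / (2*pi*I0(kappa)).
    mu is the mean direction, kappa the concentration; np.i0 is the
    modified Bessel function of the first kind, order 0."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

theta = np.linspace(-np.pi, np.pi, 1000)
pdf = von_mises_pdf(theta, mu=0.5, kappa=2.0)  # illustrative parameters
```

Fitting a mixture of such densities to the angle histogram gives data-driven angle bins instead of fixed uniform ones.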
[Figure: codebook adaptation. An initial codebook (centers k1, k2, k3) trained on the initial dataset V is adapted to a new dataset E.]
Inter-norm (Rotated Holidays dataset): ~17.7% improvement.
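The slides do not spell out the normalization, so as a hedged illustration, here is the per-cluster L2 normalization commonly used in the VLAD literature (often called intra-normalization), followed by a global L2 norm; this is a sketch of a standard scheme, not necessarily the exact "inter-norm" of the talk:

```python
import numpy as np

def normalize_vlad(V, eps=1e-12):
    """V: (k, d) array of per-cluster residual sums; returns a unit-norm flat vector."""
    # per-cluster L2 normalization, so no single bursty cluster dominates
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), eps)
    # global L2 normalization of the flattened vector
    v = V.ravel()
    return v / max(np.linalg.norm(v), eps)

vec = normalize_vlad(np.random.default_rng(0).standard_normal((8, 16)))
```

Per-cluster normalization equalizes the energy contributed by each cell before comparing images with a dot product.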
Whitened gVLAD (Rotated Holidays dataset): ~16.6% improvement at lower dimensionality.
Dimension reduction on the original gVLAD using PCA: from 65,536 → 128 dimensions, mAP decreases by only about 1%.
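PCA projection onto the top principal directions can be sketched via SVD. A toy numpy sketch with a smaller input dimensionality than the real 65,536-D gVLAD vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))   # toy gVLAD matrix (real vectors are 65,536-D)

# PCA via SVD of the centered data: rows of Vt are principal directions
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:128].T            # project onto the top 128 components
```

At search time only the 128-D projections are stored and compared, which is what enables the memory and speed figures later in the talk.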
[Chart: mAP comparison against methods from 2003 and 2010; improvements of 16.6% and 7.1%.]
[Chart: mAP comparison (K = 128) against methods from 2003, 2010, 2012, and 2013; improvements of 15.4% and 15.2%.]
[Chart: average improvements of ~12.5% and ~16.3% over methods from 2008 and 2013.]
[Chart: mAP comparison (mAP axis 0.4 to 0.8) across methods: BoW (2003); VLAD, Improved Fisher, VLAD+SSR (2010); MultiVoc+VLAD (2012); Ours: gVLAD (2013).]
Memory: 0.5KB per image for 128-D features; 0.5GB for 1M images; 500GB for 1B images.
Speed: ~750ms per query.
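The memory figures above follow from simple arithmetic, assuming each of the 128 dimensions is stored as a 32-bit float (the storage format is not stated on the slide):

```python
# back-of-the-envelope check of the memory figures
dim = 128
bytes_per_value = 4                    # assuming 32-bit floats
per_image = dim * bytes_per_value      # 512 bytes, i.e. 0.5 KB per image
total_1m = per_image * 10**6           # ~0.5 GB for 1M images
total_1b = per_image * 10**9           # ~500 GB for 1B images
```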
Related work:
• "Neural Codes for Image Retrieval", A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, arXiv, April 2014.
• "Multi-scale Orderless Pooling of Deep Convolutional Activation Features", Y. Gong, L. Wang, R. Guo, and S. Lazebnik, arXiv, March 2014.
[Comparison: gVLAD vs. Neural Codes vs. MOP-CNN]