SLIDE 1

Geometric VLAD for Large Scale Image Search

Zixuan Wang¹, Wei Di², Anurag Bhardwaj², Vignesh Jagadesh², Robinson Piramuthu²

SLIDE 2

Our Goal

ICML 2014 workshop on New Learning Frameworks and Models for Big Data

1) Robust to various imaging conditions
2) Small memory footprint
3) Speed (<1 s per query)

SLIDE 3

Issues with matching images (1/2)

Photometric Invariance

  • Brightness
  • Exposure

SLIDE 4

Issues with matching images (2/2)

Geometric Invariance

  • Rotation
  • Translation
  • Scale

SLIDE 5

State-of-the-art: Bag-of-Words (BoW)

Offline codebook construction: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → Bag-of-Words.

BoW encoding and indexing: Image Inventory → BoW Computation → BoW Encoding (size = 200k) → Inverted Indices.

Slide evolved from Fei-Fei Li’s

SLIDE 6

v Weak Matching Schema

  • for a “small” visual dictionary: too many false matches
  • for a “large” visual dictionary: many true matches are missed

v Hard to find vocabulary size trade-offs v Large inverted index size

Issues with BOW Matching

SLIDE 7

Recent approaches for very large scale indexing

Pipeline: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → BoW Computation → Vector Encoding → Vector Compression (size = 128) → Nearest Neighbor Search.

SLIDE 8

VLAD: Vector of Locally Aggregated Descriptors

For a given image:

  • assign each descriptor x to its closest center ci
  • accumulate (sum) residuals per cell: vi := vi + (x − ci)

The residual (x − ci) adds useful information. VLAD dimension D = k × d, with typical k = 64.

A 128-dimension VLAD has better performance than a 65k-word BoW!
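The VLAD accumulation described above can be sketched in NumPy. This is a minimal toy illustration, not the paper's pipeline (which uses SURF descriptors and k = 64); the random data and small k here are placeholders:

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors into an L2-normalized VLAD vector.

    descriptors: (n, d) local descriptors for one image.
    centers:     (k, d) codebook centers c_i.
    Returns a flattened (k * d,) vector.
    """
    k, d = centers.shape
    # Assign each descriptor x to its closest center c_i.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)
    # Accumulate residuals per cell: v_i := v_i + (x - c_i).
    v = np.zeros((k, d))
    for x, i in zip(descriptors, assignments):
        v[i] += x - centers[i]
    v = v.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy usage: 100 random 64-D descriptors, k = 4 centers -> 256-D VLAD.
rng = np.random.default_rng(0)
desc = rng.standard_normal((100, 64))
centers = rng.standard_normal((4, 64))
print(vlad(desc, centers).shape)  # (256,)
```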

SLIDE 9

Issue with VLAD

VLAD: vi := vi + (x − ci)

VLAD fails to capture geometry information: two images whose keypoints produce the same residuals r in different spatial configurations both aggregate to r + r + r + r = 4·r, so their VLAD vectors are identical.

SLIDE 10

gVLAD: Incorporating Geometry in VLAD

gVLAD:

  • take 2 angle bins: [−30, 120) and [120, 330)
  • accumulate vi := vi + (x − ci) separately per angle bin

Angle binning captures the different geometric configurations! Revisiting the two configurations that plain VLAD could not distinguish (both 4·r): gVLAD yields (2·r, 2·r) for one (bin 1: r + r, bin 2: r + r) and (4·r, 0) for the other (bin 1: r + r + r + r, bin 2: 0).
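The per-bin accumulation and the toy example above can be sketched as follows (NumPy; the bin edges follow the slide, while the 1-D single-center setup is an illustrative simplification):

```python
import numpy as np

def gvlad(descriptors, angles, centers, bin_edges):
    """gVLAD: one VLAD accumulation per keypoint-angle bin, concatenated.

    angles:    (n,) keypoint orientations in degrees, aligned with descriptors.
    bin_edges: list of (lo, hi) half-open intervals, e.g. [(-30, 120), (120, 330)].
    Returns a (len(bin_edges) * k * d,) vector (un-normalized, for clarity).
    """
    k, d = centers.shape
    parts = []
    for lo, hi in bin_edges:
        mask = (angles >= lo) & (angles < hi)
        v = np.zeros((k, d))
        if mask.any():
            sub = descriptors[mask]
            dists = np.linalg.norm(sub[:, None, :] - centers[None, :, :], axis=2)
            for x, i in zip(sub, np.argmin(dists, axis=1)):
                v[i] += x - centers[i]      # v_i := v_i + (x - c_i), per bin
        parts.append(v.reshape(-1))
    return np.concatenate(parts)

# The slide's toy case in 1-D: one center at 0, four descriptors with residual r = 1.
centers = np.array([[0.0]])
desc = np.ones((4, 1))
bins = [(-30, 120), (120, 330)]
print(gvlad(desc, np.array([0.0, 0.0, 0.0, 0.0]), centers, bins))      # -> [4. 0.]
print(gvlad(desc, np.array([0.0, 0.0, 200.0, 200.0]), centers, bins))  # -> [2. 2.]
```

Plain VLAD would return 4·r for both inputs; the bin split is what separates them.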

SLIDE 11

Power of Keypoint Angle

Retrieval performance using only the angle histogram:

  Features          mAP
  Angle Bin (8)     0.15
  Angle Bin (18)    0.24
  Angle Bin (36)    0.26
  Angle Bin (72)    0.27
  GIST (544)        0.35
  BoW (20,000)      0.45

A mere 72-D angle histogram already performs well!

SLIDE 12

q Large Scale Distractors Flickr 100K, Flickr 1M q Vocabulary k-means clustering on SURF descriptors with k = 256 on Paris dataset

Datasets & Vocabularies

Oxford 5K 5062 / 55 queries Paris 6K 6412 / 60 queries Holidays 1491 / 500 queries Rotated Holidays

SLIDE 13

Dataset: Holiday & Oxford

[Example query images from the Holiday and Oxford datasets.]

SLIDE 14

Example Distractors – Flickr

SLIDE 15

gVLAD: Keypoint Detection & Descriptor Extraction

Each detected keypoint provides a feature descriptor and an orientation angle.

SLIDE 16

gVLAD: Learning Angle Membership

[Plots: mAP vs. angle-bin offset (20–100) with 4 bins, on Rotated Holiday (mAP ≈ 0.81–0.855) and Oxford (mAP ≈ 0.55–0.63).]

SLIDE 17

gVLAD: Learning Angle Membership

A von Mises distribution is fitted to the orientation histogram of SURF keypoints from the Holiday dataset (8,233,763 keypoints) to learn the angle-bin membership.
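A hedged sketch of fitting a von Mises distribution to keypoint orientations, using SciPy. The slides do not give the fitting procedure, so this single-component maximum-likelihood fit on simulated angles is only an assumption of how such a fit could look (the actual angles would come from SURF keypoints):

```python
from scipy.stats import vonmises

# Simulated keypoint orientations in radians (placeholder for real SURF angles).
angles = vonmises.rvs(kappa=2.0, loc=0.5, size=5000, random_state=42)

# Maximum-likelihood fit of a von Mises distribution;
# fscale=1 keeps the distribution on the unit circle.
kappa, loc, _ = vonmises.fit(angles, fscale=1)
print(round(kappa, 1), round(loc, 2))
```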

SLIDE 18

gVLAD: Vocabulary Adaptation


  • Adapt existing codebooks with an incremental dataset
  • Alleviates the need for frequent large-scale codebook training

[Diagram: initial codebook centers k1, k2, k3 trained on the initial dataset V are adapted using the new dataset E, yielding the adapted codebook.]
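The slide does not spell out the adaptation rule, so here is a common incremental scheme as an assumed illustration: each new descriptor is assigned to its nearest center, which then moves to the running weighted mean of old and new points, avoiding full retraining:

```python
import numpy as np

def adapt_codebook(centers, counts, new_descriptors):
    """Incrementally adapt k-means centers to a new batch of descriptors.

    centers: (k, d) current codebook; counts: (k,) points seen per center.
    Each new descriptor moves its nearest center toward it by a
    running-mean update, so the codebook tracks the growing dataset.
    """
    centers = centers.copy()
    counts = counts.copy()
    for x in new_descriptors:
        i = np.argmin(np.linalg.norm(centers - x, axis=1))
        counts[i] += 1
        centers[i] += (x - centers[i]) / counts[i]  # weighted-mean update
    return centers, counts

# Toy 1-D codebook with two centers; two new points arrive.
centers = np.array([[0.0], [10.0]])
counts = np.array([1.0, 1.0])
new_centers, new_counts = adapt_codebook(centers, counts, np.array([[2.0], [12.0]]))
print(new_centers.ravel())  # centers move to 1.0 and 11.0
```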

SLIDE 19

gVLAD: Compute Descriptors

[Chart: on the Rotated Holiday dataset, inter-normalization of the descriptor improves mAP by ~17.7%.]

SLIDE 20

gVLAD: PCA whitening

[Chart: on the Rotated Holiday dataset, whitened gVLAD improves mAP by ~16.6% at a lower dimension.]

SLIDE 21

gVLAD: PCA whitening

Dimension reduction on the original gVLAD using PCA: from 65,536 → 128 dimensions, mAP decreases by only about 1%.

SLIDE 22

Experiment: Full size gVLAD on Holiday & Oxford


  • Full size gVLAD descriptors
  • Compared with state-of-the-art results
  • SURF detector & SURF descriptor are used
  • Best performances are in bold

[Chart: mAP comparison with methods from 2003–2013; full-size gVLAD improves mAP by 16.6% and 7.1%.]

SLIDE 23

Experiment: Low-dim gVLAD on Holiday & Oxford

  • Low-dimensional descriptors, reduced to 128 dimensions
  • Comparison with state-of-the-art (2003–2013)
  • Best performances are in bold.

[Chart: mAP performance with K = 128; low-dimensional gVLAD improves mAP by 15.4% and 15.2%.]

SLIDE 24

Experiment: on Large Scale Dataset


  • Large Scale Data with 100k/1M distractors
  • Comparison with state-of-the-art
  • Best performances are in bold.

[Chart: with 100k/1M distractors, gVLAD improves mAP by ~12.5% and ~16.3% on average over prior methods (2008–2013).]

SLIDE 25

Take Home Message

[Chart: mAP over time, comparing BoW (2003), VLAD / Improved Fisher / VLAD+SSR (2010), MultiVoc+VLAD (2012), and our gVLAD (2013); gVLAD achieves the highest mAP.]

SLIDE 26


Thank You

SLIDE 27

BACKUP

SLIDE 28


Speed and Memory

  • Memory: 0.5 KB per image for 128-D features (assuming float32: 128 × 4 B = 512 B), 0.5 GB for 1M images, 500 GB for 1B images.
  • Speed: ~750 ms per query.
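The storage figures above follow from 128 values per image, assuming 4-byte (float32) components:

```python
# Back-of-the-envelope check of the memory figures, assuming float32 features.
bytes_per_image = 128 * 4                  # 128-D float32 -> 512 B (about 0.5 KB)
print(bytes_per_image)                     # 512
print(bytes_per_image * 10**6 / 10**9)     # GB for 1M images -> 0.512
print(bytes_per_image * 10**9 / 10**12)    # TB for 1B images -> 0.512 (about 500 GB)
```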

SLIDE 29


Comparison with CNN-Based Approaches

“Neural Codes for Image Retrieval”, A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, arXiv, April 2014.
“Multi-scale Orderless Pooling of Deep Convolutional Activation Features”, Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik, arXiv, March 2014.

[Table: comparison of gVLAD vs. Neural Codes vs. MOP-CNN.]