Geometric VLAD for Large Scale Image Search
Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadesh, Robinson Piramuthu
ICML 2014 workshop on New Learning Frameworks and Models for Big Data

Our Goal
1) Robust to various imaging conditions
2) Small memory footprint
3) Speed (<1s per query)
Photometric Invariance
Geometric Invariance
BoW pipeline: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → BoW Encoding (vocabulary size = 200k) → Inverted Indices
Slide evolved from Fei-Fei Li’s
• Weak matching scheme
• Hard to find vocabulary-size trade-offs
• Large inverted index size
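The BoW encoding step above can be sketched in a few lines of numpy. This is a toy illustration with made-up sizes (the slides use a 200k-word vocabulary), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 64))       # toy vocabulary of 16 visual words (slides: 200k)
descriptors = rng.standard_normal((100, 64))   # toy local descriptors for one image

# assign each descriptor to its nearest visual word (Euclidean distance)
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
words = d2.argmin(axis=1)

# BoW encoding: histogram of visual-word occurrences
bow = np.bincount(words, minlength=len(codebook)).astype(float)
```

With a real vocabulary this histogram is what gets stored in the inverted index, which is why a large vocabulary inflates index size.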
VLAD pipeline: Image Inventory → Keypoint Detection → Descriptor Computation → Codebook Construction → Vector Encoding → Vector Compression (size = 128) → Nearest Neighbor Search
VLAD encoding. For a given image:
• assign each descriptor x to the closest center c_i
• accumulate (sum) residuals per cell: v_i := v_i + (x - c_i)
The residual (x - c_i) adds useful information beyond the hard assignment.
VLAD dimension D = k × d, with typical k = 64.
A 128-dimensional VLAD has better retrieval performance than a 65k-dimensional BoW!
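The VLAD accumulation rule above can be sketched directly. A toy numpy sketch with small sizes for readability (the slides use k = 64); not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 8, 64                                   # toy sizes; the slides use k = 64
centers = rng.standard_normal((k, d))          # codebook centers c_i
descriptors = rng.standard_normal((200, d))    # local descriptors x of one image

# assign each descriptor to its closest center
assign = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

# accumulate residuals per cell: v_i := v_i + (x - c_i)
V = np.zeros((k, d))
for x, i in zip(descriptors, assign):
    V[i] += x - centers[i]

vlad = V.ravel()                               # dimension D = k * d
```

Unlike BoW, each cell stores a d-dimensional residual sum rather than a single count, which is where the extra information comes from.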
VLAD: v_i := v_i + (x - c_i)
VLAD fails to capture geometric information: two different keypoint configurations around a center c_i, each contributing four residuals of magnitude r, both aggregate to r + r + r + r = 4r.
gVLAD:
• take 2 angle bins: [-30°, 120°) and [120°, 330°)
• accumulate v_i := v_i + (x - c_i) separately per angle bin
Angle binning distinguishes the two geometric configurations:
• Configuration 1: Bin 1 = r + r, Bin 2 = r + r, so gVLAD = (2r, 2r)
• Configuration 2: Bin 1 = r + r + r + r, Bin 2 = 0, so gVLAD = (4r, 0)
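The per-angle-bin accumulation can be sketched by extending the VLAD loop with a bin index. A toy numpy sketch, using uniform angle bins for simplicity (the slides illustrate bins [-30°, 120°) and [120°, 330°)); sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, n_bins = 4, 16, 2                        # toy sizes; the slides illustrate 2 angle bins
centers = rng.standard_normal((k, d))
descriptors = rng.standard_normal((50, d))
angles = rng.uniform(-np.pi, np.pi, size=50)   # dominant orientation of each keypoint

# quantize each keypoint's angle into a bin (toy: uniform bins over the circle)
bins = (((angles + np.pi) / (2 * np.pi)) * n_bins).astype(int) % n_bins

assign = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

# one VLAD accumulator per angle bin: each residual goes to its (bin, cell) slot
V = np.zeros((n_bins, k, d))
for x, i, b in zip(descriptors, assign, bins):
    V[b, i] += x - centers[i]

gvlad = V.ravel()                              # dimension = n_bins * k * d
```

Summing the accumulators across bins recovers plain VLAD, so gVLAD strictly adds geometric information at the cost of a larger vector.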
Retrieval performance using only the angle histogram:

Features          mAP
Angle Bin (8)     0.15
Angle Bin (18)    0.24
Angle Bin (36)    0.26
Angle Bin (72)    0.27
GIST (544)        0.35
BoW (20,000)      0.45

Even a 72-D angle histogram alone performs surprisingly well!
Datasets:
• Oxford 5K: 5,062 images / 55 queries
• Paris 6K: 6,412 images / 60 queries
• Holidays: 1,491 images / 500 queries (plus a Rotated Holidays variant)
• Large-scale distractors: Flickr 100K, Flickr 1M
Vocabulary: k-means clustering on SURF descriptors with k = 256, trained on the Paris dataset.
[Figure: example queries from the Holidays and Oxford datasets]
[Plots: mAP vs. angle-bin offset (feature-descriptor angle offset, 20 to 100). Rotated Holidays, 4 bins: mAP ranges roughly 0.81 to 0.855. Oxford, 4 bins: mAP ranges roughly 0.55 to 0.63.]
[Figure: angle histogram of 8,233,763 SURF keypoints from Holidays, modeled with a von Mises distribution]
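The von Mises distribution is the circular analogue of a Gaussian, which makes it a natural model for keypoint orientations. A minimal numpy sketch of its density (the parameters below are illustrative, not fitted to the SURF data):

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle: exp(kappa*cos(theta - mu)) / (2*pi*I0(kappa)).
    mu is the mean direction, kappa the concentration; np.i0 is the
    modified Bessel function of the first kind, order 0."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

theta = np.linspace(-np.pi, np.pi, 1000)
pdf = von_mises_pdf(theta, mu=0.5, kappa=2.0)  # illustrative parameters
```

Fitting a mixture of such densities to the angle histogram gives data-driven angle bins instead of fixed uniform ones.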
[Figure: codebook adaptation. An initial codebook (centers k1, k2, k3) trained on the initial dataset V is adapted to a new dataset E.]
Inter-norm (Rotated Holidays dataset): ~17.7% improvement.
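The slides do not spell out the normalization, so as a hedged illustration, here is the per-cluster L2 normalization commonly used in the VLAD literature (often called intra-normalization), followed by a global L2 norm; this is a sketch of a standard scheme, not necessarily the exact "inter-norm" of the talk:

```python
import numpy as np

def normalize_vlad(V, eps=1e-12):
    """V: (k, d) array of per-cluster residual sums; returns a unit-norm flat vector."""
    # per-cluster L2 normalization, so no single bursty cluster dominates
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), eps)
    # global L2 normalization of the flattened vector
    v = V.ravel()
    return v / max(np.linalg.norm(v), eps)

vec = normalize_vlad(np.random.default_rng(0).standard_normal((8, 16)))
```

Per-cluster normalization equalizes the energy contributed by each cell before comparing images with a dot product.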
Whitened gVLAD (Rotated Holidays dataset): ~16.6% improvement at lower dimensionality.
Dimension reduction on the original gVLAD using PCA: from 65,536 → 128 dimensions, mAP decreases by only about 1%.
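PCA projection onto the top principal directions can be sketched via SVD. A toy numpy sketch with a smaller input dimensionality than the real 65,536-D gVLAD vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))   # toy gVLAD matrix (real vectors are 65,536-D)

# PCA via SVD of the centered data: rows of Vt are principal directions
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:128].T            # project onto the top 128 components
```

At search time only the 128-D projections are stored and compared, which is what enables the memory and speed figures later in the talk.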
[Chart: mAP comparison against methods from 2003 and 2010; improvements of 16.6% and 7.1%.]
[Chart: mAP comparison (K = 128) against methods from 2003, 2010, 2012, and 2013; improvements of 15.4% and 15.2%.]
[Chart: average improvements of ~12.5% and ~16.3% over methods from 2008 and 2013.]
[Chart: mAP comparison (mAP axis 0.4 to 0.8) across methods: BoW (2003); VLAD, Improved Fisher, VLAD+SSR (2010); MultiVoc+VLAD (2012); Ours: gVLAD (2013).]
Memory: 0.5KB per image for 128-D features; 0.5GB for 1M images; 500GB for 1B images.
Speed: ~750ms per query.
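The memory figures above follow from simple arithmetic, assuming each of the 128 dimensions is stored as a 32-bit float (the storage format is not stated on the slide):

```python
# back-of-the-envelope check of the memory figures
dim = 128
bytes_per_value = 4                    # assuming 32-bit floats
per_image = dim * bytes_per_value      # 512 bytes, i.e. 0.5 KB per image
total_1m = per_image * 10**6           # ~0.5 GB for 1M images
total_1b = per_image * 10**9           # ~500 GB for 1B images
```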
Related work:
• "Neural Codes for Image Retrieval", A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, arXiv, April 2014.
• "Multi-scale Orderless Pooling of Deep Convolutional Activation Features", Y. Gong, L. Wang, R. Guo, and S. Lazebnik, arXiv, March 2014.
[Comparison: gVLAD vs. Neural Codes vs. MOP-CNN]