Visual Instance Retrieval. Praveen Krishnan, CVIT, IIIT Hyderabad.


SLIDE 1

Visual Instance Retrieval

Praveen Krishnan

CVIT, IIIT Hyderabad

June 15, 2017

SLIDE 2

Outline

◮ Image Retrieval
◮ Instance Level Search
◮ Deep Image Retrieval
  ◮ Neural Codes for Image Retrieval
  ◮ Local Convolutional Features
  ◮ Multi-Scale Orderless Pooling
  ◮ Sum Pooled Convolutional Features
  ◮ Integral Max Pooling
◮ Case Study: Gordo et al. ECCV’16

SLIDE 3

Image Retrieval

Image retrieval problem

Given a query object, retrieve all candidate objects from the database that match the query, irrespective of changes in viewpoint, illumination, scale, and location.
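The retrieval setting above reduces to nearest-neighbor search over global image descriptors. A minimal numpy sketch, with random vectors standing in for real image features (the dimensions and perturbation are illustrative only):

```python
import numpy as np

# Toy database of global image descriptors (one row per image), L2-normalized.
# In practice these rows would be neural codes or pooled CNN features.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 8))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Query: a slightly perturbed copy of database item 2 (a near-duplicate view).
query = db[2] + 0.01 * rng.normal(size=8)
query /= np.linalg.norm(query)

scores = db @ query            # cosine similarity against every database item
ranking = np.argsort(-scores)  # best match first
print(ranking[0])              # item 2 should rank first
```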

SLIDE 4

Instance Level Search

Visual Search

  • J. Sivic

SLIDE 5

Instance Level Search

Search photos on the web for particular places

  • J. Sivic

SLIDE 6

Instance Level Search

Retrieval Challenges

  • J. Sivic

SLIDE 7

Instance Level Search

Problem

How can we learn a class-agnostic, compact, and efficient image representation that is robust to these retrieval challenges?

SLIDE 8

Instance Level Search

Problem

How can we learn a class-agnostic, compact, and efficient image representation that is robust to these retrieval challenges?

Solution

Local feature aggregation of learned neural codes.

◮ Inspired by BoVW-based encoding and pooling schemes.

SLIDE 9

Neural Codes for Image Retrieval

Neural Codes

Use feature activations from the top layers of a CNN as a high-level descriptor.

Babenko et al. ECCV’14

SLIDE 10

Neural Codes for Image Retrieval

Neural Codes

◮ Using pretrained networks on ILSVRC.
◮ Fine-tuning on a related dataset.

Compressed neural codes

◮ PCA compression
◮ Discriminative dimensionality reduction
  ◮ Metric learning: learning a low-rank projection matrix W.
  ◮ Training data: build a matching graph using a standard image pipeline such as SIFT + NN matching + RANSAC.

Babenko et al. ECCV’14
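The PCA compression step above can be sketched in a few lines of numpy; the data and dimensions here are illustrative stand-ins for real neural codes:

```python
import numpy as np

rng = np.random.default_rng(1)
codes = rng.normal(size=(100, 64))     # 100 neural codes, 64-D each

# Center the codes and obtain the PCA basis from the SVD of the data matrix.
mean = codes.mean(axis=0)
U, S, Vt = np.linalg.svd(codes - mean, full_matrices=False)

d = 16                                  # target compressed dimensionality
compressed = (codes - mean) @ Vt[:d].T  # project onto the top-d components

print(compressed.shape)                 # (100, 16)
```

A learned low-rank projection W (the metric-learning variant) would replace `Vt[:d]` with a matrix trained on the matching graph.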

SLIDE 11

Neural Codes for Image Retrieval

Results

Babenko et al. ECCV’14

SLIDE 12

Local Convolutional Features

◮ Activations from convolutional layers interpreted as local feature codes.
◮ Pooling of local features to produce compact global descriptors, e.g. VLAD, Fisher vectors.
◮ More discriminative, with fewer false positives.

We will now see different ways to pool such codes into a global representation.

SLIDE 13

Multi-Scale Orderless Pooling : MOP-CNN

Building an orderless representation on top of CNN (globally ordered) activations in a multi-scale manner.

Figure 1: Classification of CNN activations of local patches in an image. Note the sensitivity of the predictions w.r.t. the patches.

Gong et al. ECCV’14
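The multi-scale orderless idea can be sketched as follows. Note the simplifications: a raw patch flatten stands in for the CNN activations, and plain mean pooling stands in for the VLAD encoding used in MOP-CNN; patch sizes and strides are illustrative:

```python
import numpy as np

def patch_codes(img, size, stride):
    """Extract square patches and encode each one; here the 'encoder' is a
    simple flatten, where MOP-CNN would use fc-layer CNN activations."""
    H, W = img.shape
    codes = []
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            codes.append(img[y:y + size, x:x + size].ravel())
    return np.array(codes)

rng = np.random.default_rng(2)
img = rng.normal(size=(8, 8))

# Orderless (mean) pooling of patch codes at each scale, then concatenation
# across scales: position information is kept globally, discarded locally.
pooled = [patch_codes(img, s, max(1, s // 2)).mean(axis=0) for s in (8, 4, 2)]
mop = np.concatenate(pooled)
print(mop.shape)   # 64 + 16 + 4 dims -> (84,)
```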

SLIDE 14

Multi-Scale Orderless Pooling : MOP-CNN

Gong et al. ECCV’14

SLIDE 15

Sum Pooled Convolutional Features : SPoC

SPoC Design

1. Sum pooling with a centering prior:

   ψ1(I) = Σ_{y=1..H} Σ_{x=1..W} α(x, y) f(x, y)

   Here α(x, y) denotes Gaussian weights dependent on the spatial coordinates.

2. Post-processing: PCA + whitening

   ψ2(I) = diag(s1, …, sN)^{-1} M_PCA ψ1(I)
   ψ_SPoC(I) = ψ2(I) / ‖ψ2(I)‖2

   Here M_PCA is the PCA matrix and the si are the associated singular values.

Babenko et al. CVPR’15
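The sum-pooling step with the Gaussian centering prior can be sketched directly in numpy. The feature map is random stand-in data, and the PCA + whitening stage is omitted since it needs a training set; only the weighted sum and final L2 normalization are shown:

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, K = 6, 6, 32
fmap = np.abs(rng.normal(size=(H, W, K)))   # conv feature map f(x, y), K channels

# Gaussian centering prior alpha(x, y): weights peak at the spatial center,
# encoding the assumption that the object of interest is centered.
ys, xs = np.mgrid[0:H, 0:W]
cy, cx = (H - 1) / 2, (W - 1) / 2
sigma = H / 3
alpha = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

# Weighted sum pooling over all spatial positions -> psi1(I).
psi1 = (alpha[:, :, None] * fmap).sum(axis=(0, 1))

# Final L2 normalization (PCA + whitening would sit between these two steps).
spoc = psi1 / np.linalg.norm(psi1)
print(spoc.shape)   # (32,)
```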

SLIDE 16

Integral Max Pooling: R-MAC

Revisiting traditional Bag of Visual Words:

◮ Compact image representation derived from multiple image regions by global max-pooling.
◮ Approximating max-pooling on integral images for efficient object localization.
◮ Performing image re-ranking and query expansion.

Tolias et al. ICLR’16

SLIDE 17

Integral Max Pooling: R-MAC

Maximum activations of convolutions (MAC)

Given a set of 2D convolutional feature channel responses X = {Xi}, i = 1 … K, spatial max-pooling over all locations gives:

   fΩ = [fΩ,1, …, fΩ,i, …, fΩ,K]ᵀ, with fΩ,i = max_{p∈Ω} Xi(p)

Here Ω is the set of valid spatial locations, Xi(p) is the response at position p, and K is the number of feature channels.

Tolias et al. ICLR’16
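MAC pooling is a one-liner once the feature tensor is in memory; the tensor here is random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(4)
K, H, W = 16, 5, 7
X = rng.normal(size=(K, H, W))       # K conv feature channels over an HxW grid

# MAC: per-channel max over every spatial position p in Omega.
mac = X.reshape(K, -1).max(axis=1)
print(mac.shape)                      # (16,)
```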

SLIDE 18

Integral Max Pooling: R-MAC

Regional maximum activation of convolutions (R-MAC)

1. Regional feature vector: fR over a rectangular region R ⊂ Ω = [1, W] × [1, H] is given as:

   fR = [fR,1, …, fR,i, …, fR,K]ᵀ, with fR,i = max_{p∈R} Xi(p)

2. Sampling of regions: uniformly at l different scales.
3. Final descriptor: the individual R-MACs are l2-normalized, PCA-whitened, summed across all regions, and l2-normalized again.

Tolias et al. ICLR’16
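The three steps above can be sketched as follows. Two simplifications: the regions form a non-overlapping grid per scale (the paper samples overlapping regions), and the PCA-whitening of each regional vector is omitted:

```python
import numpy as np

def region_vector(X, y0, y1, x0, x1):
    """Max-pool each channel over the rectangular region R."""
    return X[:, y0:y1, x0:x1].max(axis=(1, 2))

rng = np.random.default_rng(5)
K, H, W = 16, 8, 8
X = np.abs(rng.normal(size=(K, H, W)))

# Sample square regions on a uniform grid at l = 3 scales.
descriptor = np.zeros(K)
for size in (8, 4, 2):
    for y in range(0, H, size):
        for x in range(0, W, size):
            f_R = region_vector(X, y, y + size, x, x + size)
            descriptor += f_R / np.linalg.norm(f_R)   # l2-normalize, then sum

rmac = descriptor / np.linalg.norm(descriptor)        # final l2 normalization
print(rmac.shape)   # (16,)
```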

SLIDE 19

Integral Max Pooling: R-MAC

Object Localization

◮ Approximate integral max-pooling: using the generalized mean [Dollár et al. 2009]

   f̃R,i = ( Σ_{p∈R} Xi(p)^α )^{1/α}

   where α > 1 and f̃i → fi as α → +∞.

◮ Window detection: pick the window whose (approximate) descriptor is most similar to the query descriptor q̃:

   R̂ = argmax_{R⊆Ω} ( f̃Rᵀ q̃ ) / ( ‖f̃R‖ ‖q̃‖ )

To reduce the search space of windows:

◮ Efficient subwindow search (ESS) [Lampert et al. 2009]
◮ Approximate max-pooling localization: uses heuristics.

Tolias et al. ICLR’16
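The generalized-mean approximation above is easy to verify numerically: the sum of α-th powers is what integral images can accumulate, and for large α the result is close to the true max. Toy data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.abs(rng.normal(size=100)) + 0.1   # responses X_i(p) over a region R

def gen_mean(x, alpha):
    """Generalized mean as in the slide: sum of alpha-powers, 1/alpha root."""
    return float((x ** alpha).sum() ** (1.0 / alpha))

# As alpha grows, the generalized mean approaches the max response; the sum
# inside is exactly the quantity an integral image can provide per window.
for alpha in (1, 10, 100):
    print(alpha, gen_mean(x, alpha), float(x.max()))
```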

SLIDE 20

Integral Max Pooling: R-MAC

End2End Pipeline

1. Initial retrieval using R-MAC vectors.
2. Re-ranking by localizing the query object in the top-N ranked images.
3. Query expansion by merging the query vector with the top-5 results.

Tolias et al. ICLR’16
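The query-expansion step can be sketched with cosine retrieval over toy descriptors; merging by summation and renormalizing is one common choice (the exact merging rule here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
db = rng.normal(size=(50, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)

q = db[0] + 0.05 * rng.normal(size=32)   # query: near-duplicate of item 0
q /= np.linalg.norm(q)

# 1. Initial retrieval.
top5 = np.argsort(-(db @ q))[:5]

# 3. Query expansion: merge the query vector with its top-5 results.
q_exp = q + db[top5].sum(axis=0)
q_exp /= np.linalg.norm(q_exp)

final = np.argsort(-(db @ q_exp))        # re-issue the expanded query
print(final[0])
```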

SLIDE 21

Takeaways so far

Takeaways

◮ Global image representations can be obtained from pre-trained networks.
◮ Aggregating local conv. activations from multiple regions works better than FC-layer activations.
◮ PCA compression, whitening, and normalization play an important role.

Further Questions

◮ How can deep architectures be trained end-to-end for the task of image retrieval?
◮ How to handle non-uniform regions and select how to pool from them?

SLIDE 22

Deep Image Retrieval: Gordo et al. ECCV’16

CNN Architecture for Instance Retrieval

◮ A triplet network for optimizing the R-MAC [Tolias et al. ICLR’16] representation.
◮ Uses a trained region proposal network to generate valid proposals.
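The triplet objective behind such a network can be sketched in numpy; the margin value and the toy 2-D descriptors are illustrative, not the paper's settings:

```python
import numpy as np

def triplet_loss(q, pos, neg, margin=0.1):
    """Triplet ranking loss on L2-normalized descriptors: pushes the positive
    closer to the query than the negative, by at least the margin."""
    d_pos = float(np.sum((q - pos) ** 2))
    d_neg = float(np.sum((q - neg) ** 2))
    return max(0.0, margin + d_pos - d_neg)

q = np.array([1.0, 0.0])
pos = np.array([0.99, 0.14]); pos /= np.linalg.norm(pos)   # same instance
neg = np.array([0.0, 1.0])                                  # different instance

print(triplet_loss(q, pos, neg))   # zero: this triplet is already satisfied
```

During training, the loss is minimized over many such triplets, back-propagating through the R-MAC pooling into the convolutional weights.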

SLIDE 23

Deep Image Retrieval: Gordo et al. ECCV’16

Detour: a quick overview of R-CNN, Fast R-CNN, and Faster R-CNN.

SLIDE 24

Deep Image Retrieval: Gordo et al. ECCV’16

Leveraging large-scale noisy data

◮ Preparation of a cleaned Landmarks dataset.
◮ Generating pairwise scores between image pairs by building a matching graph.
◮ Pruning noise and extracting non-duplicate connected components.
◮ Leveraging bounding boxes from the cleaned images.

SLIDE 25

Deep Image Retrieval: Gordo et al. ECCV’16

Bounding box estimation

1. Initialization: for each pair of connected components (i, j) with affine transformation matrix Aij, find the geometric median of the matched keypoints.
2. Update: run a diffusion process between a pair of bounding boxes Bi and Bj:

   B′j = (1 − α) Bj + α Aij(Bi)
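One diffusion step can be sketched as follows. Assumptions, for illustration only: boxes are parameterized as [x, y, w, h], Aij acts linearly on that parameter vector, and the matrix here is an identity stand-in rather than a real estimated transform:

```python
import numpy as np

def update_box(Bj, Bi, A, alpha=0.5):
    """One diffusion update: blend box j with box i mapped through the
    transform A that registers image i onto image j."""
    projected = A @ np.append(Bi, 1.0)    # map box i's parameters (toy model)
    return (1 - alpha) * Bj + alpha * projected

# Identity transform in homogeneous form (4x5): projects Bi onto itself.
A = np.hstack([np.eye(4), np.zeros((4, 1))])
Bj = np.array([10.0, 10.0, 50.0, 40.0])
Bi = np.array([14.0,  6.0, 54.0, 44.0])

print(update_box(Bj, Bi, A))   # with alpha = 0.5: the midpoint of the boxes
```

Iterating this update across the matching graph lets consistent boxes reinforce each other while outliers are damped.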

SLIDE 26

Deep Image Retrieval: Gordo et al. ECCV’16

Qualitative Results

SLIDE 27

Thank you
