  1. Visual Instance Retrieval Praveen Krishnan CVIT, IIIT Hyderabad June 15, 2017 1

  2. Outline Image Retrieval Instance Level Search Deep Image Retrieval Neural Codes for Image Retrieval Local Convolutional Features Multi-Scale Orderless Pooling Sum Pooled Convolutional Features Integral Max Pooling Case Study: Gordo et al. ECCV’16 2

  3. Image Retrieval Image retrieval problem: given a query object, retrieve all candidate objects from the database that match the query irrespective of viewpoint changes, illumination, scale and location. 3

  4. Instance Level Search Visual Search J. Sivic 4

  5. Instance Level Search Search photos on the web for particular places J. Sivic 5

  6. Instance Level Search Retrieval Challenges J. Sivic 6

  7. Instance Level Search Problem How to learn a class-agnostic, compact and efficient image representation that is robust to the retrieval challenges? 7

  8. Instance Level Search Problem How to learn a class-agnostic, compact and efficient image representation that is robust to the retrieval challenges? Solution Local feature aggregation of learned neural codes. ◮ Inspired by BoVW-based encoding and pooling schemes. 7

  9. Neural Codes for Image Retrieval Neural Codes Use of feature activations from the top layers of a CNN as a high-level descriptor. Babenko et al. ECCV’14 8

  10. Neural Codes for Image Retrieval Neural Codes ◮ Using networks pretrained on ILSVRC. ◮ Fine-tuning on a related dataset. Compressed neural codes ◮ PCA compression ◮ Discriminative dimensionality reduction ◮ Metric learning: learning of a low-rank projection matrix W. ◮ Training data: build a matching graph using a standard image-matching pipeline such as SIFT + NN matching + RANSAC. Babenko et al. ECCV’14 9
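To make the neural-codes idea concrete, here is a minimal sketch, assuming torchvision's pretrained VGG16 as a stand-in for the network in the paper and scikit-learn for the PCA compression; the layer choice and the 128-D target dimension are illustrative, not the authors' exact setup.

```python
# Sketch: use first-FC-layer activations of a pretrained CNN as a global
# "neural code", then compress the database codes with PCA (+ whitening).
# (VGG16 and the 128-D output are illustrative choices, not the paper's.)
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import PCA

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
fc6 = torch.nn.Sequential(*list(model.classifier.children())[:2])  # Linear + ReLU

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def neural_code(path):
    """Return an L2-normalized 4096-D descriptor for one image file."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = fc6(model.avgpool(model.features(x)).flatten(1))
    feat = feat.squeeze(0).numpy()
    return feat / (np.linalg.norm(feat) + 1e-12)

# Compressed neural codes: fit PCA with whitening on the database codes and
# project both database and query codes to a short descriptor.
# db_codes = np.stack([neural_code(p) for p in database_paths])   # N x 4096
# pca = PCA(n_components=128, whiten=True).fit(db_codes)
# db_compact = pca.transform(db_codes)
```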

  11. Neural Codes for Image Retrieval Results Babenko et al. ECCV’14 10

  12. Local Convolutional Features ◮ Activations from convolutional layers interpreted as local feature codes. ◮ Pooling of local features to produce compact global descriptors, e.g. VLAD, Fisher Vectors, etc. ◮ More discriminative and fewer false positives. We will now see different ways to pool such codes into a global representation. 11
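As a small illustration (reusing the pretrained model from the previous sketch), the activations of the last convolutional layer can be reshaped into a set of h·w local descriptors of dimension K, ready to be aggregated by VLAD, Fisher Vectors, or the pooling schemes on the following slides.

```python
# Sketch: treat each spatial position of a conv feature map as one local
# descriptor. `model` is the pretrained VGG16 from the previous sketch.
import torch

def local_conv_features(model, x):
    """x: 1 x 3 x H x W preprocessed image tensor -> (h*w) x K descriptors."""
    with torch.no_grad():
        fmap = model.features(x)                 # 1 x K x h x w activations
    k = fmap.shape[1]
    return fmap.squeeze(0).reshape(k, -1).T      # one row per spatial position
```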

  13. Multi-Scale Orderless Pooling: MOP-CNN Building an orderless representation on top of CNN (globally ordered) activations in a multi-scale manner. Figure 1: Classification of CNN activations of local patches in an image. Notice the sensitivity of the prediction w.r.t. the patches. Gong et al. ECCV’14 12

  14. Multi-Scale Orderless Pooling: MOP-CNN Gong et al. ECCV’14 13
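A rough sketch of the MOP-CNN recipe: CNN codes are computed for patches at several scales, aggregated orderlessly per scale (VLAD here), and the per-scale vectors are concatenated. The `describe` function, patch sizes, stride and per-scale codebooks are placeholders (the paper's exact settings are not reproduced), and the image is assumed to be at least as large as the biggest patch.

```python
# Sketch of MOP-CNN-style pooling: patch-level CNN codes at several scales,
# VLAD aggregation per scale with a precomputed k-means codebook, then
# concatenation of the per-scale vectors.
import numpy as np

def extract_patches(img, size, stride):
    """img: H x W x 3 array; yield square patches of side `size`."""
    H, W = img.shape[:2]
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            yield img[y:y + size, x:x + size]

def vlad(descriptors, centers):
    """Aggregate residuals of descriptors w.r.t. their nearest center."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)
    agg = np.zeros_like(centers)
    for i, c in enumerate(assign):
        agg[c] += descriptors[i] - centers[c]
    agg = np.sign(agg) * np.sqrt(np.abs(agg))      # power normalization
    return agg.ravel() / (np.linalg.norm(agg) + 1e-12)

def mop_cnn(img, describe, codebooks, scales=(256, 128, 64)):
    """describe(patch) -> D-dim code; codebooks: one k x D array per scale."""
    levels = []
    for size, centers in zip(scales, codebooks):
        descs = np.stack([describe(p) for p in extract_patches(img, size, size // 2)])
        levels.append(vlad(descs, centers))
    return np.concatenate(levels)                  # multi-scale descriptor
```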

  15. Sum Pooled Convolutional Features: SPoC SPoC design 1. Sum pooling with a centering prior: $\psi_1(I) = \sum_{y=1}^{H} \sum_{x=1}^{W} \alpha(x, y)\, f(x, y)$, where $\alpha(x, y)$ denotes Gaussian weights dependent on the spatial coordinates. 2. Post-processing (PCA + whitening): $\psi_2(I) = \mathrm{diag}(s_1, \ldots, s_N)^{-1} M_{\mathrm{PCA}}\, \psi_1(I)$ and $\psi_{\mathrm{SPoC}}(I) = \psi_2(I) / \|\psi_2(I)\|_2$. Here $M_{\mathrm{PCA}}$ is the PCA matrix and the $s_i$ are the associated singular values. Babenko et al. CVPR’15 14
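A minimal NumPy sketch of the two SPoC steps; the Gaussian width and the PCA/whitening matrices are assumptions here (in practice they would be learned on held-out data), not the paper's exact values.

```python
# Sketch of SPoC: sum pooling with a Gaussian centering prior, then optional
# PCA + whitening and L2 normalization, following the slide's formulas.
import numpy as np

def spoc(fmap, M_pca=None, singular_values=None, sigma_frac=3.0):
    """fmap: K x H x W conv activations for one image -> SPoC descriptor."""
    K, H, W = fmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    sigma = min(H, W) / sigma_frac                       # assumed width
    alpha = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    psi1 = (fmap * alpha[None]).sum(axis=(1, 2))         # psi_1(I), K-dim
    if M_pca is None:
        return psi1 / (np.linalg.norm(psi1) + 1e-12)
    psi2 = (M_pca @ psi1) / singular_values              # psi_2(I)
    return psi2 / (np.linalg.norm(psi2) + 1e-12)         # psi_SPoC(I)
```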

  16. Integral Max Pooling: R-MAC Revisiting traditional Bag of Visual Words: ◮ Compact image representation derived from multiple image regions by global max pooling. ◮ Approximating max pooling with integral images for efficient object localization. ◮ Performing image re-ranking and query expansion. Tolias et al. ICLR’16 15

  17. Integral Max Pooling: R-MAC Maximum activations of convolutions (MAC) Given a set of 2D convolutional feature channel responses $X = \{X_i\},\ i = 1, \ldots, K$, spatial max pooling over all locations is given as $f_\Omega = [f_{\Omega,1}, \ldots, f_{\Omega,i}, \ldots, f_{\Omega,K}]^T$, with $f_{\Omega,i} = \max_{p \in \Omega} X_i(p)$. Here $\Omega$ is the set of valid spatial locations, $X_i(p)$ is the response at position $p$, and $K$ is the number of feature channels. Tolias et al. ICLR’16 16
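The MAC vector itself is essentially a one-liner; a small sketch, with an L2 normalization added so vectors can be compared with cosine similarity:

```python
# Sketch of MAC: per-channel max over all valid spatial locations Omega.
import numpy as np

def mac(fmap):
    """fmap: K x H x W conv activations -> L2-normalized K-dim MAC vector."""
    f = fmap.reshape(fmap.shape[0], -1).max(axis=1)   # f_{Omega,i} per channel
    return f / (np.linalg.norm(f) + 1e-12)
```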

  18. Integral Max Pooling: R-MAC Regional maximum activation of convolutions (R-MAC) 1. Regional feature vector: $f_R$ over a rectangular region $R \subseteq \Omega = [1, W] \times [1, H]$ is given as $f_R = [f_{R,1}, \ldots, f_{R,i}, \ldots, f_{R,K}]^T$, with $f_{R,i} = \max_{p \in R} X_i(p)$. 2. Sampling of regions: uniformly at $l$ different scales. 3. Final descriptor: the individual regional vectors are $\ell_2$-normalized, PCA-whitened and summed across all regions, followed by a final $\ell_2$ normalization. Tolias et al. ICLR’16 17
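A simplified sketch of the regional aggregation; the region grid here is a plain uniform multi-scale grid rather than the paper's exact overlapping sampling, and the per-region PCA-whitening step is omitted for brevity.

```python
# Sketch of R-MAC: regional max pooling over a multi-scale grid of boxes,
# per-region L2 normalization, summation, and a final L2 normalization.
# (Simplified region sampling; per-region PCA-whitening omitted.)
import numpy as np

def rmac_regions(H, W, levels=3):
    """Return (y0, x0, y1, x1) boxes of a simple multi-scale uniform grid."""
    boxes = []
    for l in range(1, levels + 1):
        size = max(1, int(round(2 * min(H, W) / (l + 1))))
        ys = np.unique(np.linspace(0, H - size, l + 1).round().astype(int))
        xs = np.unique(np.linspace(0, W - size, l + 1).round().astype(int))
        boxes += [(y, x, y + size, x + size) for y in ys for x in xs]
    return boxes

def rmac(fmap):
    """fmap: K x H x W conv activations -> aggregated R-MAC descriptor."""
    K, H, W = fmap.shape
    agg = np.zeros(K)
    for (y0, x0, y1, x1) in rmac_regions(H, W):
        f = fmap[:, y0:y1, x0:x1].reshape(K, -1).max(axis=1)  # regional MAC
        agg += f / (np.linalg.norm(f) + 1e-12)                # per-region L2
    return agg / (np.linalg.norm(agg) + 1e-12)                # final L2
```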

  19. Integral Max Pooling: R-MAC Object Localization ◮ Approximate integral max pooling: using the generalized mean [Dollár et al. 2009], $\tilde{f}_{R,i} = \big( \sum_{p \in R} X_i(p)^\alpha \big)^{1/\alpha}$, where $\alpha > 1$ and $\tilde{f}_{R,i} \to f_{R,i}$ as $\alpha \to +\infty$; the inner sum can be evaluated in constant time per region from channel-wise integral images. ◮ Window detection: $\hat{R} = \arg\max_{R \subseteq \Omega} \frac{\tilde{f}_R^{\,T} q}{\|\tilde{f}_R\|\, \|q\|}$, i.e. the window whose approximate MAC vector is most similar to the query descriptor $q$. To reduce the search space of windows: ◮ Efficient subwindow search (ESS) [Lampert et al. 2009] ◮ Approximate max-pooling localization: uses heuristics. Tolias et al. ICLR’16 18
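A sketch of the approximation, assuming non-negative (post-ReLU) activations; the exponent value is illustrative. Per-channel integral images of $X_i(p)^\alpha$ let every candidate window be scored in constant time against a query MAC vector q.

```python
# Sketch: generalized-mean approximation of regional max pooling computed
# from per-channel integral images, plus exhaustive window scoring against a
# query descriptor q. Assumes non-negative activations; alpha is illustrative.
import numpy as np

def integral_images(fmap, alpha=10.0):
    """fmap: K x H x W -> K x (H+1) x (W+1) integral images of fmap**alpha."""
    ii = np.cumsum(np.cumsum(fmap ** alpha, axis=1), axis=2)
    return np.pad(ii, ((0, 0), (1, 0), (1, 0)))       # zero top row / left col

def approx_mac(ii, box, alpha=10.0):
    """Approximate regional MAC for box (y0, x0, y1, x1), exclusive ends."""
    y0, x0, y1, x1 = box
    s = ii[:, y1, x1] - ii[:, y0, x1] - ii[:, y1, x0] + ii[:, y0, x0]
    return np.maximum(s, 0.0) ** (1.0 / alpha)

def localize(fmap, q, boxes, alpha=10.0):
    """Return the box whose approximate MAC is most similar to q (cosine)."""
    ii = integral_images(fmap, alpha)
    def score(box):
        f = approx_mac(ii, box, alpha)
        return f @ q / ((np.linalg.norm(f) + 1e-12) * (np.linalg.norm(q) + 1e-12))
    return max(boxes, key=score)
```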

  20. Integral Max Pooling: R-MAC End-to-end pipeline 1. Initial retrieval using R-MAC vectors. 2. Re-ranking by localizing the query object in the top-N ranked images. 3. Query expansion by merging the query vector with the top-5 results. Tolias et al. ICLR’16 19
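A sketch of steps 1 and 3 with plain cosine retrieval and average query expansion; the re-ranking step would reuse the `localize` sketch above on each of the top-N images. The top-5 cutoff follows the slide.

```python
# Sketch: initial ranking by cosine similarity of L2-normalized descriptors,
# then average query expansion with the top-5 results and a second query.
import numpy as np

def retrieve(q, db):
    """q: D-dim query; db: N x D matrix of L2-normalized database vectors."""
    return np.argsort(-(db @ q))                 # indices, best match first

def query_expansion(q, db, top=5):
    ranking = retrieve(q, db)
    qe = q + db[ranking[:top]].sum(axis=0)       # merge query with top results
    qe /= np.linalg.norm(qe) + 1e-12
    return retrieve(qe, db)
```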

  21. Takeaways so far Takeaways ◮ Global image representation using pre-trained networks. ◮ Aggregation of local conv. activations from multiple regions works better than FC-layer activations. ◮ PCA compression, whitening and normalization play an important role. Further questions ◮ How to leverage deep architectures specifically for the task of image retrieval? ◮ How to handle non-uniform regions and decide how to pool from them? 20

  22. Deep Image Retrieval: Gordo et al. ECCV’16 CNN architecture for instance retrieval ◮ A triplet network for optimizing the R-MAC [Tolias et al. ICLR’16] representation. ◮ Uses a trained region proposal network to generate valid region proposals. 21
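A minimal sketch of a triplet ranking loss of the kind used here; the margin value is a placeholder, and the paper's exact loss and triplet-sampling strategy are not reproduced.

```python
# Sketch: triplet ranking loss on L2-normalized image descriptors. The query
# should be closer to the relevant image than to the irrelevant one by a margin.
import torch
import torch.nn.functional as F

def triplet_loss(q, pos, neg, margin=0.1):
    """q, pos, neg: B x D descriptor batches (e.g. R-MAC outputs)."""
    d_pos = ((q - pos) ** 2).sum(dim=1)           # squared dist to positive
    d_neg = ((q - neg) ** 2).sum(dim=1)           # squared dist to negative
    return F.relu(margin + d_pos - d_neg).mean()  # hinge over the batch
```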

  23. Deep Image Retrieval: Gordo et al. ECCV’16 Detour: a quick overview of R-CNN, Fast R-CNN and Faster R-CNN. 22

  24. Deep Image Retrieval: Gordo et al. ECCV’16 Leveraging large-scale noisy data ◮ Preparation of a cleaned Landmarks dataset. ◮ Generating pairwise scores between image pairs by building a matching graph. ◮ Pruning noisy matches and extracting non-duplicate connected components. ◮ Leveraging bounding boxes from the cleaned images. 23

  25. Deep Image Retrieval: Gordo et al. ECCV’16 Bounding box estimation 1. Initialization: for each pair of connected images $(i, j)$ with affine transformation matrix $A_{ij}$, find the geometric median of the matched keypoints. 2. Update: run a diffusion process between each pair of bounding boxes $B_i$ and $B_j$: $B'_j = (1 - \alpha) B_j + \alpha\, A_{ij} B_i$. 24
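A small sketch of the diffusion update above, with boxes stored as [x0, y0, x1, y1] corner coordinates and A_ij given as a 2x3 affine matrix; the value of alpha, the iteration count and the graph representation are placeholders.

```python
# Sketch: propagate bounding boxes across matched image pairs with the
# update B'_j = (1 - alpha) * B_j + alpha * A_ij(B_i).
import numpy as np

def apply_affine(A, box):
    """A: 2x3 affine matrix; box: [x0, y0, x1, y1] -> transformed box."""
    corners = np.array([[box[0], box[1], 1.0],
                        [box[2], box[3], 1.0]])
    warped = corners @ A.T                        # two transformed corners
    xs, ys = np.sort(warped[:, 0]), np.sort(warped[:, 1])
    return np.array([xs[0], ys[0], xs[1], ys[1]])

def diffuse_boxes(boxes, pairs, affines, alpha=0.5, iters=10):
    """boxes: {img: box}; pairs: [(i, j), ...]; affines: {(i, j): A_ij}."""
    for _ in range(iters):
        new = {k: b.copy() for k, b in boxes.items()}
        for (i, j) in pairs:
            proj = apply_affine(affines[(i, j)], boxes[i])   # A_ij B_i
            new[j] = (1 - alpha) * new[j] + alpha * proj     # B'_j update
        boxes = new
    return boxes
```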

  26. Deep Image Retrieval: Gordo et al. ECCV’16 Qualitative Results 25

  27. Thank you 26
