Neural Codes for Image Retrieval
David Stutz | July 22, 2015
  1. Neural Codes for Image Retrieval. David Stutz | July 22, 2015

  2. Table of Contents
1 Introduction
2 Image Retrieval: Bag of Visual Words; Vector of Locally Aggregated Descriptors; Sparse-Coded Features; Compression and Nearest-Neighbor Search
3 Convolutional Neural Networks: Multi-layer Perceptrons; Convolutional Neural Networks; Architectures; Training
4 Neural Codes for Image Retrieval
5 Experiments
6 Summary


  4. 1. Introduction
Image retrieval. Problem: given a large database of images and a query image, find images showing the same object or scene.
Advantage: such queries can also capture activities, emotions, ...
Originally:
◮ text-based retrieval systems based on manual annotations;
◮ impractical for large collections of images.
Today, content-based image retrieval:
◮ techniques based on the Bag of Visual Words [SZ03] model.


  8. 2. Image Retrieval
Formalization of content-based image retrieval:
Problem. Find the K nearest neighbors of a query z_0 in a (large) database X = {x_1, ..., x_N} of image representations.
[Figure: 2-nearest-neighbor search around a query z_0, first for N = 7 points, then for large N.]
Important: the choice of image representation.
Examples of image representations from the "Computer Vision" lecture:
◮ histograms;
◮ Bag of Visual Words [SZ03].
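The K-nearest-neighbor formulation above can be sketched as a brute-force search. This is a minimal sketch with illustrative names; practical systems at large N rely on compression and efficient indexing instead, as discussed later.

```python
import numpy as np

def knn(z0, X, K):
    """Return the indices of the K nearest neighbors of query z0
    in the database X (one representation per row), measured by
    Euclidean distance."""
    distances = np.linalg.norm(X - z0, axis=1)
    return np.argsort(distances)[:K]

# Toy example: N = 7 two-dimensional representations, K = 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0],
              [0.5, 0.5], [4.0, 4.0], [3.0, 0.0], [0.0, 2.0]])
z0 = np.array([0.2, 0.1])
neighbors = knn(z0, X, K=2)  # indices of the two closest rows of X
```

Brute-force search is exact but scans all N representations per query, which motivates the compression and indexing techniques covered in Section 2.4.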

  11. 2.1. Bag of Visual Words
Intuition: assign the local descriptors y_{l,n} of image x_n to visual words ŷ_1, ..., ŷ_M previously obtained using clustering.
[Figure: a local descriptor y_{l,n} assigned to its nearest visual word ŷ_m.]

  12. 2.1. Bag of Visual Words
1. Extract local descriptors Y_n for each image x_n.
2. Cluster all local descriptors Y = ∪_{n=1}^{N} Y_n to obtain visual words Ŷ = {ŷ_1, ..., ŷ_M}.
3. Assign each y_{l,n} ∈ Y_n to its nearest visual word (embedding step):
   f(y_{l,n}) = (δ(NN_Ŷ(y_{l,n}) = ŷ_1), ..., δ(NN_Ŷ(y_{l,n}) = ŷ_M)).
4. Count visual word occurrences (aggregation step):
   F(Y_n) = Σ_{l=1}^{L} f(y_{l,n}).
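The embedding and aggregation steps above can be sketched in a few lines of NumPy, assuming the codebook of visual words has already been obtained by clustering (e.g. k-means); all names here are illustrative:

```python
import numpy as np

def bovw_embed(y, codebook):
    """Embedding step: one-hot indicator of the nearest visual word."""
    m = np.argmin(np.linalg.norm(codebook - y, axis=1))
    f = np.zeros(len(codebook))
    f[m] = 1.0
    return f

def bovw_aggregate(Y_n, codebook):
    """Aggregation step: sum the indicators, i.e. count how often
    each visual word occurs among the image's local descriptors."""
    return sum(bovw_embed(y, codebook) for y in Y_n)

# M = 3 visual words, L = 4 local descriptors of one image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
Y_n = np.array([[0.1, 0.0], [0.9, 1.1], [1.9, 0.1], [0.0, 0.2]])
histogram = bovw_aggregate(Y_n, codebook)  # one count per visual word
```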

  15. 2.2. Vector of Locally Aggregated Descriptors
Intuition: consider the residuals y_{l,n} − ŷ_m instead of counting visual words.
[Figure: the residual between a local descriptor y_{l,n} and its nearest visual word ŷ_m.]

  16. 2.2. Vector of Locally Aggregated Descriptors
1. Extract and cluster local descriptors.
2. Compute residuals of local descriptors to their nearest visual words (embedding step):
   f(y_{l,n}) = (δ(NN_Ŷ(y_{l,n}) = ŷ_1)(y_{l,n} − ŷ_1), ..., δ(NN_Ŷ(y_{l,n}) = ŷ_M)(y_{l,n} − ŷ_M)).
3. Aggregate residuals (aggregation step):
   F(Y_n) = Σ_{l=1}^{L} f(y_{l,n}).
4. L_2-normalize F(Y_n).
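The VLAD steps can be sketched as follows, again assuming a clustered codebook; this is a minimal sketch with illustrative names:

```python
import numpy as np

def vlad(Y_n, codebook):
    """VLAD encoding: accumulate residuals y - y_hat_m per nearest
    visual word, concatenate them, then L2-normalize the result."""
    M, D = codebook.shape
    F = np.zeros((M, D))
    for y in Y_n:
        m = np.argmin(np.linalg.norm(codebook - y, axis=1))
        F[m] += y - codebook[m]          # residual to nearest word
    F = F.ravel()                        # concatenate to M*D dims
    return F / np.linalg.norm(F)         # L2-normalization step

# M = 2 visual words, two local descriptors of one image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
Y_n = np.array([[0.2, 0.0], [0.8, 1.0]])
code = vlad(Y_n, codebook)  # 4-dimensional, unit L2 norm
```

Note that the resulting representation has M·D dimensions (words times descriptor dimension), substantially more than the M-dimensional Bag of Visual Words histogram.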

  19. 2.3. Sparse-Coded Features
Intuition: soft-assign local descriptors to visual words.
[Figure: a local descriptor y_{l,n} represented by a combination of visual words ŷ_m and ŷ_{m′}.]

  20. 2.3. Sparse-Coded Features
1. Extract and cluster local descriptors.
2. Compute sparse codes (embedding step):
   f(y_{l,n}) = argmin_{r_l} ‖y_{l,n} − Ŷ r_l‖²_2 + λ ‖r_l‖_1,
   where the matrix Ŷ contains the visual words ŷ_m as columns.
3. Pool sparse codes (aggregation step):
   F(Y_n) = (max_{1≤l≤L} f_1(y_{l,n}), ..., max_{1≤l≤L} f_M(y_{l,n})),
   where f_m(y_{l,n}) denotes the m-th component of f(y_{l,n}).
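The embedding and pooling steps can be sketched as follows; here the lasso problem is approximated with plain ISTA (iterative soft-thresholding) rather than a library solver, and all names are illustrative:

```python
import numpy as np

def sparse_code(y, Y_hat, lam, iterations=500):
    """Approximately solve argmin_r ||y - Y_hat r||_2^2 + lam*||r||_1
    with ISTA. Y_hat holds the visual words y_hat_m as columns."""
    r = np.zeros(Y_hat.shape[1])
    # Step size 1/L, where L = 2*||Y_hat||_2^2 is the gradient's
    # Lipschitz constant (spectral norm squared, times 2).
    step = 1.0 / (2.0 * np.linalg.norm(Y_hat, 2) ** 2)
    for _ in range(iterations):
        grad = 2.0 * Y_hat.T @ (Y_hat @ r - y)      # gradient step
        r = r - step * grad
        r = np.sign(r) * np.maximum(np.abs(r) - step * lam, 0.0)  # shrink
    return r

def max_pool(codes):
    """Aggregation step: component-wise maximum over all sparse codes."""
    return np.max(codes, axis=0)

# M = 2 visual words as columns of Y_hat; two local descriptors.
Y_hat = np.array([[1.0, 0.0], [0.0, 1.0]])
codes = np.array([sparse_code(y, Y_hat, lam=0.1)
                  for y in [np.array([1.0, 0.0]), np.array([0.3, 0.9])]])
F = max_pool(codes)
```

With an orthonormal dictionary, as in this toy example, ISTA reduces to soft-thresholding the descriptor itself, which makes the result easy to check by hand.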

  23. 2.4. Compression, Nearest-Neighbor Search
Until now: image representation. Additional aspects of image retrieval:
◮ compression of image representations;
◮ efficient indexing and nearest-neighbor search [JDS11];
◮ query expansion [CPS+07] and spatial verification [PCI+07].
For example, compression can be accomplished using:
◮ unsupervised methods, e.g. Principal Component Analysis (PCA);
◮ or discriminative methods, e.g. Joint Subspace and Classifier Learning [GRPV12] or Large Margin Dimensionality Reduction [SPVZ13] (discussed later).
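The unsupervised option can be sketched with a PCA projection computed via the SVD; this is a minimal NumPy sketch with illustrative names (the discriminative methods cited above instead learn the projection from labeled data):

```python
import numpy as np

def pca_compress(X, d):
    """Project N x D representations X onto the top-d principal
    components (unsupervised compression). Returns the compressed
    codes, the D x d projection matrix, and the data mean."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                  # D x d projection matrix
    return Xc @ W, W, mean

# Compress 100 four-dimensional representations down to d = 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
Z, W, mean = pca_compress(X, d=2)
```

A query is compressed with the same W and mean before the nearest-neighbor search, so distances are computed in the low-dimensional space.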
