

slide-1
SLIDE 1

Visual Place Recognition as Image Retrieval with CNN

Giorgos Tolias Visual Recognition Group, CTU in Prague CVPR 2017 tutorial on Large-Scale Visual Place Recognition and Image-Based Localization

Alex Kendall, Torsten Sattler, Giorgos Tolias, Akihiko Torii

slide-2
SLIDE 2

Visual place recognition

slide-3
SLIDE 3

Visual place recognition by image retrieval

Diagram: query image → query descriptor → nearest-neighbor search over descriptors for database images

slide-4
SLIDE 4

http://viral.image.ntua.gr

slide-5
SLIDE 5

CNN as feature extractors

  • CNN pre-trained for image classification
  • Internal layer activations used as features
  • Good generalization properties across tasks:
  • Detection
  • Fine-grained classification
  • Scene classification
  • Semantic segmentation
  • …

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: A deep convolutional activation feature for generic visual recognition. In: arXiv:1310.1531. (2013). Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: CVPRW. (2014)

Figure from Razavian et al.

slide-6
SLIDE 6

Image retrieval with pre-trained CNN

slide-7
SLIDE 7

Global image representation – FC layer

  • Features: FC layer activations
  • Resize/crop to fixed image size

Babenko, A., Slesarev, A., Chigorin, A., Lempitsky, V.: Neural codes for image retrieval. In: ECCV. (2014) Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: CVPRW. (2014)

Figure from Babenko et al.

slide-8
SLIDE 8

Global image representation – Conv layer

  • Features: Conv layer activations
  • Global max or sum pooling
  • Any input image size
  • Better to use the last conv layer (VGG, AlexNet)

Azizpour, H., Razavian, A.S., Sullivan, J., Maki, A., Carlsson, S.: From generic to specific deep representations for visual recognition. In: CVPRW. (2015) Babenko, A., Lempitsky, V.: Aggregating deep convolutional features for image retrieval. In: ICCV. (2015)

Figure from Razavian et al.

slide-9
SLIDE 9

Spatial and channel weighting

  • Channel-wise and spatial-wise weighting
  • Global sum pooling
  • Channel-wise: IDF-like weighting
  • Spatial-wise: saliency mask by L2 norm

Figures from Kalantidis et al.

Kalantidis, Y., Mellina, C., Osindero, S.: Cross-dimensional weighting for aggregated deep convolutional features. In: ECCVW (2016)

slide-10
SLIDE 10

Figure: input image and its conv5 feature maps (filter 1, filter 2, …, filter i, …, filter K)

Maximum Activations of Convolutions - MAC

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-11
SLIDE 11

Figure: input image and its conv5 feature maps (filter 1 … filter K), with the maximum activation of each map highlighted

Maximum Activations of Convolutions - MAC

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-12
SLIDE 12

MAC similarity

  • Similarity: inner product of L2-normalized MAC descriptors
  • Each dimension pairs the maxima of the same feature map in the two images
  • Implicitly forms correspondences (512 for VGG)

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)
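A minimal NumPy sketch of MAC extraction and MAC similarity (the random feature maps below are stand-ins for real conv5 activations):

```python
import numpy as np

def mac(fmap):
    # fmap: (K, H, W) conv-layer activations, e.g. K = 512 for VGG conv5
    v = fmap.reshape(fmap.shape[0], -1).max(axis=1)  # per-filter global max pooling
    return v / np.linalg.norm(v)                     # L2 normalization

def mac_similarity(fa, fb):
    # inner product of L2-normalized MAC descriptors; each dimension compares
    # the strongest response of the same filter in the two images
    return float(mac(fa) @ mac(fb))

rng = np.random.default_rng(1)
a = np.abs(rng.normal(size=(512, 30, 40)))  # toy post-ReLU feature map
b = np.abs(rng.normal(size=(512, 30, 40)))
s_self = mac_similarity(a, a)
s_other = mac_similarity(a, b)
```

An image compared with itself gives similarity exactly 1; distinct images give a strictly smaller value.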

slide-13
SLIDE 13

Regional Maximum Activations of Convolutions R-MAC

  • Extract MAC descriptor per region
  • Sum pool regional descriptors
  • Global image representation (same dimensionality as MAC)
  • PCA Whitening

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

Pipeline: regional MAC → L2 norm → whitening → L2 norm
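A simplified R-MAC sketch (here a uniform, non-overlapping grid per scale; the paper samples overlapping square regions at several scales and inserts PCA whitening between the two normalizations):

```python
import numpy as np

def l2n(v):
    return v / (np.linalg.norm(v) + 1e-12)

def rmac(fmap, levels=3):
    # fmap: (K, H, W) conv activations. For each scale l, split the map into
    # an l x l grid, take a MAC per region, L2-normalize, and sum-pool.
    K, H, W = fmap.shape
    agg = np.zeros(K)
    for l in range(1, levels + 1):
        hs = np.linspace(0, H, l + 1, dtype=int)
        ws = np.linspace(0, W, l + 1, dtype=int)
        for i in range(l):
            for j in range(l):
                region = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                agg += l2n(region.reshape(K, -1).max(axis=1))  # regional MAC
    return l2n(agg)  # same dimensionality K as a single MAC

rng = np.random.default_rng(2)
f = np.abs(rng.normal(size=(512, 30, 40)))
d = rmac(f)
```

The result is a single global descriptor with the same dimensionality as MAC, regardless of the number of regions.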

slide-14
SLIDE 14

Comparison with local feature based methods

Method                        Oxf5k  Oxf105k  Par6k  Par106k
BoW(16M) + geometry + QE      84.9   79.5     82.4   77.3
Hamming Query Expansion       88.0   84.0     82.8   –
Triangulation Emb. (1024D)    56.0   50.2     –      –
R-MAC (512D)                  66.9   61.6     83.0   75.7

Local-feature methods: 3-4k features per image, memory demanding.
R-MAC: compact representation, one descriptor per image.

slide-15
SLIDE 15

Object localization

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-16
SLIDE 16

Object localization

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-17
SLIDE 17

Object localization with integral max pooling

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-18
SLIDE 18

Object localization with integral max pooling

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-19
SLIDE 19

Object localization with integral max pooling

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-20
SLIDE 20

Object localization with integral max pooling

Initial ranking (IR) → Re-ranking (RR) examples

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

slide-21
SLIDE 21

Object localization with integral max pooling

Initial ranking (IR) → Re-ranking (RR) examples

Tolias, G., Sicre, R., Jegou, H.: Particular object retrieval with integral max pooling of CNN activations. In: ICLR. (2016)

< 3 seconds to re-rank 1000 images using 1 CPU thread

slide-22
SLIDE 22

Comparison with local features, geometry, and query expansion

Method                        Oxf5k  Oxf105k  Par6k  Par106k
BoW(16M) + geometry + QE      84.9   79.5     82.4   77.3
Hamming Query Expansion       88.0   84.0     82.8   –
R-MAC (512D)                  66.9   61.6     83.0   75.7
R-MAC + localization + QE     77.3   73.2     86.5   79.8

slide-23
SLIDE 23

Known encodings applied on CNN local descriptors

  • Bag-of-Words
  E. Mohedano, A. Salvador, K. McGuinness, X. Giró-i-Nieto, N. O'Connor, F. Marqués: Bags of Local Convolutional Features for Scalable Instance Search. In ICMR 2016
  • Fisher vectors
  P. Kulkarni, J. Zepeda, F. Jurie, P. Perez, L. Chevallier: Hybrid multi-layer deep CNN/aggregator feature for image classification. In ICASSP 2015
  • VLAD
  Y. Gong, L. Wang, R. Guo, S. Lazebnik: Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In ECCV 2014

Other approaches

Figure from Mohedano et al. Figure from Gong et al.

slide-24
SLIDE 24

Off-the-shelf CNN

  • Target application: classification
  • Training dataset: ImageNet
  • Architecture: AlexNet, VGG, ResNet
  • Directly applicable to other tasks

Images from ImageNet.org

Fine-grained classification

Images from ImageNet.org

Object detection

Images from PASCAL VOC 2012

Image retrieval

slide-25
SLIDE 25

CNN fine-tuning for image retrieval

slide-26
SLIDE 26

Large Internet photo collection

Convolutional Neural Network (CNN) Image annotations Training

Lots of Training Examples

slide-27
SLIDE 27

Large Internet photo collection

Convolutional Neural Network (CNN) Not accurate Expensive $$

Lots of Training Examples

slide-28
SLIDE 28

Large Internet photo collection

Convolutional Neural Network (CNN) Not accurate Expensive $$

Manual cleaning of the training data done by researchers

Very expensive $$$$

Lots of Training Examples

slide-29
SLIDE 29

Large Internet photo collection

Convolutional Neural Network (CNN) Not accurate Expensive $$

Manual cleaning of the training data done by researchers

Very expensive $$$$

Automated extraction of training data

Accurate Free $

Lots of Training Examples

slide-30
SLIDE 30

Annotations for CNN Image Retrieval

CNN pre-trained for classification task used for retrieval

[Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. ECCVW’16, Tolias et al. ICLR’16]

Building class

slide-31
SLIDE 31

Annotations for CNN Image Retrieval

CNN pre-trained for classification task used for retrieval

[Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. ECCVW’16, Tolias et al. ICLR’16]

Building class Landmark class

Fine-tuned CNN using a dataset with landmark classes

[Babenko et al. ECCV’14]

slide-32
SLIDE 32

Annotations for CNN Image Retrieval

CNN pre-trained for classification task used for retrieval

[Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. ECCVW’16, Tolias et al. ICLR’16]

Building class Landmark class spatially closest ≠ matching

Fine-tuned CNN using a dataset with landmark classes

[Babenko et al. ECCV’14]

NetVLAD: Weakly supervised fine-tuned CNN using GPS tags

[Arandjelovic et al. CVPR’16]

slide-33
SLIDE 33

NetVLAD

W×H×D feature map of the last conv layer

Negatives: geographically far. Positives: geographically close and close in the feature space.

Figures from Arandjelovic et al.

slide-34
SLIDE 34

Annotations for CNN Image Retrieval

CNN pre-trained for classification task used for retrieval

[Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. ECCVW’16, Tolias et al. ICLR’16]

Building class Landmark class spatially closest ≠ matching

Fine-tuned CNN using a dataset with landmark classes

[Babenko et al. ECCV’14]

NetVLAD: Weakly supervised fine-tuned CNN using GPS tags

[Arandjelovic et al. CVPR’16]

slide-35
SLIDE 35

Automatic annotations for CNN training [Radenovic et al. ECCV’16]

Hard positives Hard negatives

Annotations for CNN Image Retrieval

CNN pre-trained for classification task used for retrieval

[Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. ECCVW’16, Tolias et al. ICLR’16]

Building class Landmark class spatially closest ≠ matching

Fine-tuned CNN using a dataset with landmark classes

[Babenko et al. ECCV’14]

NetVLAD: Weakly supervised fine-tuned CNN using GPS tags

[Arandjelovic et al. CVPR’16]

slide-36
SLIDE 36

CNN learns from BoW – Training Data

7.4M images → 713 training 3D models

[Schonberger et al. CVPR’15] [Radenovic et al. CVPR’16]

slide-37
SLIDE 37

CNN learns from BoW – Training Data

Camera Orientation Known Number of Inliers Known

7.4M images → 713 training 3D models

[Schonberger et al. CVPR’15] [Radenovic et al. CVPR’16]

slide-38
SLIDE 38

Hard Negative Examples

anchor
Negative examples: images from different 3D models than the anchor
Hard negatives: closest negative examples to the anchor
Only hard negatives: as good as using all negatives, but faster

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-39
SLIDE 39

Hard Negative Examples

anchor → the most similar CNN descriptor
Negative examples: images from different 3D models than the anchor
Hard negatives: closest negative examples to the anchor
Only hard negatives: as good as using all negatives, but faster

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-40
SLIDE 40

Hard Negative Examples

anchor → the most similar CNN descriptor
Naive hard negatives: top k by CNN
Negative examples: images from different 3D models than the anchor
Hard negatives: closest negative examples to the anchor
Only hard negatives: as good as using all negatives, but faster

increasing CNN descriptor distance to the anchor

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-41
SLIDE 41

Hard Negative Examples

anchor → the most similar CNN descriptor
Naive hard negatives: top k by CNN
Diverse hard negatives: top k, one per 3D model
Negative examples: images from different 3D models than the anchor
Hard negatives: closest negative examples to the anchor
Only hard negatives: as good as using all negatives, but faster

increasing CNN descriptor distance to the anchor

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
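The diverse hard-negative selection above can be sketched as follows (toy descriptors and 3D-model labels; the function name and data are illustrative):

```python
import numpy as np

def hard_negatives(anchor_desc, descs, model_ids, anchor_model, k=5):
    # descs: (N, D) L2-normalized CNN descriptors; model_ids: 3D model of each image.
    # Negatives are images from 3D models other than the anchor's; take the k
    # closest in descriptor space, but at most one per 3D model (diversity).
    order = np.argsort(-(descs @ anchor_desc))   # most similar first
    chosen, used = [], {anchor_model}
    for i in order:
        if model_ids[i] in used:
            continue                             # skip anchor's model and repeats
        chosen.append(int(i))
        used.add(model_ids[i])
        if len(chosen) == k:
            break
    return chosen

rng = np.random.default_rng(3)
descs = rng.normal(size=(50, 8))
descs /= np.linalg.norm(descs, axis=1, keepdims=True)
models = np.arange(50) % 10                      # 10 toy "3D models"
negs = hard_negatives(descs[0], descs, models, models[0], k=3)
```

Restricting the selection to one negative per 3D model keeps the hard negatives diverse instead of being several near-duplicate views of the same scene.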
slide-42
SLIDE 42

anchor
Positive examples: images that share 3D points with the anchor
Hard positives: positive examples not close enough to the anchor

Hard Positive Examples

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-43
SLIDE 43

anchor
Top 1 by CNN (used in NetVLAD)
Positive examples: images that share 3D points with the anchor
Hard positives: positive examples not close enough to the anchor

Hard Positive Examples

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-44
SLIDE 44

anchor
Top 1 by CNN (used in NetVLAD)
Top 1 by BoW (harder positives)
Positive examples: images that share 3D points with the anchor
Hard positives: positive examples not close enough to the anchor

Hard Positive Examples

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-45
SLIDE 45

anchor
Top 1 by CNN (used in NetVLAD)
Top 1 by BoW (harder positives)
Random from top k by BoW (harder positives)
Positive examples: images that share 3D points with the anchor
Hard positives: positive examples not close enough to the anchor

Hard Positive Examples

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-46
SLIDE 46

CNN Siamese Learning

Query → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-47
SLIDE 47

CNN Siamese Learning

Query → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Positive → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-48
SLIDE 48

CNN Siamese Learning

Query → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Positive → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Contrastive loss; pair label: 1 – positive, 0 – negative

MATCHING PAIR

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
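The contrastive loss driving the two Siamese branches can be sketched as follows (the margin value here is illustrative):

```python
import numpy as np

def contrastive_loss(da, db, label, margin=0.7):
    # da, db: L2-normalized descriptors from the two Siamese branches
    # label: 1 for a matching pair, 0 for a non-matching pair
    dist = np.linalg.norm(da - db)
    if label == 1:
        return 0.5 * dist ** 2                   # pull matching pairs together
    return 0.5 * max(0.0, margin - dist) ** 2    # push non-matching pairs beyond margin

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
loss_match_same = contrastive_loss(a, a, label=1)    # identical matching pair
loss_nonmatch_far = contrastive_loss(a, b, label=0)  # already farther than margin
loss_match_far = contrastive_loss(a, b, label=1)     # distant matching pair
```

Matching pairs are penalized by their squared distance; non-matching pairs only contribute while they are still closer than the margin.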
slide-49
SLIDE 49

CNN Siamese Learning

Query → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Positive → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Contrastive loss; pair label: 1 – positive, 0 – negative

MATCHING PAIR

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-50
SLIDE 50

CNN Siamese Learning

Query → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Non-matching image → Convolutional layers → Pooling (MAC & L2-norm) → D×1 CNN descriptor
Contrastive loss; pair label: 1 – positive, 0 – negative

NON-MATCHING PAIR

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-51
SLIDE 51

Whitening and dimensionality reduction

1. PCAw – PCA of an independent set of descriptors

[Babenko et al. ICCV’15, Tolias et al. ICLR’16]

2. Lw – We propose to learn whitening using labeled training data and linear discriminant projections

[Mikolajczyk & Matas ICCV’07] …

Pipeline: global max pooling & L2-norm → D×1 CNN descriptor → whitening (end-to-end learning or post-processing) → optional dim. reduction

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
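Option 1 (PCAw) can be sketched as below; Lw additionally uses labeled matching pairs for the discriminative projection and is not shown here:

```python
import numpy as np

def pca_whitening(X, dim=None, eps=1e-9):
    # X: (N, D) independent set of descriptors. Returns mean and projection P
    # such that (x - mean) @ P has decorrelated, unit-variance components.
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    w, V = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(w)[::-1]             # largest variance first
    w, V = w[order], V[:, order]
    if dim is not None:                     # optional dimensionality reduction
        w, V = w[:dim], V[:, :dim]
    return mu, V / np.sqrt(w + eps)         # scale each axis to unit variance

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 16)) @ rng.normal(size=(16, 16))  # correlated descriptors
mu, P = pca_whitening(X, dim=8)
Y = (X - mu) @ P
```

After the projection the retained components are decorrelated with unit variance; keeping only the top `dim` components performs the dimensionality reduction.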
slide-52
SLIDE 52

1. PCAw – PCA of an independent set of descriptors

[Babenko et al. ICCV’15, Tolias et al. ICLR’16]

2. Lw – We propose to learn whitening using labeled training data and linear discriminant projections

[Mikolajczyk & Matas ICCV’07]

3. End-to-end learning – performs comparably to or worse than Lw, while slowing down convergence

Pipeline: global max pooling & L2-norm → D×1 CNN descriptor → whitening (end-to-end learning or post-processing) → optional dim. reduction

Whitening and dimensionality reduction

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-53
SLIDE 53

1. PCAw – PCA of an independent set of descriptors

[Babenko et al. ICCV’15, Tolias et al. ICLR’16]

2. Lw – We propose to learn whitening using labeled training data and linear discriminant projections

[Mikolajczyk & Matas ICCV’07]

3. End-to-end learning – performs comparably to or worse than Lw, while slowing down convergence

Pipeline: global max pooling & L2-norm → D×1 CNN descriptor → whitening (end-to-end learning or post-processing) → optional dim. reduction

Whitening and dimensionality reduction

  • F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, In ECCV 2016
slide-54
SLIDE 54

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

slide-55
SLIDE 55

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method           Oxford 5k  Paris 6k
Off-the-shelf    44.2       51.6

slide-56
SLIDE 56

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method                   Oxford 5k  Paris 6k
Off-the-shelf            44.2       51.6
top 1 CNN + top k CNN    56.2       63.1

slide-57
SLIDE 57

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method                           Oxford 5k  Paris 6k
Off-the-shelf                    44.2       51.6
top 1 CNN + top k CNN            56.2       63.1
top 1 CNN + top 1 / model CNN    56.7       63.9

slide-58
SLIDE 58

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method                           Oxford 5k  Paris 6k
Off-the-shelf                    44.2       51.6
top 1 CNN + top k CNN            56.2       63.1
top 1 CNN + top 1 / model CNN    56.7       63.9
top 1 BoW + top 1 / model CNN    59.7       67.1

slide-59
SLIDE 59

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method                                   Oxford 5k  Paris 6k
Off-the-shelf                            44.2       51.6
top 1 CNN + top k CNN                    56.2       63.1
top 1 CNN + top 1 / model CNN            56.7       63.9
top 1 BoW + top 1 / model CNN            59.7       67.1
random(top k BoW) + top 1 / model CNN    60.2       67.5

slide-60
SLIDE 60

Experiments – Learning (AlexNet)

  • Careful choice of positive and negative training images makes a difference

Method                                   Oxford 5k  Paris 6k
Off-the-shelf                            44.2       51.6
top 1 CNN + top k CNN                    56.2       63.1
top 1 CNN + top 1 / model CNN            56.7       63.9
top 1 BoW + top 1 / model CNN            59.7       67.1
random(top k BoW) + top 1 / model CNN    60.2       67.5
  + learned whitening                    62.2       68.9

slide-61
SLIDE 61

Teacher vs. Student

Fine-tuned CNN with re-ranking (R) and query expansion (QE) surpasses its teacher

Method                            Oxf5k  Oxf105k  Par6k  Par106k
BoW(16M) + R + QE                 84.9   79.5     82.4   77.3
Fine-tuned MAC (512D)             79.7   73.9     82.4   74.6
Fine-tuned MAC (512D) + R + QE    85.0   81.8     86.5   78.8

slide-62
SLIDE 62

ANN search with CNN descriptors

  • A. Gordo, J. Almazan, J. Revaud, D. Larlus, End-to-end Learning of Deep Visual Representations for Image Retrieval, In arXiv 2017
slide-63
SLIDE 63

Teacher vs. Student for small objects

Figure: query regions; retrieval results by CNN vs. BoW + geometry

slide-64
SLIDE 64

Manifold search on fine-tuned CNN features

slide-65
SLIDE 65

Manifold search

  • Euclidean distance is not enough under severe visual variations
  • Manifold search via graph-based methods, e.g. diffusion
slide-66
SLIDE 66
  • Normalization of the affinity matrix A: S = D^{-1/2} A D^{-1/2}
  • Vector y defines the query (1 at query items, 0 elsewhere)
  • Iterative solution (PageRank-like): f(t+1) = α S f(t) + (1 − α) y
  • Closed-form solution (typically avoided): f* = (1 − α)(I − α S)^{-1} y

Manifold search with diffusion

  • D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Scholkopf. Ranking on data manifolds. In NIPS, 2003
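The diffusion scheme of Zhou et al. can be sketched on a toy graph (two clusters joined by one weak edge; α and the iteration count are illustrative):

```python
import numpy as np

def diffusion(A, y, alpha=0.99, iters=200):
    # A: (N, N) symmetric non-negative affinity matrix (zero diagonal)
    # y: (N,) query indicator vector
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))            # normalization: D^-1/2 A D^-1/2
    f = y.astype(float)
    for _ in range(iters):                     # PageRank-like iteration,
        f = alpha * (S @ f) + (1 - alpha) * y  # converges to (1-a)(I - aS)^-1 y
    return f

# toy graph: two triangles joined by one weak edge; the query is node 0
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1
y = np.zeros(6)
y[0] = 1.0
f = diffusion(A, y)
```

Diffusion concentrates the ranking scores on the query's cluster of the graph, ranking all of its members above the other cluster even though only node 0 was queried.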

Query

slide-67
SLIDE 67
  • Normalization of the affinity matrix A: S = D^{-1/2} A D^{-1/2}
  • Vector y defines the query (1 at query items, 0 elsewhere)
  • Iterative solution (PageRank-like): f(t+1) = α S f(t) + (1 − α) y
  • Closed-form solution (typically avoided): f* = (1 − α)(I − α S)^{-1} y

Manifold search with diffusion

  • D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Scholkopf. Ranking on data manifolds. In NIPS, 2003

Query

2D toy example with query, database points, and iso-contours for diffusion similarity

slide-68
SLIDE 68

Unseen queries

Define vector y with the nearest neighbors of the query, then perform standard diffusion.

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

2D toy example with query, NN of the query and database points

Query

slide-69
SLIDE 69

Efficient solution

  • Iterative solution (inefficient)
  • Conjugate gradient to solve linear system
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017
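The conjugate-gradient variant can be sketched with SciPy's sparse solver (SciPy is assumed to be available; the toy graph of two triangles joined by a weak edge is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import cg

def diffusion_cg(A, y, alpha=0.99):
    # Solve (I - alpha*S) f = (1 - alpha) y; the system matrix is sparse and
    # positive definite, so conjugate gradient converges in few iterations.
    d = np.asarray(A.sum(axis=1)).ravel()
    inv_sqrt_d = 1.0 / np.sqrt(d)
    S = csr_matrix(A.multiply(np.outer(inv_sqrt_d, inv_sqrt_d)))  # D^-1/2 A D^-1/2
    L = identity(A.shape[0], format="csr") - alpha * S
    f, info = cg(L, (1 - alpha) * y)          # info == 0 on convergence
    return f

# toy graph: two triangles joined by one weak edge; the query is node 0
Ad = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    Ad[i, j] = Ad[j, i] = 1.0
Ad[2, 3] = Ad[3, 2] = 0.1
y = np.zeros(6)
y[0] = 1.0
f = diffusion_cg(csr_matrix(Ad), y)
```

CG only needs sparse matrix-vector products with I − αS, so the dense inverse (I − αS)^{-1} is never formed.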
slide-70
SLIDE 70

Efficient solution

  • Iterative solution (inefficient)
  • Conjugate gradient to solve linear system
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

(I − α S)^{-1} is not sparse; the system matrix I − α S is sparse

slide-71
SLIDE 71

Efficient solution

  • Iterative solution (inefficient)
  • Conjugate gradient to solve linear system
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

(I − α S)^{-1} is not sparse; the system matrix I − α S is sparse. The truncated iterative solution is equivalent to a Jacobi solver.

slide-72
SLIDE 72

Regional diffusion

  • Extract one descriptor per region

Razavian, A.S., Sullivan, J., Maki, A., Carlsson, S.: A baseline for visual instance retrieval with deep convolutional networks. In: arXiv:1412.6574. (2014)

  • Graph with regions as nodes
  • Multiple query regions (sum to construct y)
  • Query with all regions jointly (diffuse once)
  • Obtain image similarity: pooling over regions
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

Query regions

slide-73
SLIDE 73

Regional diffusion

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

Query regions

2D toy example with query points, NN of the queries and database points

slide-74
SLIDE 74

Small object retrieval

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017

Precision at the retrieved location, with global and regional diffusion

slide-75
SLIDE 75

Small object retrieval

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017
slide-76
SLIDE 76

Embedding for manifold search

Iterative solution; convergence (closed-form) solution; efficient solution by conjugate gradient; the Laplacian is sparse

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017
slide-77
SLIDE 77

Embedding for manifold search

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

Convergence solution The Laplacian is sparse

slide-78
SLIDE 78

Embedding for manifold search

Low-rank decomposition Low-rank decomposition of a matrix that is never created

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

Convergence solution The Laplacian is sparse

slide-79
SLIDE 79

Embedding for manifold search

Low-rank decomposition Low-rank decomposition of a matrix that is never created

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

Approximate & efficient methods

Convergence solution The Laplacian is sparse

slide-80
SLIDE 80

Embedding for manifold search

Low-rank decomposition Low-rank decomposition of a matrix that is never created Very efficient diffusion

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

Approximate & efficient methods

Convergence solution The Laplacian is sparse

slide-81
SLIDE 81

Embedding for manifold search

Low-rank decomposition Low-rank decomposition of a matrix that is never created Very efficient diffusion

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

N × r matrix: N = #nodes, r = rank. Approximate & efficient methods.

Convergence solution The Laplacian is sparse

slide-82
SLIDE 82

Embedding for manifold search

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017
slide-83
SLIDE 83

Embedding for manifold search

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017

Two orders of magnitude faster than CG solution

slide-84
SLIDE 84

Embedding for manifold search

  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017
slide-85
SLIDE 85

Panorama to panorama matching for location recognition

based on fine-tuned CNN features

slide-86
SLIDE 86

Panorama to panorama matching

Image to image matching: query image → query descriptor → location recognition by NN search over descriptors for database images

Panorama to panorama matching: query image set from the same location (e.g. from self-driving cars) → query descriptor → NN search over descriptors for database image sets, each set from the same location

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017
slide-87
SLIDE 87

Explicit panorama construction

  • Auto-stitch images of the same location
  • Use NetVLAD on the stitched (panoramic) image
  • Index one NetVLAD descriptor per location
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017
slide-88
SLIDE 88

Implicit panorama construction

  • Use NetVLAD for single image descriptor
  • Joint representation of image set by pooling in the descriptor space
  • Index one vector (joint representation) per location
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017
slide-89
SLIDE 89

Implicit panorama construction

  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017

[Iscen et al. Transactions on Big Data, 2017]

slide-90
SLIDE 90

Sparse panorama

  • Implicit way is directly applicable
  • An additional intermediate solution is proposed
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017
slide-91
SLIDE 91

Pan2Pan matching for location recognition

Location recognition on the Pittsburgh dataset with 24 views (full panorama) per location
Location recognition on the Pittsburgh dataset with sparse views (sparse panorama) per query location

slide-92
SLIDE 92

Pan2Pan matching for location recognition

Location recognition on the Pittsburgh dataset with 24 views (full panorama) per location
Location recognition on the Pittsburgh dataset with sparse views (sparse panorama) per query location

standard image-to-image retrieval

  • our contribution
slide-93
SLIDE 93

Summary

  • Good performance with pre-trained features
  • Compact representation
  • Fine-tuning significantly improves performance
  • Ways to automatically collect training data
  • Fine-tuning alone is not enough
  • Manifold search
  • Benefit from a representation adapted to the task
slide-94
SLIDE 94

Collaborators

Ahmet Iscen Filip Radenović Ronan Sicre Yannis Avrithis Hervé Jégou Teddy Furon Ondřej Chum

slide-95
SLIDE 95

Online code and data

  • R-MAC and localization (ICLR 2016)
  • Matlab package http://cmp.felk.cvut.cz/~toliageo/soft.html
  • Siamese training code and training data (ECCV 2016)
  • Matlab package using MatConvNet http://cmp.felk.cvut.cz/cnnimageretrieval/
  • Region manifold search (CVPR 2017)
  • Matlab package https://github.com/ahmetius/diffusion-retrieval
  • Manifold embedding (arXiv 2017)
  • Code coming soon…
slide-96
SLIDE 96

References

  • R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. In CVPR, 2016.
  • H. Azizpour, A. Razavian, J. Sullivan, A. Maki, S. Carlsson. From generic to specific deep representations for visual recognition. In: CVPRW, 2015.
  • A. Babenko and V. Lempitsky. Aggregating deep convolutional features for image retrieval. In ICCV, 2015.
  • A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky. Neural codes for image retrieval. In ECCV, 2014.
  • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell: DeCAF: A deep convolutional activation feature for generic visual recognition. In: arXiv:1310.1531
  • Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. In ECCV, 2014
  • A. Gordo, J. Almazan, J. Revaud, and D. Larlus. Deep image retrieval: Learning global representations for image search. In ECCV, 2016
  • A. Gordo, J. Almazan, J. Revaud, D. Larlus, End-to-end Learning of Deep Visual Representations for Image Retrieval, In arXiv 2017
  • A. Iscen, Y. Avrithis, G. Tolias, T. Furon, O. Chum, Fast Spectral Ranking for Similarity Search, In arXiv 2017
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations, In CVPR 2017
  • A. Iscen, G. Tolias, Y. Avrithis, T. Furon, O. Chum, Panorama to panorama matching for location recognition, In ICMR 2017
  • A. Iscen, T. Furon, V. Gripon, M. Rabbat, and H. Jegou, Memory vectors for similarity search in high-dimensional spaces, IEEE Transactions on Big Data, 2017
  • Y. Kalantidis, C. Mellina, and S. Osindero. Cross-dimensional weighting for aggregated deep convolutional features. In arXiv:1512.04065, 2015.
  • P. Kulkarni , J. Zepeda , F. Jurie , P. Perez and L. Chevallier, Hybrid multi-layer deep cnn/aggregator feature for image classification, In ICASSP 2015
  • E. Mohedano, A. Salvador, K. McGuinness , X. Giró-i-Nieto, N. O'Connor, F. Marqués. Bags of Local Convolutional Features for Scalable Instance Search. In ICMR 2016
  • A. Mikulik, M. Perdoch, O. Chum, J. Matas: Learning vocabularies over a fine quantization. IJCV (2013)
  • A. Mousavian and J. Kosecka, Deep Convolutional Features for Image Based Retrieval and Scene Categorization, In arXiv, 2015
  • F. Radenovic, G. Tolias, and O. Chum. CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In ECCV, 2016.
  • F. Radenovic, J. Schonberger, D. Ji, J.M. Frahm, O. Chum, J. Matas: From dusk till dawn: Modeling in the dark. In: CVPR, 2016.
  • A. Razavian, J. Sullivan, A. Maki, and S. Carlsson. A baseline for visual instance retrieval with deep convolutional networks. In arXiv:1412.6574, 2014.
  • A. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In: CVPRW, 2014.
  • J. Schonberger, F. Radenovic, O. Chum, J.M. Frahm. From single image query to detailed 3D reconstruction. In: CVPR. 2015
  • G. Tolias, R. Sicre, and H. Jegou. Particular object retrieval with integral max-pooling of cnn activations. In ICLR, 2016.
  • G. Tolias and H. Jégou, Visual query expansion with or without geometry: refining local descriptors by feature aggregation, In Pattern Recognition, 2014.