We Don’t Need No Annotation (Efficient Training for Image Retrieval)
Ondra Chum
Visual Recognition Group, Department of Cybernetics, Faculty of Electrical Engineering, CTU in Prague

Outline
Algorithmic supervision for CNN training (local-feature-based methods)
• CNN fine-tuning for efficient image retrieval
• Sketch-based image retrieval with CNN descriptors
Unsupervised metric learning from data manifolds

CNN fine-tuning for image retrieval
Filip Radenović, Giorgos Tolias
F. Radenovic, G. Tolias and O. Chum, CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples, ECCV 2016

Image Retrieval Challenges
• Significant viewpoint and/or scale change
• Significant illumination change
• Severe occlusions
• Visually similar but different objects
Old school: local features, photometric normalization, geometric constraints.
CNNs: need lots of training data; they provide an image embedding, so retrieval becomes a nearest-neighbor search.
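The idea that a CNN embedding turns retrieval into nearest-neighbor search can be sketched as below. This is a minimal illustration, assuming global descriptors have already been extracted into NumPy arrays and that images are compared by cosine similarity (dot product of L2-normalized vectors); the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so that a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def rank_database(query, database):
    """Rank database images by similarity to the query (hypothetical helper).

    query: (D,) global descriptor of the query image.
    database: (N, D) descriptors, one row per database image.
    Returns database indices sorted from most to least similar, and their scores.
    """
    q = l2_normalize(query)
    db = l2_normalize(database)
    scores = db @ q                 # cosine similarity of every database image to the query
    order = np.argsort(-scores)     # descending similarity = nearest neighbors first
    return order, scores[order]
```

Exhaustive dot-product search is used here only for clarity; large collections would normally use an approximate nearest-neighbor index instead.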

Lots of Training Examples
[Diagram: large Internet photo collection + image annotations → training → convolutional neural network (CNN)]

Lots of Training Examples
• Image annotations: not free ($)
• Manual cleaning of the training data by researchers: very expensive ($$$$), not accurate
• Automated extraction of training data: accurate, free
[Diagram: large Internet photo collection → training data → convolutional neural network (CNN)]

CNN Image Retrieval
• Image representation created from the CNN activations of a network pre-trained for a classification task [Gong et al. ECCV’14, Razavian et al. arXiv’14, Babenko et al. ICCV’15, Kalantidis et al. arXiv’15, Tolias et al. ICLR’16]
+ Retrieval accuracy suggests that CNNs generalize
- Trained for image classification, NOT for the retrieval task
(Example training images of the same class, from ImageNet.org.)

CNN Image Retrieval
• CNN re-trained on a dataset that contains landmarks and buildings as object classes [Babenko et al. ECCV’14]
+ Training dataset closer to the target task
- The metric used at test time differs from the one optimized during training
- Constructing the training dataset requires manual effort
(Example images of the same class, from [Babenko et al. ECCV’14].)

CNN Image Retrieval
• NetVLAD: end-to-end fine-tuning for image retrieval, using a geo-tagged dataset for weakly supervised fine-tuning [Arandjelovic et al. CVPR’16]
+ Training dataset corresponds to the target task
+ The metric optimized during training is the one used at test time
- Training dataset requires geo-tags
(Figure: geo-tags give the camera position of each image, but the camera orientation is unknown.)

CNN learns from BoW – Training Data
Input: a large unannotated dataset
1. Initial clusters are created by grouping spatially related images [Chum & Matas PAMI’10]
2. The clustered images are used as queries for a retrieval-SfM pipeline [Schonberger et al. CVPR’15]
Output: non-overlapping 3D models; 551 models (134k images) for training and 162 models (30k images) for validation
Unlike geo-tags, the 3D models provide the camera orientation of each image and the number of inliers.

Hard Negative Examples
• Negative examples: images from 3D models other than the anchor’s
• Hard negatives: the negative examples closest to the anchor
• Using only hard negatives is as good as using all negatives, but faster
(Figure: candidates sorted by increasing CNN descriptor distance to the anchor. Naive hard negatives: the top k by CNN, i.e. the most similar. Diverse hard negatives: the top k, taking at most one per 3D model.)
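The two negative-mining strategies on this slide (naive top k by CNN distance vs. diverse top k with one image per 3D model) can be sketched as follows. This is an illustrative NumPy version, not the authors’ code; it assumes every candidate image carries the id of its 3D model and that the function name and arguments are hypothetical.

```python
import numpy as np


def mine_hard_negatives(anchor_desc, anchor_model, pool_descs, pool_models, k, diverse=True):
    """Pick k hard negatives for one anchor (hypothetical helper).

    anchor_desc: (D,) CNN descriptor of the anchor.
    anchor_model: id of the 3D model the anchor belongs to.
    pool_descs: (N, D) CNN descriptors of the candidate images.
    pool_models: (N,) 3D-model id of each candidate.
    diverse: if True, keep at most one negative per 3D model ("diverse hard negatives");
             if False, take the k closest negatives ("naive hard negatives").
    """
    dists = np.linalg.norm(pool_descs - anchor_desc, axis=1)
    chosen, used_models = [], set()
    for idx in np.argsort(dists):          # closest candidates first
        m = pool_models[idx]
        if m == anchor_model:              # same 3D model as the anchor: not a negative
            continue
        if diverse and m in used_models:   # this 3D model already supplied a negative
            continue
        chosen.append(int(idx))
        used_models.add(m)
        if len(chosen) == k:
            break
    return chosen
```

The diverse variant avoids spending the whole negative budget on near-duplicate views of a single confusing landmark.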

Hard Positive Examples
• Positive examples: images that share 3D points with the anchor
• Hard positives: positive examples that are not close enough to the anchor in CNN descriptor space
(Figure: an anchor and positives chosen as top 1 by CNN, top 1 by BoW, random from the top k by BoW, and the harder positives used in NetVLAD.)
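The positive-selection strategies named on this slide can be compared in a short sketch. It assumes the candidates are images of the anchor’s 3D model that share 3D points with it, and that each has a BoW/geometry score (for example an inlier count, higher meaning more similar); the function name, strategy strings and default `k` are hypothetical.

```python
import numpy as np


def select_positive(anchor_desc, cand_descs, cand_bow_scores, strategy="random_topk_bow", k=5, rng=None):
    """Choose one positive for an anchor (hypothetical helper).

    cand_descs: (N, D) CNN descriptors of candidates sharing 3D points with the anchor.
    cand_bow_scores: (N,) BoW/geometry similarity to the anchor; higher = more similar.
    strategy:
      "top1_cnn"        - candidate closest in CNN space (an easy positive)
      "top1_bow"        - candidate with the highest BoW score
      "random_topk_bow" - random candidate among the k highest BoW scores (harder positives)
    """
    rng = np.random.default_rng() if rng is None else rng
    if strategy == "top1_cnn":
        return int(np.argmin(np.linalg.norm(cand_descs - anchor_desc, axis=1)))
    order = np.argsort(-np.asarray(cand_bow_scores))   # most BoW-similar first
    if strategy == "top1_bow":
        return int(order[0])
    if strategy == "random_topk_bow":
        return int(rng.choice(order[:k]))
    raise ValueError(f"unknown strategy: {strategy}")
```

Selecting by BoW rather than by the CNN itself lets the training signal include positives the current network still considers dissimilar.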

CNN Siamese Learning
Each training pair consists of a query and either a matching (positive) or a non-matching (negative) image. Both images pass through the same convolutional layers, followed by MAC pooling and L2-normalization, giving a D×1 descriptor per image. A contrastive loss compares the two descriptors using the pair label: 1 for a positive (matching) pair, 0 for a negative (non-matching) pair.
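A common form of the contrastive loss used in such Siamese training is sketched below: matching pairs are penalized by their squared descriptor distance, non-matching pairs only when they are closer than a margin. This NumPy version is illustrative; the margin value and averaging over the batch are assumptions, not settings taken from the slides.

```python
import numpy as np


def contrastive_loss(desc_a, desc_b, label, margin=0.7):
    """Contrastive loss over a batch of descriptor pairs (illustrative sketch).

    desc_a, desc_b: (B, D) L2-normalized descriptors from the two Siamese branches.
    label: (B,) 1 for a matching (positive) pair, 0 for a non-matching (negative) pair.
    margin: distance beyond which negative pairs stop contributing (assumed value).
    """
    d = np.linalg.norm(desc_a - desc_b, axis=1)
    pos = 0.5 * d ** 2                              # pull matching pairs together
    neg = 0.5 * np.maximum(0.0, margin - d) ** 2    # push non-matching pairs past the margin
    label = np.asarray(label, dtype=float)
    return float(np.mean(label * pos + (1.0 - label) * neg))   # averaged over the batch
```

Because negatives farther than the margin contribute zero loss, only hard negatives produce gradients, which is why mining them is as good as using all negatives.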

Global Pooling
Pipeline: CNN → global pooling & L2-norm → D×1 descriptor (end-to-end learning), followed by optional whitening and dimensionality reduction (post-processing).
• MAC, max pooling: Maximum Activations of Convolutions [Tolias et al. ICLR’16]
• SPoC, sum pooling: Sum-Pooled Convolutional [Babenko et al. ICCV’15]
• GeM, generalized mean pooling: p = 1 gives average pooling, p = inf gives max pooling [Radenovic, Tolias, Chum: TPAMI 2018]
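The three pooling operators can be written in a few lines. Each maps K feature maps of size H×W to a K-dimensional vector; GeM computes, per channel k, (mean over positions of x^p)^(1/p), which reduces to average pooling at p = 1 and approaches max pooling as p grows. This NumPy sketch assumes non-negative (post-ReLU) activations; the default p = 3 and the clamping constant are assumptions for illustration.

```python
import numpy as np


def mac(x):
    """MAC: maximum activation per channel. x: (K, H, W) -> (K,)."""
    return x.reshape(x.shape[0], -1).max(axis=1)


def spoc(x):
    """SPoC: sum of activations per channel. x: (K, H, W) -> (K,)."""
    return x.reshape(x.shape[0], -1).sum(axis=1)


def gem(x, p=3.0, eps=1e-6):
    """GeM: generalized mean per channel, (mean(x^p))^(1/p). x: (K, H, W) -> (K,)."""
    x = np.clip(x.reshape(x.shape[0], -1), eps, None)   # keep values positive so x**p is well defined
    return np.mean(x ** p, axis=1) ** (1.0 / p)


def l2n(v, eps=1e-12):
    """L2-normalize the pooled vector to obtain the final descriptor."""
    return v / (np.linalg.norm(v) + eps)
```

Since every descriptor is L2-normalized afterwards, SPoC’s sum and an average differ only by a scale that disappears.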

Component Contributions (AlexNet)
Careful choice of positive and negative training images makes a difference.
[Bar chart: results on Oxford 5k and Paris 6k. MAC variants, labeled by positive selection + negative selection: off-the-shelf; top 1 CNN + top k CNN; top 1 CNN + top 1 / model CNN; top 1 BoW + top 1 / model CNN; random(top k BoW) + top 1 / model CNN; learned whitening. GeM variants: random(top k BoW) + top 1 / model CNN; learned whitening. Reported values range from 44.2 to 75.5.]
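The "learned whitening" post-processing step can be sketched as follows, following the discriminative whitening idea of learning a projection from matching and non-matching descriptor pairs: whiten with the inverse square root of the covariance of matching-pair differences, then rotate so the directions with the most non-matching variation come first (truncating gives dimensionality reduction). This is an illustrative NumPy sketch, not the authors’ code; the function names, covariance averaging and eigenvalue floor `eps` are assumptions.

```python
import numpy as np


def inv_sqrtm(c, eps=1e-9):
    """Inverse matrix square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ v.T


def learn_whitening(descs, pos_pairs, neg_pairs, out_dim=None):
    """descs: (N, D); pos_pairs / neg_pairs: (M, 2) index arrays of matching / non-matching pairs."""
    mu = descs.mean(axis=0)
    dp = descs[pos_pairs[:, 0]] - descs[pos_pairs[:, 1]]
    dn = descs[neg_pairs[:, 0]] - descs[neg_pairs[:, 1]]
    cs = dp.T @ dp / len(dp)         # covariance of matching-pair differences
    cd = dn.T @ dn / len(dn)         # covariance of non-matching-pair differences
    ws = inv_sqrtm(cs)               # whitening: equalizes variation between matching images
    w, v = np.linalg.eigh(ws @ cd @ ws)
    v = v[:, np.argsort(-w)]         # strongest non-matching variation first
    proj = v.T @ ws
    if out_dim is not None:
        proj = proj[:out_dim]        # optional dimensionality reduction
    return mu, proj


def apply_whitening(descs, mu, proj):
    """Center, project and re-L2-normalize descriptors."""
    x = (descs - mu) @ proj.T
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

The projection is learned once on the training pairs and then applied to every query and database descriptor.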

Teacher vs. Student (VGG)
Method              Oxf5k  Oxf105k  Par6k  Par106k
BoW(16M)+R+QE        84.9     79.5   82.4     77.3
CNN-MAC(512D)        79.7     73.9   82.4     74.6
CNN-GeM(512D)        86.4     81.3   88.1     81.7
CNN-GeM(512D)+QE     90.7     88.6   92.2     88.0
Our CNN with the GeM layer surpasses its teacher on all datasets. But…
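The "+QE" rows add query expansion on top of the CNN descriptor. The slides do not say which variant is used, so the sketch below shows plain average query expansion as an assumption: the query is replaced by the normalized sum of itself and its top-k neighbors, and the database is searched again. The function name and default `k` are hypothetical.

```python
import numpy as np


def average_query_expansion(query, database, k=10):
    """Average query expansion (illustrative sketch).

    query: (D,) and database: (N, D), both L2-normalized.
    Returns the re-ranked database indices and the new similarity scores.
    """
    scores = database @ query
    top = np.argsort(-scores)[:k]                   # initial top-k results
    new_q = query + database[top].sum(axis=0)       # combine query with its top-k neighbors
    new_q /= np.linalg.norm(new_q)                  # back to unit length
    new_scores = database @ new_q
    return np.argsort(-new_scores), new_scores
```

Expansion helps most when the top results are reliable, which is consistent with its larger gains on top of the stronger GeM descriptor.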

Teacher vs. Student for small objects
(Figure: the same query region retrieved with the CNN and with BoW+geometry.)

CNN fine-tuning for sketch-based image retrieval
Filip Radenović, Giorgos Tolias

Sketch-based Image Retrieval

Training Data

Matching Sketches to Images
Three approaches are compared: the classical approach (shape matching), the modern approach (end-to-end deep learning) and ours (deep shape matching).
• Inputs: image, sketch and edge map; the modern approach requires very expensive training data, ours relatively cheap training data
• Properties listed on the slide: alignment; no training; + category + similarity; - man-years of annotation; shape information only; - very difficult to train; simple cost & training

Category Retrieval
(Figure: a sketch of a pig as the query, with the retrieved results.)
Shape-based retrieval cannot do that.

Category Retrieval
(Figure: results.)
Standard image search has been able to do that for years.

Edge Maps vs. Sketches

Training without a Single Sketch
CNN Siamese learning with a contrastive loss.

EdgeMAC Architecture
Pipeline: edge detector [Dollár & Zitnick ICCV’13] → edge filtering → CNN → global max pooling & L2-norm → D×1 descriptor (end-to-end learning), followed by optional whitening and dimensionality reduction (post-processing).
• The first VGG layer has its RGB filters averaged to take a single intensity channel.
• An edge filtering layer turns the detected edges into filtered edges.
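Adapting the first VGG layer to single-channel edge maps can be sketched as below, under the assumption that "RGB averaged to intensity" means averaging each first-layer filter over its three input channels. The helper name is hypothetical, and the edge filtering layer is not modeled here.

```python
import numpy as np


def rgb_filters_to_intensity(conv1_weights):
    """Convert first-layer RGB filters to single-channel filters (hypothetical helper).

    conv1_weights: (out_channels, 3, kh, kw) filters of a network trained on RGB images.
    Returns (out_channels, 1, kh, kw) filters obtained by averaging over the RGB axis.
    """
    return conv1_weights.mean(axis=1, keepdims=True)
```

On an intensity image, each averaged filter responds with exactly one third of what the original filter gives on that image copied into R, G and B. The pretrained filter shapes are kept while the network now accepts one-channel input such as an edge map.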

Results on Flickr 15k
[21] Hu & Collomosse: A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. CVIU’13
Radenovic, Tolias, Chum: Generic Sketch-Based Retrieval Learned without Drawing a Single Sketch, arXiv 2017

Results on Shoes, Chairs and Handbags
Fine-grained recognition of shoes and chairs [53]
[53] Q. Yu et al.: Sketch me that shoe. CVPR’16.
Image from https://www.eecs.qmul.ac.uk/~qian/Project_cvpr16.html
