

SLIDE 1

Return of the Devil in the Details:

Delving Deep into Convolutional Nets

Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman (University of Oxford)

SLIDE 2

The Devil is still in the Details

2011 → 2014

SLIDE 3

Comparing Apples to Apples: State-of-the-art back in 2011

Back in 2011, state-of-the-art image classification pipelines were commonly based on the bag-of-visual-words approach, with highly tuned feature encoders.

  • Many feature encodings were being proposed for this, but it was difficult to tell which worked best.

IFV = Improved Fisher Vector, LLC = Locality-Constrained Linear Coding, SV = Super-Vector Encoding

SLIDE 4

Comparing Apples to Apples: State-of-the-art back in 2011

In our previous work (BMVC 2011) we conducted an extensive evaluation of these encodings, comparing them all on a common ground:

  • we’ll call the features from these encodings ‘shallow’ to distinguish them from the CNN-based features which follow

Pipeline: Input Dataset → Fixed Feature Extractor (IFV / LLC / SV) → Fixed Learning → Fixed Evaluation Protocol
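For readers less familiar with these encoders, below is a minimal sketch of what an Improved Fisher Vector computation looks like, using scikit-learn's GaussianMixture as the visual vocabulary. The function name and the random stand-in data are ours, and the real pipeline (dense SIFT, PCA-reduced descriptors, spatial binning) is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def improved_fisher_vector(descs, gmm):
    """Encode local descriptors (N x D) as an Improved Fisher Vector:
    first- and second-order GMM gradient statistics, followed by
    signed square-rooting and L2 normalisation."""
    N, _ = descs.shape
    q = gmm.predict_proba(descs)                # soft assignments, N x K
    sigma = np.sqrt(gmm.covariances_)           # K x D (diagonal GMM)
    parts = []
    for k in range(gmm.n_components):
        diff = (descs - gmm.means_[k]) / sigma[k]   # normalised residuals, N x D
        qk = q[:, k:k + 1]
        g_mu = (qk * diff).sum(0) / (N * np.sqrt(gmm.weights_[k]))
        g_sig = (qk * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * gmm.weights_[k]))
        parts += [g_mu, g_sig]
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # the 'improved' normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)

# Visual vocabulary: a diagonal GMM fitted offline to descriptors pooled from
# training images (random data stands in for dense SIFT; K=256 in practice).
gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(np.random.randn(5000, 64))
print(improved_fisher_vector(np.random.randn(500, 64), gmm).shape)  # (2 * 64 * 64,)
```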

SLIDE 5

What’s Changed?

State-of-the-art in 2014

  • Introduction of CNN-based deep visual features to the community, all using pre-trained networks (Krizhevsky et al. 2012, Donahue et al. 2013, Oquab et al. 2014, Sermanet et al. 2014)
  • These have been shown to perform excellently on standard classification and detection benchmarks
  • It is unclear how the recently introduced methods compare to each other, and to shallow methods such as IFV

SLIDE 6

Comparing Apples to Apples:

State-of-the-art in 2014

  • This work is again about comparing the latest methods on a common ground
  • We compare both different pre-trained network architectures and different learning heuristics

Pipeline: Input Dataset → {CNN Arch 1, CNN Arch 2, …, IFV} → Fixed Learning → Fixed Evaluation Protocol

SLIDE 7

Performance Evolution over VOC2007

mAP on VOC2007 (DeCAF onwards are CNN-based methods):

| Year | Method | Dim. | Aug. | mAP |
|------|--------|------|------|-----|
| 2008 | BOW | 32K | – | 54.48 |
| 2010 | IFV-BL | 327K | – | 61.69 |
| | IFV | 84K | – | 64.36 |
| | IFV | 84K | f s | 68.02 |
| 2013 | DeCAF | 4K | t t | 73.41 |
| 2014 | CNN-F | 4K | f s | 77.15 |
| | CNN-M 2K | 2K | f s | 80.13 |
| | CNN-S (TN) | 4K | f s | 82.42 |

  • Our best CNN method achieves state-of-the-art performance over several datasets
  • How do we get there? Through comparison on equal footing, we determine what’s important and what’s not

SLIDE 8

Outline

  1. Different pre-trained networks
  2. Data augmentation (for both CNN and IFV)
  3. Dataset fine-tuning
  4. Reducing CNN final layer output dimensionality
  5. Colour and CNN / IFV

Up next: study introduction and evaluation setup

SLIDE 9

Evaluation Setup

Input Dataset → Pre-trained Net (1,000 ImageNet classes) → CNN Feature Extractor (4096-D feature vector out) → SVM Classifier (train on the training set, test on the test set) → classifier output, evaluated using mAP, accuracy, etc.
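A minimal sketch of this protocol in scikit-learn, assuming the 4096-D features have already been extracted into arrays (the array and function names here are ours): one-vs-rest linear SVMs whose real-valued outputs are scored by average precision per class, then averaged into mAP.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import average_precision_score

def evaluate_map(train_feats, train_labels, test_feats, test_labels):
    """One-vs-rest linear SVMs on fixed CNN features, scored by mAP.
    Labels are 0/1 indicator matrices of shape (n_images, n_classes)."""
    aps = []
    for c in range(train_labels.shape[1]):
        svm = LinearSVC(C=1.0)                      # C would be cross-validated
        svm.fit(train_feats, train_labels[:, c])
        scores = svm.decision_function(test_feats)  # real-valued classifier output
        aps.append(average_precision_score(test_labels[:, c], scores))
    return float(np.mean(aps))                      # mean average precision
```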

SLIDE 10

Pre-trained Networks

  • CNN-F: similar to Krizhevsky et al., NIPS 2012 (‘ImageNet classification with deep convolutional neural networks’)
  • CNN-M: similar to Zeiler and Fergus, CoRR 2013 (‘Visualising and understanding convolutional networks’)
  • CNN-S: similar to the OverFeat ‘accurate’ network, ICLR 2014 (‘OverFeat: integrated recognition, localisation and detection using ConvNets’)

| Layer | CNN-F | CNN-M | CNN-S |
|-------|-------|-------|-------|
| conv1 | 64×11×11, stride 4 | 96×7×7, stride 2 | 96×7×7, stride 2 |
| conv2 | 256×5×5, stride 1 | 256×5×5, stride 2 | 256×5×5, stride 1 |
| conv3 | 256×3×3, stride 1 | 512×3×3, stride 1 | 512×3×3, stride 1 |
| conv4 | 256×3×3 | 512×3×3 | 512×3×3 |
| conv5 | 256×3×3 | 512×3×3 | 512×3×3 |
| fc6 | 4096, drop-out | 4096, drop-out | 4096, drop-out |
| fc7 | 4096, drop-out | 4096, drop-out | 4096, drop-out |
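For concreteness, the CNN-M column could be written out in PyTorch roughly as below. This is our sketch rather than the authors' original (Caffe-era) definition: the LRN layers are omitted, and the pooling placement and the adaptive pooling before fc6 are assumptions made to keep the example self-contained.

```python
import torch.nn as nn

# Rough CNN-M skeleton; LRN omitted, pooling placement assumed.
cnn_m = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),              # conv1
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, stride=2, padding=1), nn.ReLU(inplace=True), # conv2
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),          # conv3
    nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),          # conv4
    nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),          # conv5
    nn.AdaptiveMaxPool2d((6, 6)), nn.Flatten(),
    nn.Linear(512 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(),             # fc6
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),                    # fc7
    nn.Linear(4096, 1000),                            # 1,000-way ImageNet classifier
)
```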

SLIDE 11

Pre-trained Networks

mAP (VOC07):

| Network | mAP |
|---------|-----|
| DeCAF | 73.41 |
| CNN-F | 77.38 |
| CNN-M | 79.89 |
| CNN-S | 79.74 |

SLIDE 12

Outline

  1. Different pre-trained networks
  2. Data augmentation (for both CNN and IFV) ← up next
  3. Dataset fine-tuning
  4. Reducing CNN final layer output dimensionality
  5. Colour and CNN / IFV

SLIDE 13

Data Augmentation

What do we mean by data augmentation? Distinct from the jittering applied during network pre-training, here the augmentation happens around the pre-trained CNN feature extractor at feature-computation time:

  • a. Extract crops from the input image
  • b. Pool the resulting features across crops (average, max)

SLIDE 14

Data Augmentation

  • a. No augmentation (= 1 image)
  • b. Flip augmentation (= 2 images)
  • c. Crop+Flip augmentation (= 10 images: 224×224 crops plus their flips)
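A sketch of option (c) using torchvision's TenCrop transform (the four corner crops plus the centre crop, each with its horizontal flip); `feature_extractor` is a hypothetical module mapping a batch of crops to 4096-D features, and the crop features are pooled by averaging or max as described above.

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),   # 4 corner crops + centre crop, plus their flips
    transforms.Lambda(lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
])

def augmented_feature(image, feature_extractor, pool="avg"):
    """One descriptor per crop, pooled across the 10 crops."""
    crops = ten_crop(image)                  # (10, 3, 224, 224)
    with torch.no_grad():
        feats = feature_extractor(crops)     # (10, 4096)
    return feats.mean(0) if pool == "avg" else feats.max(0).values
```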

SLIDE 15

Data Augmentation

mAP (VOC07):

| Augmentation | CNN-M | IFV |
|--------------|-------|-----|
| None | 76.97 | 64.36 |
| Flip | 76.99 | 64.35 |
| Crop+Flip (train pooling: sum, test pooling: sum) | 79.44 | 66.68 |
| Crop+Flip (train pooling: none, test pooling: sum) | 79.89 | 67.17 |

SLIDE 16

Outline

  1. Different pre-trained networks
  2. Data augmentation (for both CNN and IFV)
  3. Dataset fine-tuning ← up next
  4. Reducing CNN final layer output dimensionality
  5. Colour and CNN / IFV

SLIDE 17

Fine-tuning

Network pre-training uses images from ILSVRC-2012 and yields a pre-trained network with general-purpose features; network fine-tuning uses images from the target dataset and yields a fine-tuned network with dataset-specific features.

For VOC 2007, the following loss functions were evaluated for the final fully connected layer:

  • TN-CLS – classification loss: max{ 0, 1 − y wᵀφ(I) }
  • TN-RNK – ranking loss: max{ 0, 1 − wᵀ( φ(I_pos) − φ(I_neg) ) }
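In code, the two hinge losses look roughly like this (a sketch in PyTorch notation; `w` stands for one class's weight vector in the final layer, `phi` for the features feeding it, and `y` for ±1 labels):

```python
import torch

def tn_cls_loss(w, phi, y):
    """TN-CLS: one-vs-rest hinge classification loss; y is a (-1/+1) vector."""
    return torch.clamp(1 - y * (phi @ w), min=0).mean()

def tn_rnk_loss(w, phi_pos, phi_neg):
    """TN-RNK: hinge on the score margin between positive and negative images."""
    return torch.clamp(1 - (phi_pos - phi_neg) @ w, min=0).mean()
```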

SLIDE 18

Fine-tuning

mAP (VOC07), CNN-S:

| Fine-tuning | mAP |
|-------------|-----|
| No TN | 79.7 |
| TN-RNK | 82.4 |

SLIDE 19

Outline

  1. Different pre-trained networks
  2. Data augmentation (for both CNN and IFV)
  3. Dataset fine-tuning
  4. Reducing CNN final layer output dimensionality ← up next
  5. Colour and CNN / IFV

SLIDE 20

Low Dimensional CNN Features

  • Baseline networks all have a 4096-D last hidden layer
  • We further trained three modifications of CNN-M with lower-dimensional full7 layers (2048, 1024, or 128 in place of 4096); the rest of the architecture is unchanged (conv1 96×7×7, st. 2; conv2 256×5×5, st. 2, pad 1; conv3 512×3×3, st. 1, pad 1; conv4 512×3×3; conv5 512×3×3; fc6 4096, drop-out). See the sketch after this note.

* Note: as only the original ILSVRC-2012 data was used for re-training, this differs from fine-tuning and is simply a way of reducing the final output dimension
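What this amounts to, as a hedged sketch (the attribute names `full7` and `full8` are hypothetical, standing for the last hidden layer and the classifier that reads from it):

```python
import torch.nn as nn

def shrink_full7(net, new_dim=128):
    """Swap the 4096-D full7 layer for a narrower one, then re-train the
    network on ILSVRC-2012 only (distinct from target-dataset fine-tuning)."""
    net.full7 = nn.Linear(4096, new_dim)   # was 4096 -> 4096
    net.full8 = nn.Linear(new_dim, 1000)   # 1,000-way ImageNet classifier
    return net
```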

SLIDE 21

Low Dimensional CNN Features

CNN-M, mAP (VOC07) by full7 dimensionality:

| full7 dim. | mAP |
|------------|-----|
| 4096 | 79.89 |
| 2048 | 80.1 |
| 1024 | 79.91 |
| 128 | 78.6 |

SLIDE 22

Outline

  1. Different pre-trained networks
  2. Data augmentation (for both CNN and IFV)
  3. Dataset fine-tuning
  4. Reducing CNN final layer output dimensionality
  5. Colour and CNN / IFV ← up next

SLIDE 23

Impact of Colour

mAP (VOC07):

| Input | CNN-M | IFV-512 |
|-------|-------|---------|
| Greyscale | 73.59 | 65.36 |
| Colour | 76.97 | 66.37 |
| Greyscale + aug. | 77 | 68.02 |
| Colour + aug. | 79.89 | 67.93 |

SLIDE 24

Comparison to State-of-the-art

| Method | ILSVRC-2012 (top-5 error) | VOC2007 (mAP) | VOC2012 (mAP) |
|--------|---------------------------|---------------|---------------|
| CNN-M 2048 | 13.5 | 80.1 | 82.4 |
| CNN-S | 13.1 | 79.7 | 82.9 |
| CNN-S TUNE-RNK | 13.1 | 82.4 | 83.2 |
| Zeiler & Fergus | 16.1 | – | 79.0 |
| Oquab et al. | 18.0 | 77.7 | 78.7 (82.8*) |
| Oquab et al. | – | – | 86.3* |
| Wei et al. | – | 81.5 (85.2*) | 81.7 (90.3*) |

* Uses extended training data and/or fusion with other methods

SLIDE 25

Take Home Messages

  • CNN-based methods >> shallow methods
  • We can transfer tricks from deep features to shallow features
  • We can achieve incredibly low-dimensional (~128-D) but performant features with CNN-based methods
  • If you get the details right, it’s possible to get to state-of-the-art with very simple methods

SLIDE 26

There’s more…

  • Presented here was just a subset of the full results from the paper
  • Check out the paper for full results on:
    • VOC 2007
    • VOC 2012
    • Caltech-101
    • Caltech-256
    • ILSVRC-2012

SLIDE 27

One more thing…

  • CNN models and feature computation code can now be downloaded from the project website:
    http://www.robots.ox.ac.uk/~vgg/software/deep_eval/
  • As before, source code to reproduce all experiments will be made available

SLIDE 28

Questions?