Return of the Devil in the Details: Delving Deep into Convolutional Nets
Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford
Hilal E. Akyüz
Slide by Chatfield et al.
and data augmentation techniques
architectures and different learning heuristics.
– Contains 1,000 object categories from
– ~1.2M training images
– 50,000 validation images
– 100,000 test images
– Multi-label dataset
– Contains ~10,000 images
– 20 object classes
– Images split into train, validation and test sets
– Multi-label dataset
– Contains roughly twice as many images
– Does not include a test set; instead, evaluation uses the Evaluation Server
– 101 classes
– Three random splits
– 30 training and 30 testing images per class
– 256 classes
– Two random splits
– 60 training images per class; the rest are used for testing
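The split protocols above can be sketched as follows. This is only an illustration under stated assumptions: the helper `split_per_class`, the toy class names, and the seeds are hypothetical and do not come from the slides.

```python
import random

def split_per_class(images_by_class, n_train, n_test=None, seed=0):
    """Randomly pick n_train training images per class.

    Test on n_test of the remaining images, or on all of the
    remaining images when n_test is None.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        rest = shuffled[n_train:]
        test[label] = rest if n_test is None else rest[:n_test]
    return train, test

# Hypothetical toy data: 3 classes with 80 image ids each.
toy = {c: [f"{c}_{i:03d}" for i in range(80)] for c in ("cat", "dog", "owl")}

# 101-class protocol: three random splits, 30 train / 30 test per class.
caltech101_splits = [split_per_class(toy, 30, 30, seed=s) for s in range(3)]

# 256-class protocol: two random splits, 60 train, the rest for testing.
caltech256_splits = [split_per_class(toy, 60, None, seed=s) for s in range(2)]
```

Results would then typically be averaged over the random splits; that averaging step is an assumption here, since the slides only list the split sizes.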
– Shallow representation
– Deep representation (CNN) with pre-training
– Deep representation (CNN) with pre-training and fine-tuning
– CNN-S, CNN-M, CNN-F
Generally-applicable best practices
Scenario-specific best practices
methods
– Intra-norm
– Spatially-extended local descriptors
– Color features
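As a rough illustration of the first item, assuming "Intra-norm" refers to intra-normalization as proposed for VLAD-style encodings (each block of the encoded vector is L2-normalized on its own rather than the whole vector at once), a minimal sketch might look like this. The function name, the contiguous equal-sized block layout, and the toy vector are assumptions made for the example.

```python
import math

def intra_normalize(vector, num_blocks, eps=1e-12):
    """L2-normalise each equally sized, contiguous block independently."""
    if len(vector) % num_blocks != 0:
        raise ValueError("vector length must be divisible by num_blocks")
    size = len(vector) // num_blocks
    out = []
    for b in range(num_blocks):
        block = vector[b * size:(b + 1) * size]
        norm = math.sqrt(sum(x * x for x in block))
        # eps keeps all-zero blocks at zero instead of dividing by zero.
        out.extend(x / (norm + eps) for x in block)
    return out

# Toy encoding: 3 blocks of 2 dimensions each (hypothetical values).
encoded = [3.0, 4.0, 0.0, 0.5, 0.0, 0.0]
normalized = intra_normalize(encoded, num_blocks=3)
```

After this step every non-empty block has unit length, so a single block with large values can no longer dominate the representation.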
– Before being passed to the SVM
– Momentum is 0.9
– Weight decay is 5×10⁻⁴
– Learning rate is 10⁻², decreased by a factor of 10
– Random crops
– Flips
– RGB jittering
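The three augmentations above can be sketched in plain Python on a tiny image stored as rows of (R, G, B) tuples. This is a simplified illustration: the crop size, the jitter scale, and the jitter scheme (one random offset per channel) are assumptions and not necessarily what the authors used.

```python
import random

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def random_flip(img, rng, p=0.5):
    """Mirror the image horizontally with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img

def rgb_jitter(img, rng, scale=10.0):
    """Simplified jitter: add one random offset per channel to every pixel."""
    shift = [rng.gauss(0.0, scale) for _ in range(3)]
    return [[tuple(min(255.0, max(0.0, c + s)) for c, s in zip(px, shift))
             for px in row] for row in img]

def augment(img, crop_size, seed=None):
    rng = random.Random(seed)
    cropped = random_crop(img, crop_size, rng)
    return rgb_jitter(random_flip(cropped, rng), rng)

# Hypothetical 8x8 test image; each pixel encodes its own coordinates.
image = [[(x, y, 128) for x in range(8)] for y in range(8)]
sample = augment(image, crop_size=6, seed=0)
```

Calling `augment` with a different seed for each training pass yields a different crop, flip, and color shift of the same source image.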
VOC 2007 Results
shallow methods
loss can be preferred
computation is slower
features
performant features with CNN-based methods
CNN-S 76.10
CNN-M 76.11
AlexNet 71.40
GoogleNet 80.91
ResNet 83.06
VGG19 81.01
CNN-M 169
CNN-S 151
ResNet 11
GoogleNet 71
VGG19 50