  1. Return of the Devil in the Details: Delving Deep into Convolutional Nets. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Visual Geometry Group, Department of Engineering Science, University of Oxford. Presented by Hilal E. Akyüz

  2. [slide by Chatfield et al.]

  3. [slide by Chatfield et al.]

  4. What Has Changed Since 2011?
    ● Different deep architectures
    ● The latest generation of CNNs has achieved impressive results
    ● It is unclear how the recently introduced methods compare to each other and to shallow methods

  5. Overview of the Paper
    ● Compares the latest methods (as of 2014) on a common ground
    ● Examines several properties of CNN-based representations and data augmentation techniques
    ● Compares both different pre-trained network architectures and different learning heuristics

  6. Dataset (pre-training)
    ● ILSVRC-2012
      – Contains 1,000 object categories from ImageNet
      – ~1.2M training images
      – 50,000 validation images
      – 100,000 test images
    ● Performance is evaluated using top-5 classification error
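
ILSVRC's top-5 error counts an image as correct if the true class appears among the five highest-scoring predictions. A minimal NumPy sketch of this metric (array shapes are illustrative):

```python
import numpy as np

def top5_error(scores, labels):
    """ILSVRC top-5 error: fraction of images whose true class is
    missing from the five highest-scoring predictions."""
    # scores: (N, 1000) class scores; labels: (N,) true class indices
    top5 = np.argsort(-scores, axis=1)[:, :5]    # 5 best classes per image
    hit = (top5 == labels[:, None]).any(axis=1)  # true class among them?
    return 1.0 - hit.mean()
```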

  7. Datasets (training, fine-tuning)
    ● Pascal VOC 2007
      – Multi-label dataset
      – Contains ~10,000 images
      – 20 object classes
      – Images split into train, validation and test sets
    ● Pascal VOC 2012
      – Multi-label dataset
      – Contains roughly twice as many images
      – Does not include test-set annotations; instead, evaluation uses the official PASCAL Evaluation Server
    ● Performance is measured as mean Average Precision (mAP)
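
mAP averages a per-class average precision over the 20 VOC classes. A sketch of the simple non-interpolated AP is below; note that the official VOC 2007 protocol uses an 11-point interpolated variant, so this is an approximation:

```python
import numpy as np

def average_precision(scores, positives):
    """Non-interpolated AP for one class: mean of the precision
    values at the rank of each positive example."""
    order = np.argsort(-scores)                  # rank by descending score
    hits = positives[order].astype(float)
    prec_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return (prec_at_k * hits).sum() / hits.sum()

def mean_ap(scores, labels):
    # scores: (N, C) classifier outputs; labels: (N, C) binary ground truth
    return np.mean([average_precision(scores[:, c], labels[:, c] == 1)
                    for c in range(scores.shape[1])])
```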

  8. Datasets (training, fine-tuning)
    ● Caltech-101
      – 101 classes
      – Three random splits
      – 30 training, 30 testing images per class
    ● Caltech-256
      – 256 classes
      – Two random splits
      – 60 training images per class, the rest used for testing
    ● Performance is measured using mean class accuracy

  9. Outline
    ● 3 scenarios:
      – Shallow representation (IFV)
      – Deep representation (CNN) with pre-training
      – Deep representation (CNN) with pre-training and fine-tuning
    ● Different pre-trained networks: CNN-F, CNN-M, CNN-S
    ● Scenario-specific best practices:
      – Reducing CNN final-layer output dimensionality
      – Data augmentation (for both CNN and IFV)
    ● Generally-applicable best practices:
      – Colour information
      – Feature normalisation (for both CNN and IFV)

  10. Data Augmentation [slide by Chatfield et al.]
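
The reproduced slide covers the paper's crop-and-flip augmentation: images are downsized so the smallest side is 256, and 224x224 crops (centre plus four corners) together with their horizontal flips are extracted. A Pillow sketch of that ten-crop scheme (the function name and exact sizes follow the paper's description, but treat the details as illustrative):

```python
from PIL import Image

def ten_crops(img, size=224, smallest=256):
    """Centre + four corner crops and their horizontal flips."""
    w, h = img.size
    scale = smallest / min(w, h)                     # smallest side -> 256
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size
    corners = [((w - size) // 2, (h - size) // 2),   # centre
               (0, 0), (w - size, 0),                # top corners
               (0, h - size), (w - size, h - size)]  # bottom corners
    crops = [img.crop((x, y, x + size, y + size)) for x, y in corners]
    flips = [c.transpose(Image.FLIP_LEFT_RIGHT) for c in crops]
    return crops + flips
```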

  11. [slide by Chatfield et al.]

  12. Scenario 1: Shallow Representation (IFV)
    ● IFV usually outperformed related encoding methods
    ● Power normalisation for improved performance

  13. IFV Details
    ● Multi-scale dense sampling
    ● SIFT features
    ● Soft quantisation using a GMM with K=256 components
    ● Spatial pyramid (1x1, 3x1, 2x2)
    ● 3 modifications:
      – Intra-normalisation: L2 norm applied to the per-component sub-blocks
      – Spatially-extended local descriptors: more memory-efficient than SPM
      – Colour features: Local Colour Statistics
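
A NumPy sketch of the normalisation steps applied to the Fisher vector, assuming the usual layout of one contiguous sub-block per GMM component; the block layout and the alpha=0.5 power are assumptions consistent with standard IFV practice, not details taken from the slides:

```python
import numpy as np

def normalise_ifv(fv, num_components, alpha=0.5):
    """Intra-normalisation per component sub-block, then power
    (signed square-root) and global L2 normalisation."""
    blocks = fv.reshape(num_components, -1)        # one sub-block per GMM component
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    blocks = blocks / np.maximum(norms, 1e-12)     # intra-normalisation
    fv = blocks.ravel()
    fv = np.sign(fv) * np.abs(fv) ** alpha         # power normalisation
    return fv / max(np.linalg.norm(fv), 1e-12)     # global L2 norm
```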

  14. Scenario 2: Deep Representation (CNN) with Pre-training
    ● Pre-trained on ImageNet
    ● 3 different pre-trained networks

  15. [slide by Chatfield et al.]

  16. Pre-Trained Networks [slide by Chatfield et al.]

  17. Scenario 3: Deep Representation (CNN) with Pre-training & Fine-tuning
    ● Pre-trained on one dataset and applied to another
    ● Fine-tuning improves performance
    ● The representation becomes dataset-specific

  18. CNN Details
    ● All networks trained with the same training protocol and the same implementation
    ● Caffe framework
    ● L2 normalisation of CNN features before feeding them to the SVM
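
A minimal scikit-learn sketch of the classification stage: L2-normalise each feature vector, then train a linear SVM. The feature array and the C value here are placeholders, not the paper's settings:

```python
import numpy as np
from sklearn.svm import LinearSVC

features = np.random.randn(100, 4096)     # placeholder CNN features (N, D)
labels = np.random.randint(0, 2, 100)     # placeholder binary labels

# L2-normalise each row before handing it to the SVM
features /= np.linalg.norm(features, axis=1, keepdims=True)

clf = LinearSVC(C=1.0)                    # C is illustrative, not tuned
clf.fit(features, labels)
```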

  19. CNN Training
    ● Gradient descent with momentum
      – Momentum: 0.9
      – Weight decay: 5x10^-4
      – Learning rate: 10^-2, decreased by a factor of 10
    ● Data augmentation
      – Random crops
      – Flips
      – RGB jittering
    ● ~3 weeks on an NVIDIA Titan Black GPU (for the slow architecture)
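
A sketch of this optimisation recipe expressed in modern PyTorch (the authors used Caffe; the tiny model and the schedule's step size below are stand-ins, not the paper's values):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(64, 1000))        # stand-in, not CNN-S/M/F

optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-2,              # initial learning rate 10^-2
                            momentum=0.9,
                            weight_decay=5e-4)
# "Decreased by a factor of 10": a fixed-step schedule dividing the rate
# by 10 every 20 epochs is a common stand-in for the paper's protocol.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
```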

  20. CNN Fine-tuning
    ● Only the last layer is fine-tuned
    ● Classification hinge loss (CNN-S TUNE-CLS) and ranking hinge loss (CNN-S TUNE-RNK) for VOC
    ● Softmax regression loss for Caltech-101
    ● Lower initial learning rate (VOC & Caltech)
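
The slides do not spell out the exact form of the ranking hinge loss; below is one standard formulation as a PyTorch sketch, in which every positive class of an image should out-score every negative class by a margin. The pairwise scheme and the margin value are assumptions:

```python
import torch

def ranking_hinge_loss(scores, targets, margin=1.0):
    """scores: (N, C) network outputs; targets: (N, C) in {0, 1}.
    Penalises positive/negative class pairs whose score gap < margin."""
    losses = []
    for s, t in zip(scores, targets.bool()):
        if t.any() and (~t).any():                    # need both pos and neg
            gap = s[t].unsqueeze(1) - s[~t].unsqueeze(0)  # all pos/neg pairs
            losses.append(torch.clamp(margin - gap, min=0).mean())
    if not losses:                                    # no valid image in batch
        return scores.new_zeros(())
    return torch.stack(losses).mean()
```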

  21. [slide by Chatfield et al.]

  22. Analysis

  23. [slide by Chatfield et al.]

  24. [slide by Chatfield et al.]

  25. [slide by Chatfield et al.]

  26. [slide by Chatfield et al.]

  27. [slide by Chatfield et al.]

  28. [slide by Chatfield et al.]

  29. VOC 2007 Results [slide by Chatfield et al.]

  30. [slide by Chatfield et al.]

  31. [slide by Chatfield et al.]

  32. Take-Home Messages
    ● Data augmentation helps a lot, both for deep and shallow methods
    ● Fine-tuning makes a difference, and a ranking loss can be preferred
    ● Smaller filters and deeper networks help, although feature computation is slower
    ● CNN-based methods >> shallow methods
    ● Tricks can be transferred from deep features to shallow features
    ● Very low-dimensional (~128D) but performant features are achievable with CNN-based methods
    ● If you get the details right, it's possible to reach state-of-the-art with very simple methods!

  33. [slide by Chatfield et al.]

  34. Thank You For Listening. Q&A? (DEMO) Hilal E. Akyüz

  35. DEMO

    CNN Model   Pascal VOC 2007 mAP
    CNN-S       76.10
    CNN-M       76.11
    AlexNet     71.40
    GoogleNet   80.91
    ResNet      83.06
    VGG19       81.01

  36. Demo

    Model       FPS (batch size = 1)
    CNN_M       169
    CNN_S       151
    ResNet      11
    GoogleNet   71
    VGG19       50

  37. Extras [slide by Chatfield et al.]

  38. Extras [slide by Chatfield et al.]

  39. Extras [slide by Chatfield et al.]
