SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks



SLIDE 1

SPP-net

Spatial Pyramid Pooling in Deep Convolutional Networks

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Microsoft Research Asia, Visual Computing Group

SLIDE 2

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”

  • K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Highlights

  • ILSVRC 2014 (all provided-data tracks)
      • DET - 2nd
      • CLS - 3rd
      • LOC - 5th
  • ECCV 2014 paper
      • Published 2 months ago (arXiv:1406.4729v1, June 18)
      • Details disclosed (arXiv:1406.4729v2)
SLIDE 3

Overview

  • SPP-net
      • a new network structure
  • Classification
      • improves all CNNs
  • Detection
      • 20-60x faster than R-CNN, as accurate
SLIDE 4

Spatial Pyramid Matching

  • SPM: very successful in traditional computer vision

  • [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
  • [Lazebnik et al, CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”

traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction

CNN counterparts: “conv layers” → simply pooling? → “fc layers”

SLIDE 5

SPP-net: SPM in CNN

traditional CNN: fixed-size input → conv → fc (4096, 4096, 1000)

SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000)

  • Fix bin numbers
  • DO NOT fix bin size
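
To make the two bullets concrete: when the number of bins is fixed and the bin size scales with the input, feature maps of different sizes give vectors of the same length. The block below is a minimal pure-Python sketch, not the authors' implementation. The pyramid levels (4x4, 2x2, 1x1), the floor/ceil bin boundaries, and the names `bin_edges` and `spp_max_pool` are assumptions chosen for illustration.

```python
def bin_edges(length, n_bins):
    """Split [0, length) into n_bins windows, returned as (start, end) pairs.

    Assumption: window i spans floor(i*length/n) .. ceil((i+1)*length/n),
    so neighbouring windows may overlap slightly and are never empty.
    """
    return [((i * length) // n_bins, -(-((i + 1) * length) // n_bins))
            for i in range(n_bins)]


def spp_max_pool(fmap, levels=(4, 2, 1)):
    """Max-pool one 2-D feature map (a list of rows) into a fixed-length list.

    The number of bins per level is fixed (n x n); the bin size follows
    from the size of the feature map.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for r0, r1 in bin_edges(h, n):
            for c0, c1 in bin_edges(w, n):
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


# Two feature maps of different sizes (hypothetical values).
small = [[r * 7 + c for c in range(7)] for r in range(5)]      # 5 x 7
large = [[r * 13 + c for c in range(13)] for r in range(10)]   # 10 x 13

# Both give 4*4 + 2*2 + 1*1 = 21 values.
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on `levels`, the fc layers that follow can keep a fixed input dimension whatever the image size.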
SLIDE 6

SPP-net

  • variable input size/scale
      • multi-size training
      • multi-scale testing
      • full-image view
  • multi-level pooling
      • robust to deformation
  • operates on feature maps
      • pooling in regions

input image → conv layers → conv feature maps → spatial pyramid pooling layer (concatenate) → fc layers
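
The length of the concatenated vector handed to the fc layers follows from the channel count and the pyramid levels alone. The short sketch below works through that arithmetic. The 256-channel figure, the two example pyramids, and the name `spp_output_dim` are assumptions for illustration and do not come from the slide.

```python
def spp_output_dim(channels, levels):
    """Length of the concatenated SPP vector: one value per channel per bin."""
    return channels * sum(n * n for n in levels)


# Hypothetical settings: 256 feature-map channels, two example pyramids.
print(spp_output_dim(256, (4, 2, 1)))      # 256 * (16 + 4 + 1)
print(spp_output_dim(256, (6, 3, 2, 1)))   # 256 * (36 + 9 + 4 + 1)
```

Neither number depends on the height or width of the input image.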

SLIDE 7

ILSVRC top-5 error on val (%), 10-view, 4 architectures

                         ZF-5     Convnet*-5   Overfeat-5   Overfeat-7
no-SPP baselines         14.76    13.92        13.52        11.97
multi-level pooling      14.14    13.54        12.80        11.12
+ multi-size training    13.64    13.33        12.33        10.95

All CNNs improved!
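
The size of the improvement per architecture can be read off the chart values with a little arithmetic. The sketch below assumes the twelve numbers are grouped as three series of four (no-SPP baseline, multi-level pooling, plus multi-size training) in the architecture order ZF-5, Convnet*-5, Overfeat-5, Overfeat-7. That grouping is a reading of the chart, not something the slide states in text.

```python
# Top-5 error (%) per architecture, assumed order:
# (no-SPP baseline, multi-level pooling, + multi-size training)
errors = {
    "ZF-5":       (14.76, 14.14, 13.64),
    "Convnet*-5": (13.92, 13.54, 13.33),
    "Overfeat-5": (13.52, 12.80, 12.33),
    "Overfeat-7": (11.97, 11.12, 10.95),
}

# Absolute reduction in error from the baseline to the full SPP setting.
gains = {name: round(base - full, 2) for name, (base, _, full) in errors.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f} points lower than the no-SPP baseline")
```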

SLIDE 8

ILSVRC 2014 CLS Results

model                                                  top-5 error
7-conv SPP-net, 10-view                                10.95%
7-conv SPP-net, multi-scale/view (96-view + 2-full)    9.08%
multiple SPP-nets                                      8.06%

  • “shallow”
      • 7-conv, 1 Titan GPU, 3 weeks
  • but potential
      • SPP can improve deeper nets: >1% gain post-competition

team            top-5 test
GoogLeNet       6.66
Oxford VGG      7.32
ours            8.06
Howard          8.11
DeeperVision    9.50
NUS-BST         9.79
TTIC_ECP        10.22
…

SLIDE 9

Detection: SPP on Regions

input image → conv layers → conv feature maps → SPP on a region → fc layers
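
Pooling on a region means mapping an image-space box onto the conv feature maps and running the pyramid pooling inside that window only. The block below is a rough pure-Python sketch of that idea. The stride of 16, the floor/ceil projection of the box corners, the two-level pyramid, and the names `project_box` and `pool_region` are all assumptions for illustration, not the authors' exact mapping.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def project_box(box, stride, fmap_h, fmap_w):
    """Map an image-space box (x0, y0, x1, y1) to a feature-map window.

    Assumption: the left/top edge is floored, the right/bottom edge is
    ceiled, and the window is clipped to the map with at least one cell.
    """
    x0, y0, x1, y1 = box
    c0 = min(max(x0 // stride, 0), fmap_w - 1)
    r0 = min(max(y0 // stride, 0), fmap_h - 1)
    c1 = min(max(ceil_div(x1, stride), c0 + 1), fmap_w)
    r1 = min(max(ceil_div(y1, stride), r0 + 1), fmap_h)
    return r0, r1, c0, c1


def pool_region(fmap, window, levels=(2, 1)):
    """Pyramid max-pooling restricted to one window of a 2-D feature map."""
    r0, r1, c0, c1 = window
    h, w = r1 - r0, c1 - c0
    out = []
    for n in levels:
        for i in range(n):
            rows = range(r0 + i * h // n, r0 + ceil_div((i + 1) * h, n))
            for j in range(n):
                cols = range(c0 + j * w // n, c0 + ceil_div((j + 1) * w, n))
                out.append(max(fmap[r][c] for r in rows for c in cols))
    return out


# One 6 x 8 feature map, computed once for the whole (hypothetical) image.
fmap = [[r * 8 + c for c in range(8)] for r in range(6)]
window = project_box((32, 16, 96, 80), stride=16, fmap_h=6, fmap_w=8)
print(window, pool_region(fmap, window))
```

Every candidate region reuses the same `fmap`, so the conv layers run once per image rather than once per region.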

SLIDE 10

RCNN vs. SPP

  • image regions vs. feature map regions

R-CNN (2000 nets on image regions): image → one net per region → one feature per region

SPP-net (1 net on full image): image → one net → feature maps → one feature per region

SLIDE 11

  • With regional features, we can do everything of RCNN
  • fine-tune, SVM, bbox regression…
  • similar accuracy, much faster

                  SPP-net 1-scale   SPP-net 5-scale   RCNN
mAP               58.0              59.2              58.5
GPU time / img    0.14s             0.38s             9s
speed-up          64x               24x               -

  • VOC 2007
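
The speed-up figures are the ratio of R-CNN's GPU time per image to SPP-net's. The few lines below redo that division using only the times quoted on this slide; the variable names are illustrative.

```python
rcnn_time = 9.0  # seconds per image, from the slide
spp_times = {"SPP-net 1-scale": 0.14, "SPP-net 5-scale": 0.38}

# Speed-up = R-CNN time / SPP-net time.
speedups = {name: rcnn_time / t for name, t in spp_times.items()}

for name, s in speedups.items():
    print(f"{name}: about {s:.0f}x faster than R-CNN")
```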
SLIDE 12

ILSVRC 2014 DET Results

“provided data” track     mAP
NUS                       37.2
ours, multi SPP-nets      35.1
UvA                       32.0
ours, 1 SPP-net           31.8
Southeast-CASIA           30.4
1-HKUST                   28.8
CASIA_CRIPAC_2            28.6

cost of a single model    SPP-net    RCNN
GPU time / img            0.6s       32s
40k test imgs             8 hours    15 days
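
The test-set totals on this slide can be sanity-checked by multiplying the per-image GPU time by the number of test images. The sketch below assumes "40k" means exactly 40,000 images and counts GPU time only.

```python
n_images = 40_000  # assumption: "40k test imgs" taken as exactly 40,000
seconds_per_image = {"SPP-net": 0.6, "R-CNN": 32.0}  # from the slide

# Total GPU time in seconds for one pass over the test set.
totals = {name: n_images * t for name, t in seconds_per_image.items()}

print(f"SPP-net: {totals['SPP-net'] / 3600:.1f} hours")
print(f"R-CNN:   {totals['R-CNN'] / 86400:.1f} days")
```

The R-CNN product rounds to the slide's 15 days. The SPP-net product comes out somewhat below the quoted 8 hours, and the slide does not say what else that figure includes.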

SLIDE 13

  • Conclusion
      • SPM in CNNs
      • CLS: improve all CNNs in the literature
      • DET: practical, fast, and accurate
  • Future work
      • SPP on advanced networks
  • Resources
      • code, config, tech report…
        http://research.microsoft.com/en-us/um/people/kahe/
  • Acknowledgement
      • We thank NVIDIA for the GPU donation.