SPP-net
Spatial Pyramid Pooling in Deep Convolutional Networks
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research Asia, Visual Computing Group
Highlights
ILSVRC 2014 (all provided-data tracks): DET 2nd, CLS 3rd
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”
[Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
[Lazebnik et al., CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
Traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction
CNN counterparts: “conv layers” → simply pooling? → “fc layers”
traditional CNN: fixed-size input → conv → fc (4096, 4096, 1000)
SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000)
input image → conv layers → conv feature maps → spatial pyramid pooling layer (concatenate) → fc layers
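A minimal sketch of what the spatial pyramid pooling layer in the diagram computes, assuming max pooling over an n x n grid of bins at each pyramid level. The function name, the levels (1, 2, 4), and the floor/ceil bin boundaries are illustrative assumptions, not details taken from the slides. The point is only to show why conv feature maps of any size yield a fixed-length vector for the fc layers.

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_maps, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) array into a fixed-length vector.

    Each pyramid level n splits the map into an n x n grid of bins;
    the per-channel bin maxima of all levels are concatenated.
    The levels used here are an assumption made for this sketch.
    """
    _, height, width = feature_maps.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so the bins
            # cover the whole map and are never empty.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(feature_maps[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps of different spatial size give vectors of equal length.
small = np.random.rand(256, 13, 13)
large = np.random.rand(256, 10, 17)
print(spatial_pyramid_pool(small).shape, spatial_pyramid_pool(large).shape)
```

With these levels the output length is C × (1 + 4 + 16) regardless of H and W, which is what lets the fc layers stay unchanged when the input size varies.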
ILSVRC top-5 val error (%), 10-view, 4 architectures

                         ZF-5    Convnet*-5   Overfeat-5   Overfeat-7
no-SPP baselines         14.76   13.92        13.52        11.97
+ multi-level pooling    14.14   13.54        12.80        11.12
+ multi-size training    13.64   13.33        12.33        10.95

All CNNs improved!
7-conv SPP-net, 10-view: 10.95%
7-conv SPP-net, multi-scale/view (96-view+2-full): 9.08%
multiple SPP-nets: 8.06%

team            top-5 test
GoogLeNet       6.66
Oxford VGG      7.32
                8.06
Howard          8.11
DeeperVision    9.50
NUS-BST         9.79
TTIC_ECP        10.22
…
input image → conv layers → conv feature maps → region → SPP → fc layers
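A rough sketch of the idea in this diagram: the conv feature maps are computed once for the whole image, and each candidate region is pooled from the matching window of those maps. The helper names, the `stride` of 16 image pixels per feature-map cell, the floor/ceil window rounding, and the pyramid levels are all assumptions made for illustration; the slides do not specify how a region is mapped onto the feature maps.

```python
import math
import numpy as np

def spp(window, levels=(1, 2, 4)):
    """Max-pool a (C, h, w) window into a vector of fixed length."""
    _, h, w = window.shape
    out = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(window[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(out)

def region_features(feature_maps, box, stride=16):
    """Pool the feature-map window under an image-space box (x0, y0, x1, y1).

    `stride` and the rounding rule are assumptions of this sketch.
    """
    x0, y0, x1, y1 = box
    c0, r0 = x0 // stride, y0 // stride
    # Keep at least one feature-map cell per region.
    c1 = max(c0 + 1, math.ceil(x1 / stride))
    r1 = max(r0 + 1, math.ceil(y1 / stride))
    return spp(feature_maps[:, r0:r1, c0:c1])

# Random stand-in for the conv feature maps of one 640 x 480 image.
feature_maps = np.random.rand(256, 30, 40)
# Every candidate region reuses the same feature maps.
boxes = [(0, 0, 200, 150), (320, 64, 600, 400), (100, 100, 130, 480)]
vectors = np.stack([region_features(feature_maps, b) for b in boxes])
print(vectors.shape)  # (3, 5376)
```

Only the cheap pooling step runs per region, so the cost of the conv layers does not grow with the number of regions.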
SPP-net: 1 net on full image (image → net → feature, feature, feature)
R-CNN: 2000 nets on image regions (image → net → feature, repeated per region)
                  SPP-net 1-scale   SPP-net 5-scale   RCNN
mAP               58.0              59.2              58.5
GPU time / img    0.14s             0.38s             9s
speed-up          64x               24x               -
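The speed-up figures follow from the GPU times per image quoted above. As a quick check, using only the numbers on the slide (the variable names are illustrative):

```python
# GPU time per image in seconds, as reported on the slide.
rcnn_time = 9.0
spp_times = {"SPP-net 1-scale": 0.14, "SPP-net 5-scale": 0.38}

# Speed-up is the ratio of R-CNN time to SPP-net time.
speedups = {name: rcnn_time / t for name, t in spp_times.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.0f}x faster than R-CNN")
```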
Cost of a single model

                  SPP-net    RCNN
GPU time / img    0.6s       32s
40k test imgs     8 hours    15 days
“provided data” track

team               mAP
NUS                37.2
                   35.1
UvA                32.0
                   31.8
Southeast-CASIA    30.4
1-HKUST            28.8
CASIA_CRIPAC_2     28.6
http://research.microsoft.com/en-us/um/people/kahe/