SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks


  1. SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks
     Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
     Microsoft Research Asia, Visual Computing Group

  2. Highlights
     • ILSVRC 2014 (all provided-data tracks)
       • DET - 2nd
       • CLS - 3rd
       • LOC - 5th
     • ECCV 2014 paper
       • Published 2 months ago (arXiv:1406.4729v1, June 18)
       • Details disclosed (arXiv:1406.4729v2)
     “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

  3. Overview
     • SPP-net - a new network structure
     • Classification - improves all CNNs
     • Detection - 20-60x faster than R-CNN, as accurate

  4. Spatial Pyramid Matching
     • SPM: very successful in traditional computer vision
       • [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
       • [Lazebnik et al., CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
     [Diagram] Traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction
     [Diagram] CNN counterparts: “conv layers” → simply pooling? → “fc layers”

  5. SPP-net: SPM in CNN
     [Diagram] Traditional CNN: fixed-size input → conv → fc (4096) → fc (4096) → 1000
     [Diagram] SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096) → fc (4096) → 1000
     • Fix bin numbers
     • DO NOT fix bin size
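The "fix bin numbers, do not fix bin size" rule can be checked with a little arithmetic. The sketch below is illustrative only: the pyramid levels (1x1, 2x2, 4x4), the 256 channels, and the helper names `spp_output_length` and `bin_size` are assumptions, not values taken from the slides.

```python
import math

def spp_output_length(channels, levels):
    """Length of the pooled vector: one value per bin per channel.

    It depends only on the channel count and the pyramid levels,
    never on the feature-map height or width.
    """
    return channels * sum(n * n for n in levels)

def bin_size(dim, n):
    """Approximate side length of one bin when `dim` cells are split n ways."""
    return math.ceil(dim / n)

# Assumed for illustration: 256 conv channels, pyramid levels 1x1, 2x2, 4x4.
levels = (1, 2, 4)
length = spp_output_length(256, levels)
print(length)  # 256 * (1 + 4 + 16) = 5376

# The bin size adapts to the feature-map size; the bin count stays fixed.
print(bin_size(13, 4), bin_size(10, 4))  # 4 3
```

Because the vector length is independent of the input size, the fc layers that follow can keep a fixed input dimension.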

  6. SPP-net
     [Diagram] input image → conv layers → conv feature maps → spatial pyramid pooling layer → concatenate → fc layers
     • variable input size/scale
       • multi-size training
       • multi-scale testing
     • full-image view
     • multi-level pooling
       • robust to deformation
     • operates on feature maps
       • pooling in regions
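A minimal sketch of a spatial pyramid pooling layer, assuming max pooling, pyramid levels of 1x1, 2x2 and 4x4, and floor/ceil bin edges. The function names and the nested-list feature-map layout are hypothetical and chosen only to keep the example dependency-free; this is not the authors' released code.

```python
import math

def spp_max_pool(fmap, levels=(1, 2, 4)):
    """Pool a feature map into a fixed-length vector.

    fmap: list of channels, each a list of rows (H x W numbers).
    Each level n splits every channel into n x n bins and takes the max
    of each bin; all bin maxima are concatenated into one flat list.
    """
    out = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Bin edges scale with the map size; the bin count does not.
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    out.append(max(channel[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out

def make_map(channels, h, w):
    """Deterministic toy feature map (hypothetical helper for the demo)."""
    return [[[k + r * w + c for c in range(w)] for r in range(h)]
            for k in range(channels)]

small = make_map(2, 5, 7)   # 2 channels, 5 x 7
large = make_map(2, 9, 6)   # 2 channels, 9 x 6
print(len(spp_max_pool(small)), len(spp_max_pool(large)))  # 42 42
```

Both inputs yield 2 x (1 + 4 + 16) = 42 values even though their spatial sizes differ. The concatenation order used here (per channel, then per level) is an arbitrary choice for the sketch.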

  7. ILSVRC top-5 val error, % (10-view)
     [Bar chart, 4 architectures]
                             ZF-5    Convnet*-5   Overfeat-5   Overfeat-7
     no-SPP baselines        14.76   13.92        13.52        11.97
     multi-level pooling     14.14   13.54        12.80        11.12
     + multi-size training   13.64   13.33        12.33        10.95
     • All CNNs improved!

  8. ILSVRC 2014 CLS Results
     team           top-5 test (%)
     GoogLeNet      6.66
     Oxford VGG     7.32
     ours           8.06
     Howard         8.11
     DeeperVision   9.50
     NUS-BST        9.79
     TTIC_ECP       10.22
     …

     7-conv SPP-net, 10-view             10.95%
     7-conv SPP-net, 96-view+2-full      9.08%
     7-conv SPP-net, multi-scale/view    9.08%
     multiple SPP-nets                   8.06%

     • “shallow”
       • 7-conv, 1 Titan GPU, 3 weeks
     • but potential
       • SPP can improve deeper nets: >1% gain post-competition

  9. Detection: SPP on Regions
     [Diagram] input image → conv layers → conv feature maps → region → SPP → fc layers
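Pooling on a region of the conv feature maps can be sketched as two steps: project the image-space box onto the feature map, then pool inside that window with a fixed bin grid. The projection below simply divides by an assumed network stride of 16 with floor/ceil rounding; it is a simplification for illustration, and the function names are hypothetical.

```python
import math

def project_region(box, stride):
    """Map an image-space box (x0, y0, x1, y1) to feature-map index ranges.

    Returns (fx0, fy0, fx1, fy1) with exclusive upper bounds, at least one
    cell wide and one cell tall.
    """
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride, y0 // stride
    fx1 = max(fx0 + 1, math.ceil(x1 / stride))
    fy1 = max(fy0 + 1, math.ceil(y1 / stride))
    return fx0, fy0, fx1, fy1

def pool_region(channel, window, n):
    """Max-pool one channel inside `window` into an n x n grid of bins."""
    fx0, fy0, fx1, fy1 = window
    h, w = fy1 - fy0, fx1 - fx0
    out = []
    for i in range(n):
        r0, r1 = fy0 + (i * h) // n, fy0 + math.ceil((i + 1) * h / n)
        for j in range(n):
            c0, c1 = fx0 + (j * w) // n, fx0 + math.ceil((j + 1) * w / n)
            out.append(max(channel[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
    return out

# Toy 12 x 12 single-channel feature map with value = row * 12 + col.
channel = [[r * 12 + c for c in range(12)] for r in range(12)]
window = project_region((32, 48, 130, 100), stride=16)
print(window)                           # (2, 3, 9, 7)
print(pool_region(channel, window, 2))  # [53, 56, 77, 80]
```

The conv layers run once on the full image; each candidate region only repeats the cheap pooling step on the shared feature map.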

  10. RCNN vs. SPP
     • image regions vs. feature map regions
     [Diagram] R-CNN: 2000 nets on image regions (image → net → feature, once per region)
     [Diagram] SPP-net: 1 net on full image (image → net → feature map → per-region features)

  11. • With regional features, we can do everything R-CNN does
       • fine-tune, SVM, bbox regression…
     • similar accuracy, much faster

     VOC 2007         SPP-net 1-scale   SPP-net 5-scale   RCNN
     mAP              58.0              59.2              58.5
     GPU time / img   0.14s             0.38s             9s
     speed-up         64x               24x               -
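The speed-up figures follow directly from the GPU times on this slide. A quick check, using only those numbers (the variable names are illustrative):

```python
# GPU time per image in seconds, as reported on the slide (VOC 2007).
rcnn_time = 9.0
spp_times = {"1-scale": 0.14, "5-scale": 0.38}

# Speed-up = R-CNN time divided by SPP-net time.
speedups = {name: rcnn_time / t for name, t in spp_times.items()}
for name, s in speedups.items():
    print(f"SPP-net {name}: {s:.1f}x")
# SPP-net 1-scale: 64.3x
# SPP-net 5-scale: 23.7x
```

Rounded to whole numbers these give the 64x and 24x quoted on the slide.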

  12. ILSVRC 2014 DET Results (“provided data” track)
     team                   mAP
     NUS                    37.2
     ours, multi SPP-nets   35.1
     UvA                    32.0
     ours, 1 SPP-net        31.8
     Southeast-CASIA        30.4
     1-HKUST                28.8
     CASIA_CRIPAC_2         28.6

     cost of a single model   SPP-net   RCNN
     GPU time / img           0.6s      32s
     40k test imgs            8 hours   15 days
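A back-of-the-envelope check of the total test cost, multiplying the per-image GPU times above by 40k images. This counts GPU time per image only; the slide does not say what else its totals include.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

num_images = 40_000  # "40k test imgs" from the slide

spp_hours = num_images * 0.6 / SECONDS_PER_HOUR   # SPP-net at 0.6 s / image
rcnn_days = num_images * 32 / SECONDS_PER_DAY     # R-CNN at 32 s / image

print(f"SPP-net: {spp_hours:.1f} hours")  # SPP-net: 6.7 hours
print(f"R-CNN:   {rcnn_days:.1f} days")   # R-CNN:   14.8 days
```

The R-CNN estimate matches the quoted 15 days; the SPP-net estimate comes out somewhat below the quoted 8 hours, which presumably covers costs beyond the per-image GPU time.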

  13. • Conclusion
       • SPM in CNNs
       • CLS: improve all CNNs in the literature
       • DET: practical, fast, and accurate
     • Future work
       • SPP on advanced networks
     • Resources
       • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/
     • Acknowledgement
       • We thank NVIDIA for the GPU donation.
