ssd single shot multibox detector
SSD: Single Shot MultiBox Detector - PowerPoint PPT Presentation


  1. SSD: Single Shot MultiBox Detector Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg Slides by: Sulabh Shrestha

  2. Receptive Field
     ▪ Shallow feature maps
       ▪ Larger size
       ▪ Smaller receptive fields
       ▪ May not be able to see larger objects
     ▪ Deep feature maps
       ▪ Smaller size
       ▪ Larger receptive fields
       ▪ May miss small objects
     ▪ Use multiple feature maps, each for objects matching its receptive field size
     Ref: https://cv-tricks.com/object-detection/single-shot-multibox-detector-ssd/
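The receptive-field trend on this slide can be illustrated with the standard recurrence for stacked layers (r grows by (kernel - 1) times the accumulated stride). This is a minimal sketch: the layer stack is a hypothetical toy example, not the actual SSD/VGG configuration, and padding and dilation are ignored.

```python
def receptive_field(layers):
    """Return (receptive_field, jump) after a stack of (kernel, stride) layers.

    Standard recurrence: r += (kernel - 1) * jump, then jump *= stride.
    Padding and dilation are ignored in this sketch.
    """
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
    return r, jump


# Hypothetical toy stack (an assumption for illustration only):
# two 3x3 convs, then a 2x2 stride-2 pool and two more 3x3 convs.
shallow = [(3, 1), (3, 1)]
deep = shallow + [(2, 2), (3, 1), (3, 1)]

print(receptive_field(shallow))  # receptive field and stride of the shallow map
print(receptive_field(deep))     # the deeper map sees a larger input region
```

The deeper stack has both a larger receptive field and a larger accumulated stride, which is why its feature map is spatially smaller.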

  3. Architecture
     ▪ Base Network (VGG) + Extra Feature Layers
     ▪ No FC layer
     ▪ Specific feature maps are responsive to particular scales of objects
       ▪ Not necessarily the same as the receptive field
       ▪ A hyper-parameter
       ▪ Dependent on data
     ▪ Figure labels: 8x8 feature map, 4x4 feature map

  4. Base Network
     ▪ VGG-16
     ▪ Pool5 changed:
       ▪ 3x3 kernel instead of 2x2
       ▪ Stride 1 instead of 2
     ▪ First two FC layers replaced by convolutional layers
       ▪ DeepLab LargeFOV
     ▪ Last FC layer removed altogether
     ▪ No dropout used
     ▪ Conv4_3 also used for prediction
       ▪ The third conv layer in the fourth group of conv layers
     Ref: Very Deep Convolutional Networks for Large-Scale Image Recognition
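The effect of the Pool5 change on spatial resolution can be checked with the usual pooling output-size formula, floor((n + 2p - k) / s) + 1. This is a small sketch: the 19x19 input size and the padding of 1 for the 3x3 pool are assumptions made for illustration and are not stated on the slide.

```python
def pool_out_size(n, kernel, stride, padding=0):
    """Spatial output size of a pooling layer, using floor rounding."""
    return (n + 2 * padding - kernel) // stride + 1


n = 19  # assumed input size to Pool5, for illustration only

# Original VGG-16 Pool5: 2x2 kernel, stride 2 (roughly halves the map).
print(pool_out_size(n, kernel=2, stride=2))

# Modified Pool5: 3x3 kernel, stride 1, assumed padding 1 (keeps the resolution).
print(pool_out_size(n, kernel=3, stride=1, padding=1))
```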

  5. Multiple Default Boxes
     ▪ Similar to the anchor boxes of Faster R-CNN
     ▪ Example feature map:
       ▪ m x n
       ▪ p channels
     ▪ For each location (i, j)
       ▪ Multiple default boxes (k)
       ▪ 3 x 3 x p convolutional filters for each box
         ▪ Confidence of each class, c_i; i ∈ [1, C]
         ▪ x, y, w, h
         ▪ (C + 4) outputs
     ▪ Total outputs for 1 feature map:
       ▪ m * n * k * (#classes + 4)
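The output count above is simple arithmetic, sketched below. The function names are hypothetical, and the six-map configuration is an assumption taken from the SSD300 setup described in the paper (it does not appear on this slide); 21 classes corresponds to PASCAL VOC with background.

```python
def outputs_per_feature_map(m, n, k, num_classes):
    """Number of predicted values for one m x n map with k default boxes per cell."""
    return m * n * k * (num_classes + 4)


def total_default_boxes(feature_maps):
    """Total default boxes over a list of (m, n, k) feature-map descriptions."""
    return sum(m * n * k for m, n, k in feature_maps)


# One 8x8 map, 4 boxes per location, 21 classes.
print(outputs_per_feature_map(8, 8, 4, 21))

# Assumed SSD300 configuration: (height, width, boxes per location).
ssd300_maps = [(38, 38, 4), (19, 19, 6), (10, 10, 6), (5, 5, 6), (3, 3, 4), (1, 1, 4)]
print(total_default_boxes(ssd300_maps))
```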

  6. Scale and Aspect Ratio
     ▪ How many default boxes per location?
     ▪ Scale
       ▪ Related to, but not exactly the same as, the receptive field
       ▪ If m feature maps are used for prediction:
         ▪ s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), k ∈ [1, m]
         ▪ s_min = 0.2
         ▪ s_max = 0.9
       ▪ E.g.
         ▪ s = 0.2
         ▪ img-size = 300
         ▪ Corresponding default box size = 0.2 * 300 = 60
     ▪ Aspect ratios (a_r)
       ▪ {1, 2, 3, 1/2, 1/3}, which sets the number of boxes k per location
       ▪ Width: w_k^a = s_k * √a_r
       ▪ Height: h_k^a = s_k / √a_r
       ▪ E.g. s = 0.2, img-size = 300
         ▪ a_r = 1 --> w = 0.2 * 300 = 60, h = 0.2 * 300 = 60
         ▪ a_r = 2 --> w = 0.2 * √2 * 300 ≈ 85, h = 0.2 / √2 * 300 ≈ 42
         ▪ a_r = 1/2 --> w = 0.2 * √½ * 300 ≈ 42, h = 0.2 / √½ * 300 ≈ 85
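The worked numbers on this slide can be reproduced with a few lines. This is a sketch, assuming the linear spacing of scales between s_min and s_max used in the SSD paper; the function names are hypothetical and m = 6 is chosen only as an example.

```python
import math


def scales(m, s_min=0.2, s_max=0.9):
    """Evenly spaced scales s_1..s_m between s_min and s_max (requires m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_size(s, aspect_ratio, img_size=300):
    """Width and height in pixels of a default box with scale s and aspect ratio a_r."""
    w = s * math.sqrt(aspect_ratio) * img_size
    h = s / math.sqrt(aspect_ratio) * img_size
    return w, h


print([round(s, 2) for s in scales(6)])
for ar in (1, 2, 1 / 2):
    w, h = default_box_size(0.2, ar)
    print(ar, round(w), round(h))
```

Note that the area of each box stays at (s * img_size)^2 regardless of aspect ratio, since the √a_r factors cancel.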

  7. Training
     • Base network pre-trained on the ImageNet CLS-LOC dataset
     • Fine-tuned for the respective dataset
     • Matching strategy
       • Any default box with IOU(default box, ground truth) > 0.5 → positive
       • Simplifies the learning problem
       • An object can be detected by multiple overlapping default boxes
     • Loss
       • Confidence loss (c)
         • Softmax loss over multiple classes
       • Localization loss (x, y, w, h)
         • Smooth L1 loss
         • Predicted box (l) vs. ground truth box (g), encoded as offsets relative to the default box (d)
     Ref: https://github.com/rbgirshick/py-faster-rcnn/files/764206/SmoothL1Loss.1.pdf
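The matching rule and the Smooth L1 term can be sketched as follows. This is an illustrative simplification, not the authors' implementation: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, the function names are hypothetical, and the paper's additional step of first assigning each ground-truth box to its best-overlapping default box is omitted.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(default_boxes, gt_boxes, threshold=0.5):
    """Map each default box with IoU > threshold to its best ground-truth index."""
    matches = {}
    for i, d in enumerate(default_boxes):
        best_j, best_iou = None, threshold
        for j, g in enumerate(gt_boxes):
            overlap = iou(d, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matches[i] = best_j  # positive; unmatched defaults are negatives
    return matches


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


defaults = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
ground_truth = [(0, 0, 10, 10)]
print(match(defaults, ground_truth))   # two overlapping defaults match the same object
print(smooth_l1(0.5), smooth_l1(2.0))
```

In the example, two overlapping default boxes are both marked positive for the same object, which is the behaviour the slide describes.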

  8. Results PASCAL VOC2007 test detection results PASCAL VOC2012 test detection results

  9. Inference
     • Filter out boxes with low confidence
     • NMS with 0.45 IOU
     • Take the top 200 detections
     • Results on VOC2007 test data
       • Better mAP
       • Faster FPS
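The three post-processing steps can be sketched for a single class as greedy NMS in plain Python. The function names are hypothetical, boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and the 0.01 confidence threshold is an assumption taken from the paper, since the slide only says "low confidence".

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, conf_threshold=0.01, iou_threshold=0.45, top_k=200):
    """Return indices kept after confidence filtering, greedy NMS, and top-k."""
    # Step 1: drop low-confidence boxes, then sort by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Step 2: keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            # Step 3: stop once top_k detections are collected.
            if len(keep) == top_k:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.005]
print(postprocess(boxes, scores))
```

A full detector would run this once per class; the sketch shows only one class to keep the logic visible.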

  10. Analysis
     • Better than two-stage networks:
       • Single network for localization and classification
     • Better than YOLO:
       • Uses multiple feature maps
       • Uses many more default boxes
       • No FC layer
     • Faster inference
       • Fewer parameters
       • Smaller input size
         • Faster R-CNN: 600 min. size
         • YOLO: 448 x 448

  11. Ablation Studies - 1
     • Data augmentation helps; each training image is one of:
       • The original image
       • A randomly sampled patch
       • A sampled patch whose minimum IOU with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9
     • More default box shapes help
     • Using the FC layers instead of the atrous convolutional version
       • Similar result
       • About 20% slower
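The patch-sampling options can be sketched as below. This is a rough illustration, not the authors' code: the function names are hypothetical, and the patch size range (0.1 to 1 of the image) and the aspect-ratio range (1/2 to 2) are assumptions taken from the paper rather than from this slide. Resizing and flipping are left out.

```python
import random

MIN_IOU_CHOICES = [0.1, 0.3, 0.5, 0.7, 0.9]


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(img_w, img_h, gt_boxes, rng, max_tries=50):
    """Pick one augmentation option and return the crop as (xmin, ymin, xmax, ymax)."""
    whole = (0.0, 0.0, float(img_w), float(img_h))
    mode = rng.choice(["original", "random"] + MIN_IOU_CHOICES)
    if mode == "original":
        return whole
    min_iou = 0.0 if mode == "random" else mode
    for _ in range(max_tries):
        w = rng.uniform(0.1, 1.0) * img_w
        h = rng.uniform(0.1, 1.0) * img_h
        if not 0.5 <= w / h <= 2.0:
            continue  # reject patches that are too elongated
        x = rng.uniform(0.0, img_w - w)
        y = rng.uniform(0.0, img_h - h)
        patch = (x, y, x + w, y + h)
        if min_iou == 0.0 or any(iou(patch, g) >= min_iou for g in gt_boxes):
            return patch
    return whole  # fall back to the original image if no valid patch was found


rng = random.Random(0)
print(sample_patch(300, 300, [(100, 100, 200, 200)], rng))
```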

  12. Ablation Studies - 2
     • Use different numbers of feature maps
       • Similar number of default boxes in each setting, to make the comparison fair
     • More feature maps are better
       • Up to a certain extent
     • Not using boundary default boxes is better
       • Avoids default boxes lying outside the image

  13. Thank you Questions?
