SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg Slides by: Sulabh Shrestha
Ref: https://cv-tricks.com/object-detection/single-shot-multibox-detector-ssd/
Receptive Field
▪ Deep feature maps
  ▪ Smaller size
  ▪ Larger receptive fields
  ▪ May miss small objects
▪ Shallow feature maps
  ▪ Larger size
  ▪ Smaller receptive fields
  ▪ May not be able to see larger objects
▪ Use multiple feature maps, each for objects whose size matches its receptive field
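To make the deep-versus-shallow trade-off concrete, here is a minimal sketch of how the receptive field grows through a stack of layers. It uses the usual recurrence for plain (non-dilated) convolution and pooling layers; the function name and the toy layer stack are illustrative assumptions, not taken from the slides.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Dilation is ignored in this sketch.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance, in input pixels, between neighbouring outputs
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical toy stack: two 3x3 convs, a 2x2 stride-2 pool, one more 3x3 conv.
toy_stack = [(3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_field(toy_stack))  # [3, 5, 6, 10]
```

Each pooling step doubles the jump, so every later layer widens the receptive field faster while the feature map itself shrinks.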
Use multiple feature maps
▪ Base network + extra feature layers
▪ No FC layer
▪ Specific feature maps are responsive to a particular scale of objects
  ▪ Not necessarily the same as the receptive field
  ▪ A hyper-parameter
  ▪ Dependent on data
[Figure: 8x8 feature map and 4x4 feature map]
VGG
▪ VGG-16
▪ Pool5 changed:
  ▪ 3x3 kernel instead of 2x2
  ▪ Stride 1 instead of 2
▪ First two FC layers replaced by convolutional layers
  ▪ As in DeepLab-LargeFOV
▪ Last FC layer removed altogether
▪ No dropout used
▪ Conv4_3 also used for prediction
  ▪ 4th group of conv layers
  ▪ 3rd conv layer in that group
Ref: VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
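The effect of the Pool5 change can be checked with the standard output-size formula for pooling. This is a rough sketch: the input size of 19 and the padding of 1 for the 3x3 pool are assumptions for illustration and are not stated on the slide.

```python
def pool_out_size(size, kernel, stride, padding=0):
    """Spatial output size of a pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 19  # assumed width/height of the map entering pool5

# Original VGG pool5: 2x2 kernel, stride 2 -> the map is roughly halved.
print(pool_out_size(size, kernel=2, stride=2))             # 9

# Modified pool5: 3x3 kernel, stride 1 (padding 1 assumed) -> size is kept.
print(pool_out_size(size, kernel=3, stride=1, padding=1))  # 19
```

With stride 1 the map is no longer downsampled at Pool5, so the layers that follow operate at the same resolution.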
▪ Default boxes: similar to the anchor boxes of Faster R-CNN
▪ Example feature map:
  ▪ m x n
  ▪ p channels
▪ For each location (i, j)
  ▪ Multiple default boxes (k)
  ▪ A 3 x 3 x p convolutional filter for each box, predicting:
    ▪ Confidence for each class, c_i, i ∈ [1, C]
    ▪ Box offsets x, y, w, h
    ▪ (C + 4) outputs in total
▪ Total outputs for one feature map:
  ▪ m * n * k * (C + 4)
[Figure: feature map of size m x n with p channels]
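The output count for one feature map is simple arithmetic. A small sketch follows; the values 8x8, k = 6 and 21 classes are illustrative assumptions, not numbers given on this slide.

```python
def num_outputs(m, n, k, num_classes):
    """Outputs for one m x n feature map with k default boxes per location.

    Each box gets `num_classes` confidences plus 4 box offsets (x, y, w, h).
    """
    return m * n * k * (num_classes + 4)


# Assumed example: an 8x8 map, 6 default boxes per location, 21 classes.
print(num_outputs(m=8, n=8, k=6, num_classes=21))  # 9600
```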
▪ How many default boxes per location?
▪ Scale (s)
  ▪ Related to, but not exactly the same as, the receptive field
  ▪ If m feature maps are used for prediction:
    ▪ s_min = 0.2 (lowest prediction layer)
    ▪ s_max = 0.9 (highest prediction layer)
  ▪ E.g.
    ▪ s = 0.2, img-size = 300
    ▪ Corresponding default box size = 0.2 * 300 = 60
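The slide gives only the two end points of the scale range. In the SSD paper the scales in between are spaced evenly, s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1) for k = 1..m. The sketch below applies that rule; the choice of 6 prediction maps and the function name are assumptions for illustration.

```python
def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Evenly spaced scales, one per prediction feature map (num_maps >= 2)."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * (k - 1) for k in range(1, num_maps + 1)]


scales = default_box_scales(6)  # 6 prediction maps assumed
print([round(s, 2) for s in scales])   # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

# Scale relative to a 300-pixel image, as in the slide's example.
print(round(scales[0] * 300))          # 60
```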
▪ Aspect ratios ($a_r$)
  ▪ $a_r \in \{1, 2, 3, 1/2, 1/3\}$ (sets the number of default boxes per location, k)
  ▪ Width: $w_k^a = s_k \sqrt{a_r}$
  ▪ Height: $h_k^a = s_k / \sqrt{a_r}$
▪ E.g. s = 0.2, img-size = 300
  ▪ ar = 1: w = 0.2 * 300 = 60, h = 0.2 * 300 = 60
  ▪ ar = 2: w = 0.2 * √2 * 300 ≈ 85, h = 0.2 / √2 * 300 ≈ 42
  ▪ ar = 1/2: w = 0.2 * √½ * 300 ≈ 42, h = 0.2 / √½ * 300 ≈ 85
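The worked numbers above can be reproduced with a few lines. This is a minimal sketch of the width and height formulas; the function name is hypothetical.

```python
import math


def default_box_size(scale, aspect_ratio, img_size):
    """Width and height in pixels of a default box.

    w = s * sqrt(ar) * img_size, h = s / sqrt(ar) * img_size
    """
    root = math.sqrt(aspect_ratio)
    return scale * root * img_size, scale / root * img_size


for ar in (1, 2, 3, 1 / 2, 1 / 3):
    w, h = default_box_size(0.2, ar, 300)
    print(f"ar={ar:.2f}: w={w:.0f}, h={h:.0f}")
```

Every box has the same area (s * img_size)^2; the aspect ratio only redistributes it between width and height.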
▪ Overlap (IoU) with ground truth > 0.5 → positive
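A sketch of the threshold rule, assuming the overlap measure is intersection over union (Jaccard overlap) and that boxes are given as (x_min, y_min, x_max, y_max). Both function names are hypothetical, and only the 0.5 threshold step is shown, not a full matching procedure.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_positive(default_box, gt_box, threshold=0.5):
    """A default box counts as positive if its overlap exceeds the threshold."""
    return iou(default_box, gt_box) > threshold


print(is_positive((0, 0, 10, 10), (0, 0, 10, 8)))  # True  (IoU = 0.8)
print(is_positive((0, 0, 2, 2), (1, 0, 3, 2)))     # False (IoU = 1/3)
```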
Ref: https://github.com/rbgirshick/py-faster-rcnn/files/764206/SmoothL1Loss.1.pdf
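The slide only links to the Smooth L1 reference. As a rough sketch, assuming the common definition with the switch point at |x| = 1, the loss is quadratic for small errors and linear for large ones. The `localization_loss` helper is hypothetical and simply sums the loss over box offsets.

```python
def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    abs_x = abs(x)
    return 0.5 * x * x if abs_x < 1.0 else abs_x - 0.5


def localization_loss(predicted, target):
    """Sum of Smooth L1 over paired offsets, e.g. (x, y, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))


print(smooth_l1(0.5))   # 0.125 (quadratic region)
print(smooth_l1(-2.0))  # 1.5   (linear region)
print(localization_loss([0.5, 2.0], [0.0, 0.0]))  # 1.625
```

The linear tail keeps a badly placed box from dominating the gradient the way a pure squared error would.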
[Table: PASCAL VOC2007 test detection results]
[Table: PASCAL VOC2012 test detection results]
VOC2007 Test data
(Atrous)
Questions?