LOGO DETECTION WITH VGG/MOBILENET SSD
Michael Sun July – September, 2019
DATASETS FOR FINE-TUNING

BelgaLogos, FlickrLogos, LogosInTheWild, Combined

DATASETS FOR FINE-TUNING (POST-CLEANING)

                      BelgaLogos         BelgaLogos-top     LogosInTheWild  LogosInTheWild-top  FlickrLogos
Available Instances   9844               1063               20085           6775                5968
Images                2650               688                6692            2090                2235
Classes               37 (all positive)  10 (all positive)  102             11                  47
NOTE ON CLEANING
HYPERPARAMETERS
AT TRAINING TIME
AT MODEL COMPILE
FREEZING LAYERS
(BELGAS-TOP) 0.662/0.353 mAP 0.623/0.332 mAP 0.634/0.367 mAP 0.396/0.285 mAP 0.341/0.224 mAP
MOMENTUM
(LOGOSINTHEWILD) 0.013/0.008 mAP 0.001/0.0 mAP 0.019/0.007 mAP 0.033/0.022 mAP 0.043/0.012 mAP
TRADEOFF OF THIS DATASET
LOSS FUNCTION
SSD
- Classification loss (cross-entropy)
- Localization loss (smooth L1)
- Positive ground truths
- Negative ground truths
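As a sketch of how these pieces combine: the SSD multibox loss sums cross-entropy over positive anchors plus the hardest-mined negatives (at a fixed neg-pos ratio) and smooth-L1 localization loss over positives only, weighted by alpha and normalized by the number of positives. A minimal single-image numpy version; names and layout here are illustrative, not the project's actual training code:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def ssd_loss(cls_probs, true_classes, loc_preds, loc_targets,
             alpha=1.0, neg_pos_ratio=3):
    """SSD multibox loss for one image.

    cls_probs:    (num_boxes, num_classes) softmax outputs, class 0 = background
    true_classes: (num_boxes,) matched class per anchor, 0 = negative anchor
    loc_preds / loc_targets: (num_boxes, 4) box offsets
    """
    pos_mask = true_classes > 0
    num_pos = max(int(pos_mask.sum()), 1)

    # Cross-entropy of the matched class for every anchor.
    ce = -np.log(cls_probs[np.arange(len(true_classes)), true_classes] + 1e-9)

    # Hard negative mining: keep only the neg_pos_ratio * num_pos
    # highest-loss negatives so background anchors don't swamp the loss.
    hard_negs = np.sort(ce[~pos_mask])[::-1][: neg_pos_ratio * num_pos]
    cls_loss = ce[pos_mask].sum() + hard_negs.sum()

    # Localization loss only over anchors matched to a ground truth.
    loc_loss = smooth_l1(loc_preds - loc_targets)[pos_mask].sum()

    return (cls_loss + alpha * loc_loss) / num_pos
```

The alpha and neg_pos_ratio arguments are the same knobs swept in the experiments later in the deck.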
BATCH SIZE
0.158/0.169 mAP 0.257/0.232 mAP 0.327/0.308 mAP
CLASSIFICATION LOSS
WEIGHTING METHOD
(ALL CARS COMBINED DATASET)
ALPHA AND NEG-POS RATIO
(NBA LOGOS: KIA AND ADIDAS)
EXPERIMENTS: LR SCHEDULE
(BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

Drop epochs   LR schedule   Val mAP
0, 10         5e-5, 1e-5    —
0, 10         2e-5, 1e-5    0.029
—             5e-5          0.125
—             3e-5          0.083
—             1e-5          0.019
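The drop schedules above are plain step functions of the epoch; with Keras this is typically wrapped in a keras.callbacks.LearningRateScheduler callback. A minimal sketch (the (epoch, lr) pair format is an assumption, not the deck's actual code):

```python
def make_lr_schedule(drops):
    """Build a step LR schedule from (epoch, lr) pairs, e.g.
    [(0, 5e-5), (10, 1e-5)]: start at 5e-5, drop to 1e-5 at epoch 10."""
    drops = sorted(drops)

    def lr_for_epoch(epoch):
        # Walk the drops in order; the last one whose epoch has been
        # reached determines the current learning rate.
        lr = drops[0][1]
        for start_epoch, value in drops:
            if epoch >= start_epoch:
                lr = value
        return lr

    return lr_for_epoch

schedule = make_lr_schedule([(0, 5e-5), (10, 1e-5)])
# usage: keras.callbacks.LearningRateScheduler(schedule)
```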
EXPERIMENTS: MOMENTUM
(BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

Momentum (LR 5e-5 for all)   Val mAP
0.9                          —
0.7                          0.045
0.5                          0.02
0.3                          0.0
0.1                          0.015
EXPERIMENTS: NEG-POS-RATIO & ALPHA
(BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

Alpha   Val mAP        Neg-pos ratio   Val mAP
1       0.115          4               0.03
0.9     0.145          5               0.115
0.8     0.169          6               0.015
0.7     0.168
EXPERIMENTS: MISC
(BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)
BATCH SIZE 32
ASPECT-RATIOS, SCALES
scales = [0.05, 0.10, 0.15, 0.2, 0.25, 0.37, 0.50] aspect_ratios = [[1.0, 1.5, 2.0/3.0], [1.0, 1.5, 2.0/3.0, 1.25, 0.8], [1.0, 1.5, 2.0/3.0, 1.25, 0.8], [1.0, 1.5, 2.0/3.0, 1.25, 0.8], [1.0, 1.5, 2.0/3.0], [1.0, 1.5, 2.0/3.0]]
scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets aspect_ratios = [[1.0, 2.0, 0.5], [1.0, 2.0, 0.5, 3.0, 1.0/3.0], [1.0, 2.0, 0.5, 3.0, 1.0/3.0], [1.0, 2.0, 0.5, 3.0, 1.0/3.0], [1.0, 2.0, 0.5], [1.0, 2.0, 0.5]]
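In the standard SSD anchor parameterisation, a scale s and aspect ratio a give a box of relative width s*sqrt(a) and height s/sqrt(a) (the original SSD additionally adds one sqrt(s_k * s_{k+1}) box for ratio 1, omitted here). A small sketch applied to the first layer of the custom configuration above; smaller first-layer scales (0.05 vs Pascal's 0.1) presumably bias the detector toward the small boxes typical of logos:

```python
import numpy as np

def anchor_shapes(scale, aspect_ratios):
    """Relative (width, height) of each anchor box at one feature-map
    layer, using the standard SSD rule: w = s * sqrt(a), h = s / sqrt(a)."""
    return [(scale * np.sqrt(a), scale / np.sqrt(a)) for a in aspect_ratios]

# first layer of the custom configuration above
shapes = anchor_shapes(0.05, [1.0, 1.5, 2.0 / 3.0])
```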
AUGMENTATION PIPELINE (SATURATION, HUE, RANDOM CHANNEL SWAP)
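These photometric steps can be approximated in plain numpy. A hedged sketch of two of them, a random channel swap and a simple saturation jitter; the real pipeline's exact operations and ranges may differ:

```python
import numpy as np

def random_channel_swap(img, rng):
    """Randomly permute the RGB channels of an HxWx3 image."""
    return img[..., rng.permutation(3)]

def random_saturation(img, rng, lower=0.5, upper=1.5):
    """Cheap saturation jitter: scale each pixel's distance from its
    per-pixel gray value by a random factor, then clip to [0, 255]."""
    gray = img.mean(axis=-1, keepdims=True)
    factor = rng.uniform(lower, upper)
    return np.clip(gray + (img - gray) * factor, 0.0, 255.0)
```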
DOUBLE FINE-TUNING
VGG-COCO SSD → (fine-tune on all-cars dataset: 16 classes, ~1500 images) → All-cars model → (fine-tune on dataset of cars of interest: 3 classes, ~250 images) → Final model

0.018 val mAP, 10.08 val loss; 0.021 val mAP, 9.63 val loss; 0.0 val mAP, 10.43 val loss
Class-diff: ~0.4 AP vs 0.0 AP
WEIGHTING METHOD
(ALL CARS FINE TUNED DATASET)
0.34 mAP (0.325, 0.337, 0.358)
0.337 mAP (0.256, 0.354, 0.4)
0.34 mAP (0.327, 0.337, 0.339)
(After 50 epochs)
REAL-TIME INFERENCE PIPELINE

VGG/MNet SSD → (#frames, #boxes, 6) → Filtering (0.5) → (#frames, ?, 6) → Non-max-suppression (NMS) → Top-k → (#frames, k, 6), (#frames,)
TFLite graph → Android
VGG/MNet SSD → (#frames, #boxes, 12) raw output
  → decode with Anchor Boxes → (#frames, #boxes, 6)
  → Initial filtering (0.01) → (#frames, ?, 6)
  → NMS → (#frames, ?, 6)
  → Top-k / pad-k → (#frames, k, 6)
  → Final Filter (0.5) → User

Training: Keras and numpy. Inference graph: TensorFlow.
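The top-k / pad-k step keeps the output shape static, which TFLite needs: the variable (#frames, ?, 6) becomes a fixed (#frames, k, 6). A per-frame numpy sketch of that step (the [class, confidence, xmin, ymin, xmax, ymax] row layout is an assumption):

```python
import numpy as np

def top_k_pad(detections, k):
    """Keep the k highest-confidence detections, zero-padding so the
    result always has shape (k, 6) even when fewer than k survive.

    detections: (n, 6) rows of [class_id, confidence, xmin, ymin, xmax, ymax]
    """
    order = np.argsort(detections[:, 1])[::-1][:k]
    out = np.zeros((k, 6), dtype=detections.dtype)
    out[: len(order)] = detections[order]
    return out
```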
TOP-K (TFLITE)

VGG/MNet SSD → (#frames, #boxes, 6) → Top-k
FILTERING (TFLITE)
(ORIGINAL) IS OK

Filtering (0.5) → Top-k → (#frames, k, 6)
NON-MAX-SUPPRESSION (TFLITE -> ANDROID)

Filtering (0.5) → Non-max-suppression (NMS) → (#frames, ?, 6)
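Since NMS moves past the TFLite graph onto the Android side, the logic to port is just greedy suppression: repeatedly keep the top-scoring box and drop remaining boxes that overlap it above an IoU threshold. A reference numpy version (the 0.45 threshold is a common SSD default, not a value from the slides):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-max suppression; returns indices of kept boxes."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)          # highest remaining score
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```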
alpha=0.5, np=5 and alpha=1.5, np=3
with lr drops, early stopping
final, has mistakes
sent for annotations
NOTES ON DEPTHWISE
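For context on the note above: a depthwise-separable convolution factors a k x k convolution into a per-channel depthwise step plus a 1x1 pointwise step, which is what makes the MobileNet prediction layers so much cheaper than VGG's. A quick parameter-count comparison (bias terms ignored):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 layer with 128 channels in and out:
# 147,456 weights vs 17,536 -- roughly 8x fewer
```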
SUGGESTIONS ON IMPROVEMENT
PROGRESS ON MOBILENET V2
VGG/Mobilenet base layers (pretrained on ILSVRC/ImageNet, then COCO/VOC)
  → Conv-block 1 … Conv-block 6
  → Classification and Localization heads (Conv / Depthwise-separable layers)
FINAL THOUGHTS