  1. LOGO DETECTION WITH VGG/MOBILENET SSD (Michael Sun, July to September 2019)

  2. DATASETS FOR FINE-TUNING
     • BelgaLogos
     • FlickrLogos
     • Combined
     • LogosInTheWild

  3. DATASETS FOR FINE-TUNING (POST-CLEANING)

     Dataset               Available instances   Images   Classes
     BelgaLogos                   9844             2650   37 (all positive)
     BelgaLogos-top               1063              688   10 (all positive)
     LogosInTheWild              20085             6692   102
     LogosInTheWild-top           6775             2090   11
     FlickrLogos                  5968             2235   47

  4. NOTE ON CLEANING
     • Addressing class imbalance (KL-divergence)
     • Script to randomly sample instances (see the sketch below)
     • Negative samples (pos-neg ratio, negative mining)
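     A minimal sketch of the kind of sampling script mentioned above, assuming annotations are available as a flat list of per-instance records; the record fields and the per-class cap are illustrative, not the project's actual format.

     import random
     from collections import defaultdict

     def subsample_instances(instances, max_per_class, seed=0):
         """Randomly cap the number of annotated instances per class to reduce imbalance.

         `instances` is assumed to be a list of dicts like
         {"image": "img_001.jpg", "class": "adidas", "bbox": [x1, y1, x2, y2]}.
         """
         random.seed(seed)
         by_class = defaultdict(list)
         for inst in instances:
             by_class[inst["class"]].append(inst)
         kept = []
         for cls, items in by_class.items():
             random.shuffle(items)
             kept.extend(items[:max_per_class])  # keep at most max_per_class instances per class
         return kept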

  5. HYPERPARAMETERS
     At training time:
     • Learning rate schedule (epochs, rates)
     • Batch size
     • Momentum
     • Augmentation pipeline
     At model compile:
     • Layer freezing
     • Loss function
     • Image dim, scales, AR, misc.

  6. FREEZING LAYERS (BELGAS-TOP)
     [figure: mAP for five freezing configurations: 0.662/0.353, 0.623/0.332, 0.634/0.367, 0.396/0.285, 0.341/0.224]
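     The freezing experiments come down to toggling `trainable` on the base-network layers before compiling. A minimal Keras sketch, assuming a generic layer-index cutoff rather than the exact layer names used in these runs:

     from tensorflow import keras

     def freeze_first_layers(model, n_frozen):
         """Freeze the first `n_frozen` layers; leave the rest trainable."""
         for layer in model.layers[:n_frozen]:
             layer.trainable = False
         for layer in model.layers[n_frozen:]:
             layer.trainable = True
         return model

     # Usage: freeze, then (re)compile so the change takes effect, e.g.
     # model = freeze_first_layers(model, n_frozen=10)
     # model.compile(optimizer=keras.optimizers.SGD(learning_rate=5e-5, momentum=0.9), loss=...)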

  7. MOMENTUM (LOGOSINTHEWILD)
     [figure: mAP for five momentum settings: 0.013/0.008, 0.001/0.0, 0.019/0.007, 0.033/0.022, 0.043/0.012]
     • Inherent difficulty of this dataset
     • Bias-variance tradeoff

  8. LOSS FUNCTION
     [diagram: SSD loss; classification loss (cross-entropy) over positive and negative ground truths, localization loss (smooth L1) over positive ground truths]
     • Total loss = classification + alpha * localization, normalized by the number of positive boxes
     • Classification = sum over positive boxes (cross-entropy) + sum over hard-mined negative boxes (cross-entropy), limited by neg_pos_ratio and n_neg_min
     • Localization = sum over positive boxes (smooth L1)
     • Defaults: alpha=1, neg_pos_ratio=3, n_neg_min=0 (a sketch of this loss follows below)
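     A self-contained TensorFlow sketch of this loss, written against an assumed target layout (per-box one-hot class scores with background at index 0, plus 4 encoded offsets); it mirrors the ssd_keras formulation but is not the repo's actual SSDLoss class.

     import tensorflow as tf

     def smooth_l1(y_true, y_pred):
         """Smooth L1 per box, summed over the 4 box offsets."""
         diff = tf.abs(y_true - y_pred)
         return tf.reduce_sum(tf.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5), axis=-1)

     def ssd_loss(cls_true, cls_pred, loc_true, loc_pred,
                  alpha=1.0, neg_pos_ratio=3, n_neg_min=0):
         """cls_*: (batch, n_boxes, n_classes), class 0 = background; loc_*: (batch, n_boxes, 4)."""
         # Per-box cross-entropy (clipped to avoid log(0)).
         ce = -tf.reduce_sum(cls_true * tf.math.log(tf.clip_by_value(cls_pred, 1e-7, 1.0)), axis=-1)

         pos_mask = 1.0 - cls_true[..., 0]   # boxes matched to a real (positive) class
         neg_mask = cls_true[..., 0]         # background boxes
         n_pos = tf.reduce_sum(pos_mask)

         # Classification: all positives, plus the hardest negatives
         # (at most neg_pos_ratio * n_pos, at least n_neg_min).
         pos_cls = tf.reduce_sum(ce * pos_mask)
         n_neg_keep = tf.cast(tf.maximum(neg_pos_ratio * n_pos, float(n_neg_min)), tf.int32)
         neg_ce = tf.reshape(ce * neg_mask, [-1])
         n_neg_keep = tf.minimum(n_neg_keep, tf.size(neg_ce))
         neg_cls = tf.reduce_sum(tf.math.top_k(neg_ce, k=n_neg_keep, sorted=False).values)

         # Localization: smooth L1 over positive boxes only.
         loc = tf.reduce_sum(smooth_l1(loc_true, loc_pred) * pos_mask)

         return (pos_cls + neg_cls + alpha * loc) / tf.maximum(1.0, n_pos)

     The alpha and neg-pos-ratio experiments later in the deck just change these two arguments.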

  9. BATCH SIZE
     [figure: mAP for three batch sizes: 0.158/0.169, 0.257/0.232, 0.327/0.308]
     • Bias of each model
     • Explanation of loss spikes at LR drops

  10. CLASSIFICATION LOSS
     • (Method 1, original repo) Regular cross-entropy
     • (Method 2) Weight each class c by (total - #samples in c)
     • (Method 3) Method 2, but only for positive classes (see the sketch below)
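     Methods 2 and 3 amount to scaling each class's cross-entropy term by a weight derived from instance counts. A sketch of such weights; the normalization step is an assumption (the slide only gives the total-minus-count rule), and "only for positive classes" is read here as leaving the background class unweighted.

     import numpy as np

     def class_weights(counts, positives_only=False, background_id=0):
         """counts: dict class_id -> #instances in the training set."""
         total = sum(counts.values())
         # Method 2: weight class c by (total - #samples in c); rarer classes weigh more.
         weights = {c: float(total - n) for c, n in counts.items()}
         # Normalize so the mean weight is 1.0, keeping the overall loss scale comparable.
         mean_w = np.mean(list(weights.values()))
         weights = {c: w / mean_w for c, w in weights.items()}
         if positives_only:
             weights[background_id] = 1.0   # Method 3: background class stays unweighted
         return weights

     In the loss sketch above, each box's cross-entropy term would then be multiplied by the weight of its ground-truth class.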

  11. WEIGHTING METHOD (ALL CARS COMBINED DATASET)

  12. ALPHA AND NEG-POS RATIO (NBA LOGOS: KIA AND ADIDAS)

  13. EXPERIMENTS: LR SCHEDULE (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

     Epochs   LR drops      Val mAP
     0, 10    5e-5, 1e-5
     0, 10    2e-5, 1e-5    0.029
     0        5e-5          0.125
     0        3e-5          0.083
     0        1e-5          0.019
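     The epoch/rate rows in this table map directly onto a Keras LearningRateScheduler callback; a sketch using the first row's schedule as the example:

     from tensorflow import keras

     def make_lr_schedule(drops):
         """drops: (start_epoch, learning_rate) pairs sorted by epoch, e.g. [(0, 5e-5), (10, 1e-5)]."""
         def schedule(epoch, lr):
             new_lr = lr
             for start_epoch, rate in drops:
                 if epoch >= start_epoch:
                     new_lr = rate
             return new_lr
         return schedule

     lr_callback = keras.callbacks.LearningRateScheduler(make_lr_schedule([(0, 5e-5), (10, 1e-5)]))
     # model.fit(..., callbacks=[lr_callback])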

  14. EXPERIMENTS: MOMENTUM (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

     Momentum (all LR 5e-5)   Val mAP
     0.9
     0.7                      0.045
     0.5                      0.02
     0.3                      0.0
     0.1                      0.015

  15. EXPERIMENTS: NEG-POS-RATIO & ALPHA (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)

     Alpha   Val mAP
     1       0.115
     0.9     0.145
     0.8     0.169
     0.7     0.168

     NP      Val mAP
     4       0.03
     5       0.115
     6       0.015

  16. EXPERIMENTS: MISC (BEST CARS DATASET: CITROEN, FERRARI, KIA, MERCEDES, AUDI, BMW)
     • Conf thresh (default 1e-2); 1e-3 got 0.11 val mAP
     • Img dim (300, 350, 400, …); 400 did worse on val; >400 the GPU can't handle with batch size 32
     • Momentum scheduler; tried once, couldn't converge
     • Xavier-initializing the class/loc layers got 0.132 val mAP

  17. ASPECT-RATIOS, SCALES
     • Default:
       scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05]  # anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets
       aspect_ratios = [[1.0, 2.0, 0.5],
                        [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                        [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                        [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                        [1.0, 2.0, 0.5],
                        [1.0, 2.0, 0.5]]
     • After analysis (see the box-statistics sketch below):
       scales = [0.05, 0.10, 0.15, 0.2, 0.25, 0.37, 0.50]
       aspect_ratios = [[1.0, 1.5, 2.0/3.0],
                        [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                        [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                        [1.0, 1.5, 2.0/3.0, 1.25, 0.8],
                        [1.0, 1.5, 2.0/3.0],
                        [1.0, 1.5, 2.0/3.0]]
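     A sketch of the kind of ground-truth box statistics that could motivate the adjusted lists; the normalized-box input format and the percentile summary are assumptions, not the author's actual analysis.

     import numpy as np

     def box_stats(norm_boxes):
         """norm_boxes: (n, 4) array of (xmin, ymin, xmax, ymax) normalized to [0, 1],
         collected over all ground-truth logo boxes in the training set."""
         boxes = np.asarray(norm_boxes, dtype=float)
         w = boxes[:, 2] - boxes[:, 0]
         h = boxes[:, 3] - boxes[:, 1]
         scale = np.sqrt(w * h)   # comparable to the SSD `scales` entries
         aspect = w / h           # comparable to the `aspect_ratios` entries
         return {
             "scale_percentiles": np.percentile(scale, [5, 25, 50, 75, 95]),
             "aspect_percentiles": np.percentile(aspect, [5, 25, 50, 75, 95]),
         }

     The adjusted lists above cover smaller scales (0.05 to 0.50) and milder aspect ratios than the Pascal VOC defaults, which is presumably what this kind of analysis showed for logo boxes.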

  18. AUGMENTATION PIPELINE
     Transformations (a compressed sketch follows below):
     • Photometric (brightness, contrast, saturation, hue, random channel swap)
     • Expand
     • Random crop
     • Random flip
     • Resize
     Tuning knobs:
     • Order of transformations
     • Min-scale, max-scale
     • Frequency
     • Interpolation
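     A compressed numpy/OpenCV sketch of such a chain in the order listed; the probabilities, ranges, and the 300x300 target size are illustrative, random crop and part of the photometric set are omitted for brevity, and the project itself presumably relies on the repo's own augmentation code.

     import numpy as np
     import cv2

     def augment(image, boxes, rng=np.random):
         """One pass over a simplified photometric + geometric chain.

         image: HxWx3 uint8 array; boxes: (n, 4) absolute (xmin, ymin, xmax, ymax).
         """
         img = image.astype(np.float32)
         boxes = np.asarray(boxes, dtype=float)

         # Photometric: brightness and contrast jitter (saturation/hue/channel swap omitted).
         if rng.rand() < 0.5:
             img += rng.uniform(-32, 32)
         if rng.rand() < 0.5:
             img *= rng.uniform(0.8, 1.2)
         img = np.clip(img, 0, 255)

         # Expand: paste onto a larger mean-colored canvas (the min/max-scale knobs).
         if rng.rand() < 0.5:
             scale = rng.uniform(1.0, 2.0)
             h, w = img.shape[:2]
             big_h, big_w = int(h * scale), int(w * scale)
             canvas = np.full((big_h, big_w, 3), img.mean(), dtype=np.float32)
             top = rng.randint(0, big_h - h + 1)
             left = rng.randint(0, big_w - w + 1)
             canvas[top:top + h, left:left + w] = img
             img = canvas
             boxes = boxes + np.array([left, top, left, top])

         # Random horizontal flip.
         if rng.rand() < 0.5:
             w = img.shape[1]
             img = img[:, ::-1].copy()
             boxes[:, [0, 2]] = w - boxes[:, [2, 0]]

         # Resize to the network input size (interpolation is another knob).
         h, w = img.shape[:2]
         img = cv2.resize(img, (300, 300), interpolation=cv2.INTER_LINEAR)
         boxes = boxes * np.array([300.0 / w, 300.0 / h, 300.0 / w, 300.0 / h])
         return img.astype(np.uint8), boxes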

  19. DOUBLE FINE-TUNING
     [diagram: VGG-SSD COCO model -> all-cars model (dataset of all cars, 16 classes, ~1500 images) -> final model (dataset of cars of interest, 3 classes, ~250 images)]
     • 0.018 val mAP, 10.08 val loss; 0.021 val mAP, 9.63 val loss; 0.0 val mAP, 10.43 val loss
     • Class-diff: ~0.4 AP vs 0.0 AP

  20. WEIGHTING METHOD (ALL CARS FINE-TUNED DATASET)
     • Why I tuned this hyperparameter
     [figure, after 50 epochs: 0.34 mAP (0.325, 0.337, 0.358); 0.337 mAP (0.256, 0.354, 0.4); 0.34 mAP (0.327, 0.337, 0.339)]

  21. REAL-TIME INFERENCE PIPELINE
     [diagram, training pipeline (Keras and numpy, then TensorFlow): VGG/MNet SSD output (#frames, #boxes, 12) -> anchor boxes (#frames, #boxes, 6) -> initial filtering at 0.01 (#frames, ?, 6) -> final NMS at 0.5 (#frames, ?, 6) -> top-k / pad-k filter (#frames, k, 6)]
     [diagram, user pipeline (TFLite graph, then Android): VGG/MNet SSD output (#frames, #boxes, 6) -> top-k (#frames, k, 6) -> filtering at 0.5 (#frames, ?, 6) -> non-max-suppression (NMS) at 0.5 (#frames, ?, 6)]

  22. TOP-K (TFLITE)
     [diagram: VGG/MNet SSD (#frames, #boxes, 6) -> top-k]
     • IN: (1, 8732, 6); OUT: (1, 20, 6) (see the sketch below)
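     A TensorFlow sketch of the top-k step; the 6-column layout [class_id, confidence, xmin, ymin, xmax, ymax], with confidence in column 1, is an assumption.

     import tensorflow as tf

     def top_k_boxes(detections, k=20):
         """detections: (batch, n_boxes, 6) tensor; returns the k highest-confidence rows per frame."""
         scores = detections[:, :, 1]                      # assumed confidence column
         _, idx = tf.math.top_k(scores, k=k)               # (batch, k)
         return tf.gather(detections, idx, batch_dims=1)   # (batch, k, 6)

     With n_boxes = 8732 and k = 20 this reproduces the (1, 8732, 6) -> (1, 20, 6) shapes on the slide.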

  23. FILTERING (TFLITE)
     [diagram: top-k (#frames, k, 6) -> filtering at 0.5]
     • IN: (1, 20, 6); OUT: (1, ?, 6)
     • Why 0.5 (ours) vs the 0.01 threshold (original) is OK

  24. NON-MAX-SUPPRESSION (TFLITE -> ANDROID)
     [diagram: filtering (#frames, ?, 6) -> non-max-suppression (NMS) at 0.5]
     • IN: (1, ?, 6); OUT: (1, ?, 6)
     • Limit to k = 10
     • While boxes are left and we have fewer than k:
       • Add the highest-confidence box B
       • For each remaining box: compute IoU with B; if > 0.5, remove it
     (A numpy sketch of this procedure follows below.)
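     A numpy sketch of the greedy procedure above; the deployed version runs on Android, the [class_id, confidence, xmin, ymin, xmax, ymax] row layout is an assumption, and the input is taken to be the confidence-filtered (1, ?, 6) output of the previous step.

     import numpy as np

     def iou(box, boxes):
         """IoU between one box and an array of boxes, all as (xmin, ymin, xmax, ymax)."""
         ix1 = np.maximum(box[0], boxes[:, 0])
         iy1 = np.maximum(box[1], boxes[:, 1])
         ix2 = np.minimum(box[2], boxes[:, 2])
         iy2 = np.minimum(box[3], boxes[:, 3])
         inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
         area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
         return inter / (area(box) + area(boxes) - inter)

     def nms(dets, iou_thresh=0.5, max_keep=10):
         """Greedy NMS over rows of [class_id, confidence, xmin, ymin, xmax, ymax]."""
         dets = np.asarray(dets, dtype=float)
         dets = dets[np.argsort(-dets[:, 1])]        # highest confidence first
         kept = []
         while len(dets) and len(kept) < max_keep:   # limit to k = 10 boxes
             best, dets = dets[0], dets[1:]          # add the highest-confidence box B
             kept.append(best)
             if len(dets):
                 # drop every remaining box whose IoU with B exceeds the threshold
                 dets = dets[iou(best[2:6], dets[:, 2:6]) <= iou_thresh]
         return np.array(kept)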

  25. • Combination of alpha=0.5, np=5 and alpha=1.5, np=3
     • A lot of trial-and-error with LR drops, early stopping

  26. • (Left) Candidate for final, has mistakes
     • (Bottom) Final video sent for annotations

  27. NOTES ON DEPTHWISE (a Keras sketch of the substitution follows below)
     • Final 2 convs
     • Final 4 convs (~5 seconds/frame)
     • Final 6 convs
     • All class layers
     • All loc layers
     • First 2 layers: 0 mAP
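     The "final N convs" experiments above correspond to swapping regular 3x3 convolutions in the predictor heads for depthwise-separable ones. A minimal Keras sketch of that substitution; the layer choice and padding are the usual SSD settings, not the exact repo code.

     from tensorflow import keras

     def predictor_head(feature_map, n_outputs, depthwise=True, name=None):
         """One classification or localization head on top of an SSD feature map."""
         if depthwise:
             conv = keras.layers.SeparableConv2D(n_outputs, (3, 3), padding='same', name=name)
         else:
             conv = keras.layers.Conv2D(n_outputs, (3, 3), padding='same', name=name)
         return conv(feature_map)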

  28. SUGGESTIONS ON IMPROVEMENT
     • Currently frame-by-frame inference takes 10 s/frame
     • Batch-size inference: frame rate vs delay tradeoff
     • (Optimistic) Sequence modelling

  29. PROGRESS ON MOBILENET V2 (a sketch of the swap follows below)
     [diagram: MobileNet base layers (conv-block 1 ... conv-block 6, pretrained on ILSVRC/ImageNet) in place of the VGG base layers (pretrained on COCO/VOC), feeding the classification and localization layers]
     • Conv / depthwise-separable conv blocks (base layers)
     • Classification / localization layers
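     An illustrative sketch of the swap, attaching SSD-style class/loc heads to an ImageNet-pretrained MobileNetV2 from tf.keras.applications; the chosen feature maps, head sizes, and class/anchor counts are assumptions, not the configuration reached in this project.

     from tensorflow import keras

     n_classes, n_boxes = 4, 6   # e.g. 3 logo classes + background, 6 anchors per cell (assumed)

     base = keras.applications.MobileNetV2(input_shape=(300, 300, 3),
                                           include_top=False, weights='imagenet')
     feat1 = base.get_layer('block_13_expand_relu').output   # ~19x19 feature map
     feat2 = base.output                                      # ~10x10 feature map

     heads = []
     for i, feat in enumerate([feat1, feat2]):
         cls = keras.layers.Conv2D(n_boxes * n_classes, (3, 3), padding='same', name=f'cls_{i}')(feat)
         loc = keras.layers.Conv2D(n_boxes * 4, (3, 3), padding='same', name=f'loc_{i}')(feat)
         heads += [cls, loc]

     model = keras.Model(inputs=base.input, outputs=heads)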

  30. FINAL THOUGHTS
     • Difficulty of not having a fixed dataset, and of having noisy data
     • Thanks to Utkarsh and Gaurav
