Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking


  1. Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking
     Minyoung Kim, minyoung.kim@us.panasonic.com
     Panasonic Silicon Valley Lab

  2. TOWARDS AUTONOMOUS DRIVING
     VISUAL PERCEPTION: Object Detection, Object Tracking
     APPLICATIONS: Risk Prediction, Safety Control

  3. MULTI-OBJECT TRACKING
     PROPOSAL
       • Enhanced Siamese Network
       • Appearance + Temporal Info.
       • Efficient Matching Algorithm
     ISSUE
       • Large number of hyperparameters
       • High complexity (low speed)
       → feasibility as real-world product
     [Figure: ESNN-based Similarity Mapping ↓ Matching Algorithm. Objects 1 and 2 in Frame t - 1 and Frame t are mapped to feature maps and compared by Euclidean distance. Example pair distances: 0.033, 25.328, 23.242, 0.0223. Figure labels: MATCH, PAIRS, PARAMETERS.]

  4. SIMILARITY MAPPING
     Base Network Architecture
     Contrastive Loss, R. Hadsell et al (2005):
     $$L_c = \frac{1}{2N} \sum_{n=1}^{N} \left( y_n E_n^2 + (1 - y_n) \max(m - E_n, 0)^2 \right)$$
     [Figure: Siamese Network. Two copies of the Base Network N_B with shared weights take the pair data P and feed the Contrastive Loss. Layers shown in each branch: data, conv1 to conv5, tanh1 to tanh5, pool1, pool2, fc1, relu, fc2, feat; the second branch carries the same names with a p suffix.]
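
The contrastive loss on slide 4 can be written out directly. The following is a minimal sketch in plain Python, assuming `distances` holds the Euclidean distances E_n between paired feature vectors, `labels` holds y_n (1 for a matching pair, 0 for a non-matching pair), and `margin` is m. The function name and argument layout are illustrative and do not come from the slides.

```python
def contrastive_loss(distances, labels, margin):
    """Mean contrastive loss over N pairs.

    distances: Euclidean distances E_n between the two feature vectors
    labels:    y_n = 1 for a matching pair, 0 for a non-matching pair
    margin:    m; non-matching pairs farther apart than m cost nothing
    """
    if len(distances) != len(labels) or not distances:
        raise ValueError("distances and labels must be non-empty and equal in length")
    total = 0.0
    for e, y in zip(distances, labels):
        # Matching pairs are pulled together; non-matching pairs are
        # pushed apart until they are at least `margin` away.
        total += y * e ** 2 + (1 - y) * max(margin - e, 0.0) ** 2
    return total / (2 * len(distances))
```

With a margin of 2.0, a matching pair at distance 1.0 and a non-matching pair at distance 1.0 contribute equally, while a non-matching pair at distance 3.0 contributes nothing.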

  5. SIMILARITY MAPPING
     Dataset
       • Market-1501 *
           • 1501 identities, 32668 bounding boxes, 6 camera views
       • MOT16 **
           • 7 training, 7 testing video sequences
           • split training sets into train/val
     * L. Zheng et al (2015)
     ** A. Milan et al (2016)

  6. SIMILARITY MAPPING
     Similarity with N_B
       Precision  0.9145
       Recall     0.9966
       F-score    0.9538
     [Figure: scatter plot over data pairs. Legend: non-matching pairs, matching pairs, margin.]

  7. SIMILARITY MAPPING
     Enhanced Architecture: Enhanced Siamese Neural Network (ESNN)
     Temporal Information
       • Intersection over Union
       • Area Variant Ratio
     $$[D_{IoU}, D_{arat}](b_i, b_j) = \left[ \frac{\mathrm{area}(b_i \cap b_j)}{\mathrm{area}(b_i \cup b_j)},\ \frac{\min(\mathrm{area}(b_i), \mathrm{area}(b_j))}{\max(\mathrm{area}(b_i), \mathrm{area}(b_j))} \right]$$
     [Figure: ESNN. Frame t and Frame t + k pass through the weight-shared Base Network N_B (data, feat). D_IoU and D_arat pass through deconv_I / relu_I and deconv_A / relu_A, are concatenated (concat) with the features, and feed the Contrastive Loss on the pair data P. Image source: https://motchallenge.net/vis/PETS09-S2L1/gt/]
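
The two temporal cues on slide 7 are simple box geometry. Here is a small sketch, assuming each bounding box is an axis-aligned tuple `(x1, y1, x2, y2)` with `x1 <= x2` and `y1 <= y2`. The box layout and the function names are assumptions made for illustration; the slides do not specify them.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """D_IoU: intersection area divided by union area."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def area_ratio(a, b):
    """D_arat: smaller box area divided by larger box area."""
    larger = max(area(a), area(b))
    return min(area(a), area(b)) / larger if larger > 0 else 0.0
```

Both values lie in [0, 1]: identical boxes give 1.0 for each, and disjoint boxes give an IoU of 0.0 regardless of their sizes.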

  8. SIMILARITY MAPPING
     Similarity with N_B (left) and ESNN (right)
     IS IoU ENOUGH? (on a sample sequence from MOT16)
     Precision ↑
     [Figure: two scatter plots (x-axis: IoU, y-axis: Euclidean Distance). Legend: non-matching pairs, matching pairs, margin.]

  9. SIMILARITY MAPPING
     Similarity on MOT16
                  N_B      ESNN
       Precision  0.8187   0.9964
       Recall     0.9529   0.9931
       F-score    0.8807   0.9947
     [Figure: scatter plots (x-axis: IoU, y-axis: Euclidean Distance). Legend: non-matching pairs, matching pairs, margin.]
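
Precision, recall, and F-score for pair similarity can be computed as sketched below. This assumes a pair is predicted as matching when its Euclidean distance falls below a threshold; the slides draw the margin as a boundary in their plots, but whether the margin itself serves as the decision threshold is an assumption here. The function name is hypothetical.

```python
def pair_metrics(distances, labels, threshold):
    """Precision, recall, F-score for distance-thresholded pair matching.

    distances: Euclidean distance of each pair
    labels:    1 for a true matching pair, 0 for a non-matching pair
    threshold: pairs closer than this are predicted as matching (assumed rule)
    """
    tp = fp = fn = 0
    for d, y in zip(distances, labels):
        predicted_match = d < threshold
        if predicted_match and y == 1:
            tp += 1
        elif predicted_match and y == 0:
            fp += 1
        elif not predicted_match and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

The F-score is the harmonic mean of precision and recall, which is consistent with the figures on slide 9: 2 × 0.8187 × 0.9529 / (0.8187 + 0.9529) ≈ 0.8807 for N_B.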

  10. MATCHING ALGORITHM
      A Simple Matching Algorithm
        • Heuristic
        • Computationally efficient
        • two-step greedy algorithm
      Hungarian Algorithm (Kuhn, H. W. (1955))
        • e.g. Time
            MOT16-05  1.03x ↑ (Not crowded)
            MOT16-04  2.69x ↑ (Crowded)
        • Better MOTA

                                    Ours     Hungarian
      Complexity (# of objects)     O(n^2)   O(n^3)
      MOTA                          35.3     27.7
      (MOTA: Multi-Object Tracking Accuracy)
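
The slides do not spell out the two-step greedy algorithm, so the following is only a generic single-pass greedy matcher, not the authors' method. It illustrates why a greedy heuristic costs O(n^2) in the number of objects, compared with O(n^3) for the Hungarian algorithm. The cost matrix layout, the `max_cost` gate, and the function name are all assumptions.

```python
def greedy_match(cost, max_cost):
    """Greedily pair tracks (rows) with detections (columns).

    cost[i][j]: distance between track i and detection j
    max_cost:   pairs at or above this distance are never matched (assumed gate)
    Returns a list of (track_index, detection_index) pairs.
    """
    matches = []
    used = set()
    for i, row in enumerate(cost):
        best_j, best_c = None, max_cost
        # Scan every still-free detection once: O(n) per track, O(n^2) overall.
        for j, c in enumerate(row):
            if j not in used and c < best_c:
                best_j, best_c = j, c
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches


# Distances echo the example values on slide 3; arranging them as a
# 2x2 matrix (frame t - 1 objects by frame t objects) is an assumption.
example = [[0.033, 25.328],
           [23.242, 0.0223]]
pairs = greedy_match(example, max_cost=1.0)
```

Unlike the Hungarian algorithm, a greedy pass does not guarantee a minimum total cost; it trades that guarantee for speed, and its result can depend on the order in which tracks are visited.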

  11. EVALUATION
      • Online Method
          • “solution available immediately with each incoming frame and cannot be changed at any later time” *
      • Fast **
          Speed: 2.68~45.10 fps *** on each video sequence ****
      (MOTA: Multi-Object Tracking Accuracy)
      * Choi, W. (2015)
      ** Tang, S., Andres, B., Andriluka, M., Schiele, B. (2015)
      *** Stiller, C., Urtasun, R., Wojek, C., Lauer, M., Geiger, A. (2014)
      **** Milan, A., Roth, S., Schindler, K. (2014)
      • Later Online methods
          Method                                     MOTA   Hz
          MDPNN16 (A. Sadeghian et al, 2017)         47.2   1.0
          CDA_DDAL (S. Bae et al, 2017)              43.9   0.5
          EAMTT (R. Sanchez-Matilla et al, 2016)     38.8   11.8
          OVBT (Y. Ban et al, 2016)                  38.4   0.3

  12. MODEL COMPRESSION
      • Inspired by SqueezeNet (arXiv: 1602.07360)
      • Tested on NVIDIA GTX 1080 (https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/)

      ESNN        ORIGINAL          SQN
      SUBSET      MOTA    FPS       MOTA    FPS
      MOT16-02    17.1    21.27     15.7    26.59
      MOT16-04    34.7     7.20     34.0     9.02
      MOT16-05    31.0    42.01     29.6    60.55
      MOT16-09    48.4    16.51     45.6    25.65
      MOT16-10    31.4    23.60     31.1    28.42
      MOT16-11    48.2    18.99     47.5    24.96
      MOT16-13     6.8    38.04      6.6    47.02
      TOTAL       30.2    16.58     29.3    21.41

      FPS: 20~55% ↑ (avg. 29%)
      MOTA: 0.9~8.2% ↓ (avg. 3%)
      Memory Usage: 70+% down (1+GB → 350+MB)
      Model Size: 90+% down (100+MB → 3.6MB)
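
The summary percentages on slide 12 follow from the per-sequence table as relative changes. The short check below copies the numbers from the slide; the dictionary layout and helper name are illustrative only.

```python
# (original MOTA, original FPS, SQN MOTA, SQN FPS) per MOT16 subset, from slide 12
results = {
    "MOT16-02": (17.1, 21.27, 15.7, 26.59),
    "MOT16-04": (34.7, 7.20, 34.0, 9.02),
    "MOT16-05": (31.0, 42.01, 29.6, 60.55),
    "MOT16-09": (48.4, 16.51, 45.6, 25.65),
    "MOT16-10": (31.4, 23.60, 31.1, 28.42),
    "MOT16-11": (48.2, 18.99, 47.5, 24.96),
    "MOT16-13": (6.8, 38.04, 6.6, 47.02),
}
total = (30.2, 16.58, 29.3, 21.41)


def relative_change(before, after):
    """Signed relative change in percent."""
    return (after - before) / before * 100.0


fps_gains = {k: relative_change(v[1], v[3]) for k, v in results.items()}
mota_drops = {k: -relative_change(v[0], v[2]) for k, v in results.items()}

avg_fps_gain = relative_change(total[1], total[3])    # from the TOTAL row
avg_mota_drop = -relative_change(total[0], total[2])  # from the TOTAL row
```

Read this way, the reported ranges are relative rather than absolute: the MOTA drop of up to 8.2% on MOT16-02 corresponds to 17.1 falling to 15.7, a loss of 1.4 MOTA points.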

  13. MULTI-OBJECT DETECTION & TRACKING
      Object Detection & Object Tracking

  14. PSVL MOD
      https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/

  15. PSVL MOD + MOT

  16. SIMILARITY MAPPING COMPARISON AGAIN
      N_B: Car #5
      ESNN: Car #5

  17. SIMILARITY MAPPING COMPARISON AGAIN
      N_B

  18. SIMILARITY MAPPING COMPARISON AGAIN
      ESNN

  19. LIMITATION
      • Speed
          • dependent on # of objects
      • Hyper Parameter
          • Lifetime of Tracklet
              • # of frames for keeping each tracklet data
              • the longer kept, the higher chance to be recovered when occluded
              • more ID switches with short lifetime
      [Figure: tracklet data TDB(F_k, O_i), TDB(F_{k+j}, O_i), TDB(F_{k+2j}, O_i), TDB(F_{k+3j}, O_i) kept across frames: for how long?]
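
The tracklet-lifetime trade-off on slide 19 can be illustrated with a toy store that forgets an object once it has gone unseen for more than `lifetime` frames. This is a hypothetical sketch: the class, its methods, and the pruning rule are assumptions, and the slides do not say what TDB stores or how it is pruned.

```python
class TrackletStore:
    """Toy tracklet store keyed by object ID (hypothetical, for illustration)."""

    def __init__(self, lifetime):
        # lifetime: number of frames an unseen object is still remembered
        self.lifetime = lifetime
        self.last_seen = {}

    def update(self, frame, object_ids):
        """Record the objects seen in `frame`; return the IDs that expired."""
        for oid in object_ids:
            self.last_seen[oid] = frame
        expired = [oid for oid, seen in self.last_seen.items()
                   if frame - seen > self.lifetime]
        for oid in expired:
            del self.last_seen[oid]
        return expired

    def can_recover(self, oid):
        """True while an occluded object can still be re-identified."""
        return oid in self.last_seen


store = TrackletStore(lifetime=2)
store.update(0, [1, 2])               # both objects visible
store.update(1, [1])                  # object 2 becomes occluded
still_known = store.can_recover(2)    # within its lifetime, so recoverable
dropped = store.update(3, [1])        # object 2 unseen for 3 frames, expires
```

A larger `lifetime` keeps occluded objects recoverable for longer, at the cost of more stored tracklets to compare against on every frame, which ties back to the speed limitation listed on the same slide.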

  20. CONCLUSION
      VISUAL PERCEPTION: Object Detection, Object Tracking, Unsupervised Learning
      APPLICATIONS: Risk Prediction, Safety Control

  21. Thank you! Panasonic Silicon Valley Lab
