Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking



slide-1
SLIDE 1

Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking

Minyoung Kim

minyoung.kim@us.panasonic.com

Panasonic Silicon Valley Lab

slide-2
SLIDE 2

TOWARDS AUTONOMOUS DRIVING

VISUAL PERCEPTION: Object Detection, Object Tracking

APPLICATIONS: Risk Prediction, Safety Control

slide-3
SLIDE 3

MULTI-OBJECT TRACKING

ISSUE

  • Large number of hyperparameters
  • High complexity (low speed)

→ Feasibility as a real-world product ↓

PROPOSAL

  • Enhanced Siamese Network
  • Appearance + Temporal Info.
  • Efficient Matching Algorithm

[Figure: ESNN-based Similarity Mapping. Panel labels: Parameters, Feature Map, Euclidean Distance, Matching Algorithm. Objects 1 and 2 in Frame t and Frame t - 1 are compared, giving the example distances 0.033, 25.328, 0.0223 and 23.242, and the Matching Algorithm outputs the match pairs.]

slide-4
SLIDE 4

SIMILARITY MAPPING

Base Network Architecture

Base Network NB (weights are shared between the two branches)

  • Data branch: data, conv1, pool1, conv2, pool2, conv3, conv4, conv5, fc1, fc2, feat (activations: tanh1 to tanh5, relu)
  • Pair branch (pair data P): datap, conv1p, pool1p, conv2p, pool2p, conv3p, conv4p, conv5p, fc1p, fc2p, featp (activations: tanh1p to tanh5p, relup)
  • Both branches feed the Contrastive Loss

Siamese Network

$$L_c = \frac{1}{2N} \sum_{n=1}^{N} \left( y_n E_n^2 + (1 - y_n) \max(m - E_n, 0)^2 \right)$$

  • R. Hadsell et al. (2005)
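The contrastive loss on this slide can be checked on a toy example. The sketch below is a minimal plain-Python version, assuming E_n is the Euclidean distance between the two feature vectors of pair n, y_n = 1 marks a matching pair, and m is the margin. The function names and the sample vectors are illustrative and do not come from the slides.

```python
import math


def euclidean(u, v):
    """Euclidean distance E between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(pairs, labels, margin):
    """Contrastive loss over N pairs.

    pairs  : list of (feat, feat_p) feature-vector tuples
    labels : 1 for a matching pair, 0 for a non-matching pair
    margin : m, the distance beyond which non-matching pairs cost nothing
    """
    total = 0.0
    for (feat, feat_p), y in zip(pairs, labels):
        e = euclidean(feat, feat_p)
        # Matching pairs are pulled together, non-matching pairs are
        # pushed apart until their distance reaches the margin.
        total += y * e ** 2 + (1 - y) * max(margin - e, 0.0) ** 2
    return total / (2 * len(pairs))


# Toy data (hypothetical): one matching pair at distance 5,
# one non-matching pair at distance 1, margin m = 3.
pairs = [([0.0, 0.0], [3.0, 4.0]),
         ([0.0, 0.0], [0.0, 1.0])]
loss_value = contrastive_loss(pairs, labels=[1, 0], margin=3.0)
print(loss_value)  # (25 + 4) / (2 * 2) = 7.25
```

A non-matching pair farther apart than the margin contributes zero, which is why the margin also serves as a natural decision threshold on the distance.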

slide-5
SLIDE 5

SIMILARITY MAPPING

Dataset

  • Market-1501*
    • 1501 identities, 32668 bounding boxes, 6 camera views
  • MOT16**
    • 7 training and 7 testing video sequences
    • training sets split into train/val

* L. Zheng et al. (2015) ** A. Milan et al. (2016)

slide-6
SLIDE 6

SIMILARITY MAPPING

Similarity with NB

[Figure: Euclidean distance of the data pairs. Legend: non-matching pairs, matching pairs, margin.]

Precision   0.9145
Recall      0.9966
F-score     0.9538
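Precision, recall and F-score for pair matching can be computed by thresholding the distance. The sketch below assumes a pair is predicted as matching when its Euclidean distance falls below a threshold (for instance the margin); the slides do not state the exact decision rule, and the sample distances are made up for illustration.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def pair_metrics(distances, is_match, threshold):
    """Precision, recall and F-score when 'distance < threshold' means match."""
    tp = fp = fn = 0
    for d, truth in zip(distances, is_match):
        predicted = d < threshold
        if predicted and truth:
            tp += 1          # matching pair found
        elif predicted and not truth:
            fp += 1          # non-matching pair accepted by mistake
        elif truth:
            fn += 1          # matching pair missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = f_score(precision, recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical distances and ground-truth labels.
distances = [0.1, 0.4, 0.8, 1.5, 2.0]
is_match = [True, True, False, True, False]
precision, recall, f = pair_metrics(distances, is_match, threshold=1.0)

# The reported F-score follows from the reported precision and recall.
reported_f = round(f_score(0.9145, 0.9966), 4)  # 0.9538
```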

slide-7
SLIDE 7

SIMILARITY MAPPING

Enhanced Architecture: Enhanced Siamese Neural Network (ESNN)

  • Base Network NB on both branches (weights shared): data to feat, and pair data P (datap) to featp
  • Added layers: DIoU, DArat, deconvI, reluI, deconvA, reluA, concat, concatp
  • Contrastive Loss

Temporal Information

  • Intersection over Union
  • Area Variant Ratio

$$[D_{IoU}, D_{Arat}](b_i, b_j) = \left[ \frac{\mathrm{area}(b_i \cap b_j)}{\mathrm{area}(b_i \cup b_j)},\ \frac{\min(\mathrm{area}(b_i), \mathrm{area}(b_j))}{\max(\mathrm{area}(b_i), \mathrm{area}(b_j))} \right]$$

[Figure: bounding boxes in Frame t and Frame t + k. Source: https://motchallenge.net/vis/PETS09-S2L1/gt/]
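The two temporal features, Intersection over Union and Area Variant Ratio, can be computed directly from a pair of bounding boxes. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; that box format and the function names are assumptions for illustration, not taken from the slides.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); 0 if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def temporal_features(bi, bj):
    """Return (D_IoU, D_Arat) for two boxes from different frames."""
    # Intersection rectangle; area() clamps it to 0 when the boxes are disjoint.
    inter = area((max(bi[0], bj[0]), max(bi[1], bj[1]),
                  min(bi[2], bj[2]), min(bi[3], bj[3])))
    union = area(bi) + area(bj) - inter
    d_iou = inter / union if union > 0 else 0.0
    larger = max(area(bi), area(bj))
    d_arat = min(area(bi), area(bj)) / larger if larger > 0 else 0.0
    return d_iou, d_arat


# Hypothetical boxes: the second is shifted right and twice as tall.
d_iou, d_arat = temporal_features((0, 0, 10, 10), (5, 0, 15, 20))
print(d_iou, d_arat)  # intersection 50, union 250 -> 0.2; areas 100 vs 200 -> 0.5
```

Both values lie in [0, 1], and both equal 1 only when the two boxes coincide, so they are easy to concatenate with appearance features.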

slide-8
SLIDE 8

SIMILARITY MAPPING

IS IoU ENOUGH?

[Figure: Similarity with NB (left) and ESNN (right) on a sample sequence from MOT16. x-axis: IoU, y-axis: Euclidean Distance. Legend: non-matching pairs, matching pairs, margin.]

Precision ↑

slide-9
SLIDE 9

SIMILARITY MAPPING

Similarity on MOT16

[Figure: x-axis: IoU, y-axis: Euclidean Distance. Legend: non-matching pairs, matching pairs, margin.]

            NB       ESNN
Precision   0.8187   0.9964
Recall      0.9529   0.9931
F-score     0.8807   0.9947

slide-10
SLIDE 10

MATCHING ALGORITHM

A Simple Matching Algorithm

  • Heuristic
    • two-step greedy algorithm
  • Computationally efficient
    • compared with the Hungarian Algorithm (Kuhn, H. W. (1955))
    • e.g. Time: MOT16-05 (not crowded) 1.03x ↑, MOT16-04 (crowded) 2.69x ↑
  • Better MOTA (Multi-Object Tracking Accuracy)

                            Ours     Hungarian
Complexity (# of objects)   O(n^2)   O(n^3)
MOTA                        35.3     27.7
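The slides do not spell out the two steps of the greedy heuristic, so the sketch below is only a generic one-pass greedy matcher over a distance matrix, not the presented algorithm. It takes the closest remaining pair first and rejects pairs above a distance threshold. The threshold value and the arrangement of the four example distances into a 2x2 matrix are assumptions.

```python
def greedy_match(dist, max_dist):
    """Greedily pair objects of frame t-1 (rows) with objects of frame t (columns).

    dist[i][j] is the learned distance between object i and object j.
    Pairs with a distance of max_dist or more are never matched.
    """
    candidates = sorted(
        (d, i, j)
        for i, row in enumerate(dist)
        for j, d in enumerate(row)
        if d < max_dist
    )
    used_rows, used_cols, matches = set(), set(), []
    for d, i, j in candidates:          # closest pairs are considered first
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return matches


# Example distances from the overview figure, arranged as a 2x2 matrix (assumed layout).
dist = [[0.033, 25.328],
        [23.242, 0.0223]]
matches = greedy_match(dist, max_dist=1.0)  # max_dist is a hypothetical threshold
print(sorted(matches))  # [(0, 0), (1, 1)]
```

Because of the sort, this version costs O(n^2 log n) for n objects per frame. A greedy pass does not guarantee the minimum total distance that the Hungarian algorithm finds.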

slide-11
SLIDE 11

EVALUATION

Speed: 2.68~45.10 fps (on each video sequence)

  • Online Method
    • "solution available immediately with each incoming frame and cannot be changed at any later time"
  • Fast

[Table: comparison with the methods marked *, **, *** and ****.]

* Choi, W. (2015)
** Tang, S., Andres, B., Andriluka, M., Schiele, B. (2015)
*** Stiller, C., Urtasun, R., Wojek, C., Lauer, M., Geiger, A. (2014)
**** Milan, A., Roth, S., Schindler, K. (2014)

  • Later

Online methods                              MOTA   Hz
MDPNN16 (A. Sadeghian et al., 2017)         47.2   1.0
CDA_DDAL (S. Bae et al., 2017)              43.9   0.5
EAMTT (R. Sanchez-Matilla et al., 2016)     38.8   11.8
OVBT (Y. Ban et al., 2016)                  38.4   0.3

(MOTA: Multi-Object Tracking Accuracy)

slide-12
SLIDE 12

MODEL COMPRESSION

  • Inspired by SqueezeNet (arXiv: 1602.07360)
  • Tested on NVIDIA GTX 1080 (https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/)

  • FPS: 20~55% ↑ (avg. 29%)
  • MOTA: 0.9~8.2% ↓ (avg. 3%)
  • Memory Usage: 70+% down (1+GB → 350+MB)
  • Model Size: 90+% down (100+MB → 3.6MB)

ESNN        ORIGINAL          SQN
SUBSET      MOTA    FPS       MOTA    FPS
MOT16-02    17.1    21.27     15.7    26.59
MOT16-04    34.7    7.20      34.0    9.02
MOT16-05    31.0    42.01     29.6    60.55
MOT16-09    48.4    16.51     45.6    25.65
MOT16-10    31.4    23.60     31.1    28.42
MOT16-11    48.2    18.99     47.5    24.96
MOT16-13    6.8     38.04     6.6     47.02
TOTAL       30.2    16.58     29.3    21.41
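The summary percentages on this slide are consistent with relative changes per subset. The sketch below recomputes them from the table values, assuming "avg." refers to the TOTAL row rather than to a mean over subsets; the variable names are illustrative.

```python
# (MOTA original, FPS original, MOTA SQN, FPS SQN), copied from the table.
results = {
    "MOT16-02": (17.1, 21.27, 15.7, 26.59),
    "MOT16-04": (34.7, 7.20, 34.0, 9.02),
    "MOT16-05": (31.0, 42.01, 29.6, 60.55),
    "MOT16-09": (48.4, 16.51, 45.6, 25.65),
    "MOT16-10": (31.4, 23.60, 31.1, 28.42),
    "MOT16-11": (48.2, 18.99, 47.5, 24.96),
    "MOT16-13": (6.8, 38.04, 6.6, 47.02),
    "TOTAL": (30.2, 16.58, 29.3, 21.41),
}


def relative_change(before, after):
    """Relative change in percent; positive means an increase."""
    return (after - before) / before * 100.0


fps_gain = {k: relative_change(v[1], v[3]) for k, v in results.items()}
mota_drop = {k: -relative_change(v[0], v[2]) for k, v in results.items()}

subsets = [k for k in results if k != "TOTAL"]
print(f"FPS gain:  {min(fps_gain[k] for k in subsets):.1f}% to "
      f"{max(fps_gain[k] for k in subsets):.1f}%, total {fps_gain['TOTAL']:.1f}%")
print(f"MOTA drop: {min(mota_drop[k] for k in subsets):.1f}% to "
      f"{max(mota_drop[k] for k in subsets):.1f}%, total {mota_drop['TOTAL']:.1f}%")
```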

slide-13
SLIDE 13

MULTI-OBJECT DETECTION & TRACKING

Object Detection & Object Tracking

slide-14
SLIDE 14

PSVL MOD

https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/

slide-15
SLIDE 15

PSVL MOD + MOT


slide-16
SLIDE 16

SIMILARITY MAPPING COMPARISON AGAIN

[Figure: similarity mapping for Car #5 with NB and with ESNN]

slide-17
SLIDE 17

SIMILARITY MAPPING COMPARISON AGAIN

NB


slide-18
SLIDE 18

SIMILARITY MAPPING COMPARISON AGAIN

ESNN


slide-19
SLIDE 19

LIMITATION

  • Speed
    • dependent on # of objects
  • Hyperparameter
    • Lifetime of tracklet
      • # of frames for which each tracklet's data is kept
      • the longer it is kept, the higher the chance of recovery after occlusion
      • more ID switches with a short lifetime

[Figure: TDB(F_k, O_i), TDB(F_{k+j}, O_i), TDB(F_{k+2j}, O_i), TDB(F_{k+3j}, O_i): for how long?]
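The lifetime trade-off can be illustrated with a small bookkeeping structure. The sketch below reads TDB as a tracklet database, which is an assumption, and the class, its methods and the frame numbers are hypothetical. It drops an object once it has gone unseen for more than `lifetime` frames.

```python
class TrackletDB:
    """Keeps the last frame in which each object ID was observed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime      # max frames an unseen tracklet is kept
        self.last_seen = {}           # object ID -> frame index

    def update(self, frame, object_ids):
        """Record the IDs seen in this frame and return the IDs that expired."""
        for oid in object_ids:
            self.last_seen[oid] = frame
        expired = [oid for oid, seen in self.last_seen.items()
                   if frame - seen > self.lifetime]
        for oid in expired:
            del self.last_seen[oid]   # a reappearing object now gets a new ID
        return expired


db = TrackletDB(lifetime=2)
db.update(0, [1, 2])          # objects 1 and 2 are visible
db.update(1, [1])             # object 2 becomes occluded
db.update(2, [1])             # still within the lifetime
expired = db.update(3, [1])   # unseen for 3 frames, so object 2 is dropped
print(expired)                # [2]
```

With a longer lifetime, object 2 would still be available for re-matching when it reappears, at the cost of more stored tracklets to compare against in each frame.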

slide-20
SLIDE 20

CONCLUSION

VISUAL PERCEPTION: Object Detection, Object Tracking

APPLICATIONS: Risk Prediction, Safety Control

Unsupervised Learning

slide-21
SLIDE 21

Thank you!

Panasonic Silicon Valley Lab