Similarity Mapping with Enhanced Siamese Network for Multi-object Tracking
Minyoung Kim
minyoung.kim@us.panasonic.com
Panasonic Silicon Valley Lab
TOWARDS AUTONOMOUS DRIVING
VISUAL PERCEPTION: Object Detection, Object Tracking
APPLICATIONS
Object Detection, Object Tracking, Risk Prediction, Safety Control
ISSUE
→ feasibility as a real-world product
↓
PROPOSAL
ESNN-based Similarity Mapping
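[Figure: ESNN-based similarity mapping. Feature maps are computed with the learned parameters for the detections (1, 2) in frame t and frame t - 1, compared by Euclidean distance, and passed to a matching algorithm that outputs match pairs. Example distances shown: 0.033, 25.328, 0.0223, 23.242.]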
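The similarity-mapping step above compares the feature vector of every detection in frame t against every detection in frame t - 1 by Euclidean distance. A minimal sketch, assuming each detection has already been reduced to a fixed-length feature vector by the network; the function name `distance_matrix` and the toy vectors are hypothetical:

```python
import math
from typing import List, Sequence


def distance_matrix(feats_t: Sequence[Sequence[float]],
                    feats_prev: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return D where D[i][j] is the Euclidean distance between the feature
    vector of detection i in frame t and detection j in frame t - 1."""
    return [[math.dist(a, b) for b in feats_prev] for a in feats_t]


# Hypothetical 3-D feature vectors for two detections per frame.
frame_t = [[0.10, 0.20, 0.30], [5.0, 5.0, 5.0]]
frame_prev = [[0.11, 0.19, 0.30], [4.9, 5.1, 5.0]]
D = distance_matrix(frame_t, frame_prev)
```

Small entries of `D` indicate likely matches; the matrix is what a matching algorithm then consumes.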
Base Network Architecture
Two branches with weight sharing. The input pair feeds one branch as data and the other as datap; each branch applies conv1, pool1, conv2, pool2, conv3, conv4, conv5, fc1, fc2 and outputs a feature vector (feat, featp), with activation layers tanh1 to tanh5 and relu (suffix p marks the paired branch). A contrastive loss is applied to feat and featp. This is the base network NB.
Siamese Network
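$$L_c = \frac{1}{2N} \sum_{n=1}^{N} \left( y_n E_n^2 + (1 - y_n)\, \max(m - E_n, 0)^2 \right)$$
where E_n is the Euclidean distance between the two feature vectors of pair n, y_n = 1 for a matching pair and 0 otherwise, and m is the margin.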
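The contrastive loss above can be computed directly from per-pair distances. A minimal sketch, assuming the distances E_n have already been produced by the two weight-sharing branches and that labels use y_n = 1 for matching pairs; `contrastive_loss` is a hypothetical helper, not the framework layer used in the slides:

```python
from typing import Sequence


def contrastive_loss(distances: Sequence[float], labels: Sequence[int],
                     margin: float) -> float:
    """L_c = 1/(2N) * sum_n [ y_n * E_n^2 + (1 - y_n) * max(m - E_n, 0)^2 ].

    distances: E_n, Euclidean distance between the two features of pair n
    labels:    y_n, 1 for a matching pair, 0 for a non-matching pair
    margin:    m
    """
    if len(distances) != len(labels) or not distances:
        raise ValueError("need the same, non-zero number of distances and labels")
    total = 0.0
    for e, y in zip(distances, labels):
        # Matching pairs are penalized by their squared distance; non-matching
        # pairs only while they are closer than the margin.
        total += y * e ** 2 + (1 - y) * max(margin - e, 0.0) ** 2
    return total / (2 * len(distances))
```

For example, `contrastive_loss([0.5, 0.2], [1, 0], margin=1.0)` gives (0.25 + 0.64) / 4 = 0.2225: the matching pair contributes its squared distance, the non-matching pair the squared shortfall from the margin.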
Dataset
- Market-1501*: 1501 identities, 32668 bounding boxes, 6 camera views
- MOT16**: 7 training and 7 testing video sequences; the training sets are split into train/val
* L. Zheng et al. (2015)
** A. Milan et al. (2016)
[Figure: Data pairs (non-matching, matching) and similarity with NB. Legend: non-matching pairs, matching pairs, margin.]
Precision 0.9145, Recall 0.9966, F-score 0.9538
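Precision, recall and F-score for a margin-based similarity can be reproduced with a simple threshold rule. The sketch below assumes a pair is predicted as matching when its distance is below the margin (the slides do not state the exact decision rule); `evaluate_margin` and `f_score` are hypothetical helpers. As a sanity check, `f_score(0.9145, 0.9966)` gives about 0.9538, the F-score reported for NB above.

```python
from typing import Sequence, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def evaluate_margin(distances: Sequence[float], labels: Sequence[int],
                    margin: float) -> Tuple[float, float, float]:
    """Predict 'matching' when distance < margin (assumed rule), then score
    the predictions against labels (1 = matching, 0 = non-matching)."""
    tp = sum(1 for d, y in zip(distances, labels) if d < margin and y == 1)
    fp = sum(1 for d, y in zip(distances, labels) if d < margin and y == 0)
    fn = sum(1 for d, y in zip(distances, labels) if d >= margin and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = f_score(precision, recall) if precision + recall else 0.0
    return precision, recall, f
```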
[Figure: Enhanced architecture built on the weight-sharing base network NB (data → feat, datap → featp). Additional layers shown: DIoU, DArat, deconvI, reluI, deconvA, reluA, concat, concatp; trained with a contrastive loss.]
Enhanced Architecture
Temporal Information
- Intersection over Union
- Area Variant Ratio
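[Figure: example boxes in frame t and frame t + k]
$$\left[ D_{IoU},\ D_{ARat} \right](b_i, b_j) = \left[ \frac{\mathrm{area}(b_i \cap b_j)}{\mathrm{area}(b_i \cup b_j)},\ \frac{\min(\mathrm{area}(b_i), \mathrm{area}(b_j))}{\max(\mathrm{area}(b_i), \mathrm{area}(b_j))} \right]$$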
https://motchallenge.net/vis/PETS09-S2L1/gt/
Enhanced Siamese Neural Network (ESNN)
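The two temporal features can be computed from a pair of bounding boxes as follows. A minimal sketch, assuming boxes are given as corner coordinates (x1, y1, x2, y2); the helper names `area`, `d_iou` and `d_arat` are hypothetical:

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def d_iou(bi: Box, bj: Box) -> float:
    """D_IoU: area(bi ∩ bj) / area(bi ∪ bj)."""
    ix = max(0.0, min(bi[2], bj[2]) - max(bi[0], bj[0]))
    iy = max(0.0, min(bi[3], bj[3]) - max(bi[1], bj[1]))
    inter = ix * iy
    union = area(bi) + area(bj) - inter
    return inter / union if union > 0 else 0.0


def d_arat(bi: Box, bj: Box) -> float:
    """D_ARat: min(area(bi), area(bj)) / max(area(bi), area(bj))."""
    a, b = area(bi), area(bj)
    return min(a, b) / max(a, b) if max(a, b) > 0 else 0.0
```

For two 2×2 boxes offset by one unit in each direction, D_IoU = 1/7 and D_ARat = 1.0: they share one unit of area but have equal size.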
[Figure: Similarity with NB (left) and ESNN (right) on a sample sequence from MOT16. x-axis: IoU, y-axis: Euclidean distance. Legend: non-matching pairs, matching pairs, margin.]
Precision ↑
Similarity on MOT16
[Figure: x-axis: IoU, y-axis: Euclidean distance. Legend: non-matching pairs, matching pairs, margin.]

            NB      ESNN
Precision   0.8187  0.9964
Recall      0.9529  0.9931
F-score     0.8807  0.9947
- Two-step greedy algorithm (ours)
- Alternative, e.g., the Hungarian algorithm (Kuhn, H. W. (1955))

                              Ours      Hungarian
Complexity (# of objects n)   O(n^2)    O(n^3)
MOTA                          35.3      27.7

Time: MOT16-05 (not crowded) 1.03x ↑, MOT16-04 (crowded) 2.69x ↑
(MOTA: Multi-Object Tracking Accuracy)
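The slides do not describe the two-step greedy algorithm in detail, so the following is only a generic greedy assignment on a distance matrix, shown for contrast with the Hungarian algorithm: it accepts candidate pairs in order of increasing distance and never revisits a decision. This variant sorts all n·m candidate pairs, so its cost is O(nm log(nm)) rather than the O(n^2) stated for the authors' method; `greedy_match` and the `margin` cutoff are assumptions.

```python
from typing import List, Sequence, Tuple


def greedy_match(dist: Sequence[Sequence[float]],
                 margin: float) -> List[Tuple[int, int]]:
    """Generic greedy assignment (not the authors' two-step algorithm).

    dist[i][j]: distance between detection i in frame t and detection j in
    frame t - 1. Pairs are accepted in order of increasing distance; a pair is
    skipped if either side is already matched. Pairs at or above the margin
    are never matched.
    """
    candidates = sorted(
        (d, i, j)
        for i, row in enumerate(dist)
        for j, d in enumerate(row)
        if d < margin
    )
    used_i, used_j = set(), set()
    matches = []
    for _, i, j in candidates:
        if i in used_i or j in used_j:
            continue
        used_i.add(i)
        used_j.add(j)
        matches.append((i, j))
    return matches
```

Unlike the Hungarian algorithm, a greedy pass does not minimize the total distance over all pairs; it trades that guarantee for lower cost.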
Speed: 2.68 to 45.10 fps
- Online method: “solution available immediately with each incoming frame and cannot be changed at any later time”
- Fast
* Choi, W. (2015)
** Tang, S., Andres, B., Andriluka, M., Schiele, B. (2015)
*** Stiller, C., Urtasun, R., Wojek, C., Lauer, M., Geiger, A. (2014)
**** Milan, A., Roth, S., Schindler, K. (2014)

Online method                             MOTA     Hz
MDPNN16 (A. Sadeghian et al., 2017)       47.2    1.0
CDA_DDAL (S. Bae et al., 2017)            43.9    0.5
EAMTT (R. Sanchez-Matilla et al., 2016)   38.8   11.8
OVBT (Y. Ban et al., 2016)                38.4    0.3
- Later
(MOTA: Multi-Object Tracking Accuracy)
- Inspired by SqueezeNet (arXiv: 1602.07360)
- Tested on NVIDIA GTX 1080
https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080/
- FPS: 20 to 55% ↑ (avg. 29%)
- MOTA: 0.9 to 8.2% ↓ (avg. 3%)
- Memory usage: 70+% down (1+ GB → 350+ MB)
- Model size: 90+% down (100+ MB → 3.6 MB)

Subset     ORIGINAL         SQN ESNN
           MOTA     FPS     MOTA     FPS
MOT16-02   17.1   21.27     15.7   26.59
MOT16-04   34.7    7.20     34.0    9.02
MOT16-05   31.0   42.01     29.6   60.55
MOT16-09   48.4   16.51     45.6   25.65
MOT16-10   31.4   23.60     31.1   28.42
MOT16-11   48.2   18.99     47.5   24.96
MOT16-13    6.8   38.04      6.6   47.02
TOTAL      30.2   16.58     29.3   21.41
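The summary percentages can be checked against the table. A small script, with the values copied from the table above (`RESULTS` and `pct_change` are hypothetical names), computes the relative FPS and MOTA change per sequence. The TOTAL row gives about +29.1% FPS and -3.0% MOTA, in line with the stated averages, and the per-sequence FPS gains range from about 20% (MOT16-10) to 55% (MOT16-09).

```python
# Values copied from the table above: (MOTA, FPS) for ORIGINAL and SQN ESNN.
RESULTS = {
    "MOT16-02": ((17.1, 21.27), (15.7, 26.59)),
    "MOT16-04": ((34.7, 7.20), (34.0, 9.02)),
    "MOT16-05": ((31.0, 42.01), (29.6, 60.55)),
    "MOT16-09": ((48.4, 16.51), (45.6, 25.65)),
    "MOT16-10": ((31.4, 23.60), (31.1, 28.42)),
    "MOT16-11": ((48.2, 18.99), (47.5, 24.96)),
    "MOT16-13": ((6.8, 38.04), (6.6, 47.02)),
    "TOTAL": ((30.2, 16.58), (29.3, 21.41)),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


for name, ((mota_orig, fps_orig), (mota_sqn, fps_sqn)) in RESULTS.items():
    print(f"{name}: FPS {pct_change(fps_orig, fps_sqn):+.1f}%, "
          f"MOTA {pct_change(mota_orig, mota_sqn):+.1f}%")
```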
[Figure: tracking results with NB and ESNN; Car #5 is labeled in both.]
NB
ESNN
- Dependent on the number of objects
- Number of frames for keeping each tracklet's data:
  - the longer it is kept, the higher the chance that an occluded object is recovered
  - a short lifetime leads to more ID switches
TDB(F_k, O_i), TDB(F_{k+j}, O_i), TDB(F_{k+2j}, O_i), TDB(F_{k+3j}, O_i): for how long?
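One way to realize a per-tracklet lifetime is to keep each tracklet's data until it has gone unmatched for a fixed number of frames. The sketch below is hypothetical (`TrackletDB`, `Tracklet` and the `lifetime` parameter are not from the slides); it only illustrates the trade-off above: a larger `lifetime` keeps occluded objects available for re-matching, while a smaller one discards them sooner.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tracklet:
    object_id: int
    last_seen: int  # frame index of the latest update
    features: List[List[float]] = field(default_factory=list)


class TrackletDB:
    """Hypothetical tracklet store: keeps each tracklet's data for at most
    `lifetime` frames after its last update, so it can still be re-matched
    after a short occlusion."""

    def __init__(self, lifetime: int):
        self.lifetime = lifetime
        self.tracklets: Dict[int, Tracklet] = {}

    def update(self, frame: int, object_id: int, feature: List[float]) -> None:
        """Record a matched detection of object_id in the given frame."""
        t = self.tracklets.setdefault(object_id, Tracklet(object_id, frame))
        t.last_seen = frame
        t.features.append(feature)

    def prune(self, frame: int) -> List[int]:
        """Drop tracklets not updated within the last `lifetime` frames;
        return the IDs that were dropped."""
        expired = [oid for oid, t in self.tracklets.items()
                   if frame - t.last_seen > self.lifetime]
        for oid in expired:
            del self.tracklets[oid]
        return expired
```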