Embedded Multi-Person Pedestrian Tracking and Detection
MSCV19 Capstone Project, Internal (CMU)
Team Members: Yongxin Wang, Chunhui Liu
Advisor: Dr. Kris Kitani
05/03/2019
Introduction
Motivation
○ Multi-person pedestrian tracking
○ Real-time performance on embedded system
○ Visual analysis, automatic driving, robotics

○ Detect and track multiple people
○ Deal with new objects and out-of-view objects

○ Track by detection - SiameseRPN (Single Object)
○ Multiple object extension
Past:
January: SiamRPN
March: SiamRPN on VOT dataset
April: Align
Future:
September 15: SiamRPN with RoI Align
October 15:
October 31: handle new objects
December 15: algorithm on NVIDIA Jetson Machine
Present:
with Region of Interest (RoI) Align
Distractor
○ Implement Train Code & Verify
○ Finetune on VOT

○ Implement Code
○ Train and Verify on VOT

○ Baseline Model
○ Multi Object Evaluation Code
Figure: SiamRPN architecture (Li et al., 2018)
Template → AlexNet → Template Feature (6, 6, 256) → Conv → Template Features (4, 4, 2k × 256) for classification, (4, 4, 4k × 256) for regression
Image → AlexNet → Image Feature (22, 22, 256) → Conv → Image Features (20, 20, 256) for each of the two heads
Outputs: CLS Score (FG/BG) (17, 17, 2k), Bounding Box (x, y, w, h) (17, 17, 4k)
Li, Bo et al. “High Performance Visual Tracking with Siamese Region Proposal Network.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
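The spatial sizes listed above follow from unpadded ("valid") sliding-window arithmetic. The sketch below is only an illustration: it assumes a 3×3, stride-1, unpadded adapter convolution (consistent with 6 → 4 and 22 → 20 on the slide, but not stated there) and uses a toy single-channel correlation. The names `valid_out` and `xcorr2d` are hypothetical helpers, not part of any SiamRPN code base.

```python
def valid_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') sliding window."""
    return (size - kernel) // stride + 1


def xcorr2d(feature, kernel):
    """Single-channel valid cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(valid_out(len(feature), kh)):
        row = []
        for j in range(valid_out(len(feature[0]), kw)):
            row.append(sum(feature[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


# Sizes taken from the slide; the 3x3 adapter kernel is an assumption.
template_side = valid_out(6, 3)                    # 6 -> 4
image_side = valid_out(22, 3)                      # 22 -> 20
score_side = valid_out(image_side, template_side)  # 20 -> 17

# Toy check: correlating a 20x20 map with a 4x4 kernel gives a 17x17 response.
feature = [[1.0] * image_side for _ in range(image_side)]
kernel = [[1.0] * template_side for _ in range(template_side)]
response = xcorr2d(feature, kernel)
print(template_side, image_side, score_side, len(response), len(response[0]))
```

In the real network the template features act as the correlation kernel over 256 channels, which is why the template branch carries the 2k and 4k factors while the image branch does not.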
○ Official repository only has testing code
○ Sanity check of training process
■ Finetuned from pretrained model (trained with VID) on VOT dataset
Model | Pretrained | Finetune | Test Data | EAO ↑
DaSiamRPN (Official, SOTA) | YoutubeBB + ImageNet VID | | | 0.446
SiamRPN | ImageNet VID | VOT 2015 (First 40 sequences) | VOT 2015 (First 40 sequences) | 0.5240
SiamRPN RoI | ImageNet VID | VOT 2015 (First 40 sequences) | VOT 2015 (First 40 sequences) | 0.6045
SiamRPN (with location & size penalty) | ImageNet VID | | | 0.3426
SiamRPN | ImageNet VID | | | 0.2647
SiamRPN | | | | IP
SiamRPN RoI | | | | IP
Red - SiamRPN (finetuned)
Blue - SiamRPN RoI (finetuned)
Black - DaSiamRPN
Green - Ground Truth
Figure: multi-object tracking pipeline
Templates → CNN → Template Feature (6, 6, 256) → Conv → Template Features (4, 4, 2k×256), (4, 4, 4k×256)
Frame T (255, 255, 3) → CNN → Frame Feature (22, 22, 256) → Conv → Frame Features (20, 20, 256)
Outputs: Cls Score (FG/BG) (17, 17, 2k), Bounding Box (x, y, w, h) (17, 17, 4k)
Post-processing: NMS + Data Association
Template Adapter (Decide how to update the templates for the next frame)
○ A network that can handle several templates.
○ NMS & Data Association for matching labels.
○ Decide when to add and delete templates.
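The NMS and data association steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: boxes are `(x1, y1, x2, y2)` tuples, association is a greedy IoU match (the slides do not say which association method is used), and the thresholds and the names `iou`, `nms`, `associate` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def associate(tracks, detections, min_iou=0.3):
    """Greedy IoU matching between existing tracks and new detections.

    tracks: {track_id: box}, detections: list of boxes.
    Returns (matches, lost_track_ids, new_detection_indices).
    """
    pairs = sorted(((iou(tb, db), tid, di)
                    for tid, tb in tracks.items()
                    for di, db in enumerate(detections)),
                   key=lambda p: p[0], reverse=True)
    matches, used = {}, set()
    for score, tid, di in pairs:
        if score < min_iou:
            break
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    lost = [tid for tid in tracks if tid not in matches]
    new = [di for di in range(len(detections)) if di not in used]
    return matches, lost, new


boxes = [(1, 1, 11, 11), (2, 2, 12, 12), (100, 100, 110, 110)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)                      # second box is suppressed
detections = [boxes[i] for i in keep]
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
matches, lost, new = associate(tracks, detections)
print(keep, matches, lost, new)
```

Unmatched tracks are candidates for template deletion (for example a person leaving the view), and unmatched detections are candidates for new templates.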
○ Pre-compute correlation filters for each template
○ All templates share the RPN network to do tracking independently

○ Concatenate all correlation filters as a bigger filter
○ Re-train RPN network to perform multi-object classification

○ Add Distractor-aware loss and fine-tune RPN
Figure: network with n templates
Templates → CNN → Template Feature (n, 6, 6, 256); each Template Feature (6, 6, 256) → Conv → Template Features (4, 4, 2k×256), (4, 4, 4k×256)
Frame T (255, 255, 3) → CNN → Frame Feature (22, 22, 256) → Conv → Frame Features (20, 20, 256)
Outputs: Cls Score (FG/BG) (17, 17, 2k), Bounding Box (x, y, w, h) (17, 17, 4k)
n: number of templates
k: number of anchors for each spatial pixel
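The idea of pre-computing one filter per template and running a shared head independently for each can be sketched as below. This is a toy, single-channel illustration under assumptions: `TemplateBank` and `correlate` are hypothetical names, and the peak of the response stands in for the full RPN classification and regression outputs.

```python
def correlate(feature, kernel):
    """Valid single-channel cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(feature[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(feature[0]) - kw + 1)]
            for i in range(len(feature) - kh + 1)]


class TemplateBank:
    """Caches one correlation kernel per tracked person (hypothetical helper)."""

    def __init__(self):
        self.kernels = {}

    def add(self, track_id, kernel):
        # Computed once when the person first appears, then reused every frame.
        self.kernels[track_id] = kernel

    def remove(self, track_id):
        self.kernels.pop(track_id, None)

    def track(self, frame_feature):
        """Correlate every cached kernel with the frame, independently.

        Returns {track_id: (row, col)} of each template's peak response.
        """
        peaks = {}
        for track_id, kernel in self.kernels.items():
            response = correlate(frame_feature, kernel)
            best = max((v, (i, j))
                       for i, row in enumerate(response)
                       for j, v in enumerate(row))
            peaks[track_id] = best[1]
        return peaks


frame = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
bank = TemplateBank()
bank.add(1, [[1, 0], [0, 1]])   # diagonal pattern
bank.add(2, [[0, 1], [1, 0]])   # anti-diagonal pattern
peaks = bank.track(frame)
print(peaks)  # {1: (0, 0), 2: (2, 2)}
```

The frame features are computed once and shared, but the correlation cost still grows linearly with n, which matters for a real-time budget on an embedded device.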
Figure: network with n templates (concatenated filters)
Templates → CNN → Template Feature (n, 6, 6, 256) → Template Feature (6, 6, 256n) → Conv → Template Features (4, 4, nk×256), (4, 4, 4nk×256)
Frame T (255, 255, 3) → CNN → Frame Feature (22, 22, 256) → Conv → Frame Features (20, 20, 256)
Outputs: Cls Score (FG/BG) (17, 17, (n+1)k), Bounding Box (x, y, w, h) (17, 17, 4k)
n: number of templates
k: number of anchors for each spatial pixel
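With a (17, 17, (n+1)k) classification map, each anchor at each spatial cell carries n+1 scores: one per template plus background. The sketch below decodes a single cell. The channel layout (anchor-major, class 0 as background) is an assumption made for illustration, since the slides do not specify it, and `decode_cell` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_cell(channels, n, k):
    """Decode the (n+1)*k logits of one spatial cell.

    Assumed layout: anchor-major, class 0 = background, classes 1..n = templates.
    Returns one (class_id, probability) pair per anchor.
    """
    if len(channels) != (n + 1) * k:
        raise ValueError("expected (n + 1) * k channels")
    decoded = []
    for a in range(k):
        probs = softmax(channels[a * (n + 1):(a + 1) * (n + 1)])
        cls = max(range(n + 1), key=lambda c: probs[c])
        decoded.append((cls, probs[cls]))
    return decoded


# n = 2 templates, k = 2 anchors: anchor 0 favours template 1,
# anchor 1 favours background.
result = decode_cell([0.0, 3.0, 0.0, 2.0, 0.0, 0.0], n=2, k=2)
print([cls for cls, _ in result])  # [1, 0]
```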
○ Training from scratch
○ Verifying effect of RoI
○ Try to fix distractor issue
Figure: per-template RPN outputs combined with SoftMax
Templates, Frame T (255, 255, 3) → CNN → Template Feature (n, 6, 6, 256), Frame Feature (22, 22, 256)
RPN (per template) → Cls Score (FG/BG) (17, 17, 2k), Bounding Box (x, y, w, h) (17, 17, 4k)
Per-template Cls Scores (FG/BG) (17, 17, 2k) → SoftMax → Cls Score (FG/BG) (17, 17, nk)
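One way to read the SoftMax step is as a normalisation across templates: at every position and anchor, the n foreground scores compete so that each location is assigned to at most one identity. The sketch below assumes exactly that reading (the slides do not spell out the operation) and works on flattened foreground logits; `softmax_across_templates` is a hypothetical name.

```python
import math


def softmax_across_templates(fg_logits):
    """Normalise foreground logits over templates.

    fg_logits[t][p] is the foreground logit of template t at flattened
    position/anchor index p. The result has the same shape and sums to 1
    over t for every p.
    """
    n, num_pos = len(fg_logits), len(fg_logits[0])
    probs = [[0.0] * num_pos for _ in range(n)]
    for p in range(num_pos):
        column = [fg_logits[t][p] for t in range(n)]
        m = max(column)
        exps = [math.exp(v - m) for v in column]
        total = sum(exps)
        for t in range(n):
            probs[t][p] = exps[t] / total
    return probs


# Two templates, two positions: each template wins one position.
fg_logits = [[4.0, 0.0],
             [0.0, 4.0]]
probs = softmax_across_templates(fg_logits)
owners = [max(range(len(probs)), key=lambda t: probs[t][p]) for p in range(2)]
print(owners)  # [0, 1]
```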
○ Freeze the SiamRPN, only train the Association Network
○ E.g. a fully connected network
Figure: per-template RPN outputs combined with an association network
Templates, Frame T (255, 255, 3) → CNN → Template Feature (n, 6, 6, 256), Frame Feature (22, 22, 256)
RPN (per template) → Cls Score (FG/BG) (17, 17, 2k), Bounding Box (x, y, w, h) (17, 17, 4k)
Per-template Cls Scores (FG/BG) (17, 17, 2k) → Neural Network → Cls Score (FG/BG) (17, 17, nk)
Whole Image as Input, Cropped Feature
Cropped Image as Input, Whole Feature
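Cropping the template from the feature map of the whole image, rather than cropping the image first, is what RoI Align provides. The sketch below is a simplified single-channel version: it takes one bilinear sample at the centre of each output bin, whereas common RoI Align implementations average several samples per bin. The coordinate convention (integer index equals sample location) and the names `bilinear` and `roi_align` are assumptions for illustration.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size):
    """Crop a fixed-size patch from a feature map.

    roi = (y1, x1, y2, x2) in feature-map coordinates, possibly fractional.
    Simplification: one sample at the centre of each output bin.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    return [[bilinear(feature,
                      y1 + (i + 0.5) * bin_h,
                      x1 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


# A linear ramp makes the interpolation easy to check by hand.
feature = [[10.0 * i + j for j in range(8)] for i in range(8)]
crop = roi_align(feature, roi=(1.0, 2.0, 4.0, 5.0), out_size=3)
print(crop[0][0], crop[2][2])  # 17.5 39.5
```

For the template branch in the figures above, `out_size` would be 6 so that the cropped feature matches the (6, 6, 256) template shape.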
Object SiamRPN (September 15)
○ Achieve similar EAO as in SiamRPN paper
Align (September 15)
○ Achieve similar performance as without RoI Align
(October 15)
○ Assign correct ID to correct person
○ Learn a universal template that has high response on all pedestrians
○ Real-time performance on NVIDIA Jetson TX2
Sep 15, Oct 15, Oct 31, Nov 15, Dec 15