3D Multi-Object Tracking for Autonomous Driving Xinshuo Weng - - PowerPoint PPT Presentation




SLIDE 1

3D Multi-Object Tracking for Autonomous Driving

Xinshuo Weng Robotics Institute, Carnegie Mellon University RI PhD Speaking Qualifier

September 24, 2020. Committee Members: Kris Kitani (advisor), Martial Hebert, David Held, Peiyun Hu

1

SLIDE 2

2

3D multi-object tracking is an important perception task for autonomous driving

SLIDE 3

Standard 3D MOT Pipeline

3

[Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation]

SLIDE 4

Standard 3D MOT Pipeline

4

[Pipeline: Sensor Data (LiDAR, RGB) → 3D Object Detection → Data Association → Evaluation]

SLIDE 5

Standard 3D MOT Pipeline

5

[Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation, showing detection results]

SLIDE 6

Standard 3D MOT Pipeline

6

[Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation, showing tracking results]

SLIDE 7

Standard 3D MOT Pipeline

7

Also important!

[Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation]

Evaluation:

  • 1. MOTA: MOT accuracy
  • 2. MOTP: MOT precision
  • 3. IDS: # of identity switches
  • 4. FRAG: # of trajectory fragments
  • 5. ……
SLIDE 8

Standard 3D MOT Pipeline

8

[Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation]

SLIDE 9

9

What is the state of the art?

SLIDE 10

State of the Art (3D MOT)

10

Better models from better (bigger) data!

[Chart: dataset sizes, a 150x increase! *Mined trajectory data not counted for the Argo dataset]

SLIDE 11

State of the Art (3D MOT)

11

Image credit to Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e

[Chart: Monocular 3D Detection AP on KITTI, a 15x increase in 3 years]

SLIDE 12

State of the Art (3D MOT)

12

[Chart: LiDAR-based 3D Detection on KITTI, a 27% increase in 2 years]

SLIDE 13

State of the Art (3D MOT)

13

[Chart: 2D MOT on KITTI, an 18% increase in 5 years. *3D methods compared using 2D evaluation on KITTI]

SLIDE 14

State of the Art (3D MOT)

14

[Pipeline: Sensor Data → 3D Object Detection → Data Association (Feature Extraction + Optimization) → Evaluation]

Recent trend: feature extraction and optimization are jointly optimized.

  • D. Frossard, R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
  • Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.

SLIDE 15

State of the Art (3D MOT)

15

[Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Evaluation]

What are open problems in 3D MOT?

SLIDE 16

Some Open Problems (3D MOT)

16

[Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Evaluation]

  • Many large-scale datasets, but sensor suites and annotations are not unified
  • 3D detection performance is improving, but does not take sensor physics into account
  • Does not take into account the context of the multi-level optimization problem (sensors, forecasting, control)
  • Representation does not take into account the context of other objects and the scene (this talk)
  • Weak 3D MOT evaluation datasets and metrics (this talk)
  • Should also take sensor optimization and redundancy into account
  • Detection and tracking should be coupled more tightly

SLIDE 17

Recent Work on Evaluation

17

SLIDE 18

What are the Issues of 3D MOT Evaluation?

  • Matching criterion: IoU (intersection over union)
  • For the pioneering 3D MOT dataset KITTI, evaluation is performed in the 2D space
  • IoU is computed on the 2D image plane (not 3D)
  • The common practice for evaluating 3D MOT methods is:
  • Project 3D trajectories onto the image plane
  • Run the 2D evaluation code provided by KITTI

18

[Figure: IoU in 2D space vs. IoU in 3D space. Image credit to Xu et al.: 3D-GIoU]

Bp: the predicted box; Bg: the ground-truth box; Bc: the smallest enclosing box; I2D, I3D: the intersection
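As a concrete illustration of the 3D matching criterion, here is a minimal sketch of 3D IoU for axis-aligned boxes. This is a simplification: real evaluation code also handles the heading angle via a rotated-box intersection in the ground plane.

```python
def interval_overlap(c1, d1, c2, d2):
    """Overlap length of two 1D intervals given their centers and sizes."""
    lo = max(c1 - d1 / 2, c2 - d2 / 2)
    hi = min(c1 + d1 / 2, c2 + d2 / 2)
    return max(0.0, hi - lo)

def iou_3d(box_p, box_g):
    """3D IoU for axis-aligned boxes given as (x, y, z, l, w, h).
    Heading is ignored here for simplicity; rotated boxes would need a
    polygon intersection in the ground plane instead."""
    inter = 1.0
    for axis in range(3):
        inter *= interval_overlap(box_p[axis], box_p[axis + 3],
                                  box_g[axis], box_g[axis + 3])
    vol_p = box_p[3] * box_p[4] * box_p[5]
    vol_g = box_g[3] * box_g[4] * box_g[5]
    return inter / (vol_p + vol_g - inter)
```

A box that projects well onto the image but has the wrong depth gets a near-zero 3D IoU, which is exactly the failure the 2D metric cannot see.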

SLIDE 19

What are the Issues of 3D MOT Evaluation?

  • Why is it not good to evaluate 3D MOT methods in 2D space?
  • Cannot measure the strength of 3D MOT methods
  • Estimated 3D information: depth, object dimensions (length, width, and height), heading orientation
  • Cannot fairly compare 3D MOT methods. Why?
  • Methods are not penalized for wrong predicted depth, dimensions, or heading as long as the 2D projection is accurate
  • Which predicted box is better, blue or green?
  • Conclusion: we should not evaluate 3D MOT methods in 2D space

19


Blue: the predicted box 1 Green: the predicted box 2 Red: the ground truth box

  • X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
SLIDE 20

Our Solution: Upgrade the Matching Criteria to 3D

20

  • Replace the matching criterion (2D IoU) in the KITTI evaluation code with 3D IoU
  • https://github.com/xinshuoweng/AB3DMOT (900 stars)
  • Work with nuTonomy collaborators to use our 3D MOT evaluation metrics in the nuScenes evaluation, with center distance as the matching criterion

[Screenshots: our released new evaluation code; nuScenes 3D MOT evaluation with our metrics]
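To make the nuScenes-style criterion concrete, here is a toy greedy matcher using ground-plane center distance. The function name, the 2 m default threshold, and the greedy order are illustrative assumptions, not the released evaluation code.

```python
import math

def match_by_center_distance(preds, gts, dist_thresh=2.0):
    """Greedily match predictions to ground truth by ground-plane center
    distance (nuScenes-style sketch). preds and gts are lists of (x, y)
    centers; preds are assumed sorted by descending confidence.
    Returns a list of (pred_idx, gt_idx) pairs."""
    matches, used_gt = [], set()
    for pi, p in enumerate(preds):
        best, best_d = None, dist_thresh
        for gi, g in enumerate(gts):
            if gi in used_gt:
                continue
            d = math.hypot(p[0] - g[0], p[1] - g[1])
            if d < best_d:
                best, best_d = gi, d
        if best is not None:
            matches.append((pi, best))
            used_gt.add(best)
    return matches
```

Unlike IoU, center distance still produces a match for small, poorly overlapping boxes near the right location, which is why nuScenes prefers it for small objects.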

  • X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
SLIDE 21

What are the Issues of Evaluation?

  • Are we done with the evaluation? Can we further improve the current metrics?
  • E.g., MOTA (multi-object tracking accuracy)
  • MOTA = 1 - (FP + FN + IDS) / num_gt
  • Performance is measured at a single recall point
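In code, the MOTA definition is just this one line, where num_gt is the total number of ground-truth boxes over all frames:

```python
def mota(num_fp, num_fn, num_ids, num_gt):
    """CLEAR MOT accuracy: 1 - (FP + FN + IDS) / GT, where GT is the
    total number of ground-truth boxes over all frames."""
    return 1.0 - (num_fp + num_fn + num_ids) / num_gt
```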

21

MOTA over Recall curve

  • X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
SLIDE 22

[Chart: MOTA over recall curves (recall 0.1 to 1.0) for 3D MOT system 1 and 3D MOT system 2]

What are the Issues of Evaluation?

  • Why is it not good to evaluate at a single recall point?
  • Consequences
  • The confidence threshold needs to be carefully tuned, requiring non-trivial effort
  • Sensitive to different detectors, different datasets, different object categories
  • Cannot understand the full spectrum of accuracy of a MOT system
  • Which MOT system is better, blue or orange?
  • The orange one has higher MOTA at its best recall point (r = 0.9)
  • The blue one has overall higher MOTA at many recall points
  • Ideally, we want as high performance as possible at all recall points

22

MOTA over Recall curve

SLIDE 23

Our Solution: Integral Metrics

  • MOTA is measured at a single point on the curve
  • What can we do to improve the evaluation metrics?
  • Compute integral metrics through the area under the curve, e.g., average MOTA (AMOTA)
  • Analogous to average precision (AP) in object detection
  • Can measure the full spectrum of MOT accuracy
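A minimal sketch of the integral metric: evaluate MOTA at several recall thresholds and average. The released evaluation code additionally rescales the per-recall MOTA; this plain mean only conveys the area-under-the-curve idea.

```python
def amota(mota_values):
    """Approximate AMOTA as the mean of MOTA values evaluated at evenly
    spaced recall thresholds (e.g. 0.1, 0.2, ..., 1.0). mota_values[i]
    is the MOTA obtained when the confidence threshold is tuned so the
    tracker operates at the i-th recall level."""
    return sum(mota_values) / len(mota_values)
```

Like AP, the integral removes the need to hand-tune a single confidence threshold per detector, dataset, and category.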

23

MOTA over Recall curve

Area under the curve

  • X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020.
SLIDE 24

Recent Work on Improving Feature Learning for 3D MOT

24

SLIDE 25

What are the Issues of Feature Learning?

25

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • Goal: learn discriminative features for different objects
  • Issues in feature learning:
  • Feature extraction for each object is independent of other objects
  • Why is this not good? No communication between objects, ignoring the context information
  • Employs features from only one or two modalities
  • E.g., 2D appearance, or 2D motion, or 3D motion, or 3D appearance
  • Why is this not good? Does not utilize all the complementary information

[Diagram (pipeline from prior work): objects in frame t and frame t+1 → 2D (or 3D) feature extractor → affinity matrix → Hungarian algorithm]
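The data-association step solves an assignment problem over the affinity matrix. Below is a brute-force stand-in for the Hungarian algorithm, fine only for tiny square examples (real code would use, e.g., scipy.optimize.linear_sum_assignment):

```python
from itertools import permutations

def best_assignment(affinity):
    """Maximize total affinity between tracks (rows) and detections
    (columns) by brute force over permutations. A stand-in for the
    Hungarian algorithm on a square affinity matrix; exponential cost,
    so only suitable for tiny examples."""
    n = len(affinity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(affinity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(enumerate(best_perm)), best_score
```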

[Pipeline: Sensor Data → 3D Object Detector → Feature Extractor → Optimizer → Evaluation]

SLIDE 26

Improve Feature Learning for 3D MOT

26

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • How can we address these two issues?
  • Shouldn't features depend on the context of other objects?
  • We propose a novel feature interaction mechanism
  • How can we utilize the information from all the modalities?
  • Extract multi-modal features that are complementary to each other
  • i.e., 2D motion + 2D appearance + 3D motion + 3D appearance
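The feature interaction idea can be sketched with a toy, hand-fixed message-passing update: each object's feature is mixed with the mean of the other objects' features. GNN3DMOT learns the aggregation and update functions instead of fixing them like this.

```python
def feature_interaction(feats, num_rounds=3):
    """Toy message passing over a fully connected object graph: each
    object's feature is refined with the mean of all other objects'
    features (a hypothetical, hand-fixed update rule).
    feats: list of feature vectors (lists of floats), len(feats) >= 2."""
    dim = len(feats[0])
    for _ in range(num_rounds):
        new_feats = []
        for i, f in enumerate(feats):
            others = [feats[j] for j in range(len(feats)) if j != i]
            mean = [sum(o[d] for o in others) / len(others) for d in range(dim)]
            # residual-style update: keep own feature, mix in context
            new_feats.append([0.5 * f[d] + 0.5 * mean[d] for d in range(dim)])
        feats = new_feats
    return feats
```

After a few rounds, every object's feature depends on the whole scene, which is exactly the context the per-object extractors in prior work lack.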

[Diagram (pipeline from our work): objects in frame t and frame t+1 → 2D + 3D feature extractor → iterative feature interaction → affinity matrix → Hungarian algorithm]


SLIDE 27

Improve Feature Learning for 3D MOT

27

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • How do we do it?
  • (a) Obtain the appearance / motion features from the 3D point cloud
  • LSTM for 3D motion from 3D box trajectories
  • PointNet for 3D appearance from the point cloud
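The PointNet idea in one line of logic: a shared per-point transform followed by a symmetric pooling, so the feature is invariant to the ordering of the points. Below is a one-layer, scalar-output stand-in; the real network stacks per-point MLPs and outputs a high-dimensional vector.

```python
def pointnet_like(points, weight, bias):
    """PointNet-style global feature sketch: apply a shared per-point
    linear map (a 1-layer stand-in for the per-point MLP), then max-pool
    over points. points: list of (x, y, z); weight: 3 floats; bias: float.
    Max-pooling is a symmetric function, so the result does not depend on
    point order."""
    per_point = [sum(w * c for w, c in zip(weight, p)) + bias for p in points]
    return max(per_point)
```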


SLIDE 28

Improve Feature Learning for 3D MOT

28

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • How do we do it?
  • (b) Obtain the appearance / motion features from the 2D image
  • LSTM for 2D motion from 2D box trajectories
  • CNN for 2D appearance from 2D image patches


SLIDE 29

Improve Feature Learning for 3D MOT

29

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
  • How do we do it?
  • (c) Learn discriminative object features through interaction with GNNs

Graph Neural Networks


SLIDE 30

Improve Feature Learning for 3D MOT

30

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • Is encoding the multi-modal features really useful?
  • Answer: Yes
  • We should encode different features so that they can complement each other

[Table: features from a single modality vs. multiple modalities (A: appearance feature, M: motion feature); performance increased with multiple modalities!]


SLIDE 31

Improve Feature Learning for 3D MOT

31

  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • Is feature interaction using GNNs useful for 3D MOT?
  • Answer: Yes
  • We should let objects communicate and encode the context information

Performance largely increased with 3 GNN layers vs. 0!


SLIDE 32

Qualitative Results

32

SLIDE 33

Moving Forward

End-to-End Perception and Prediction Pipeline

33

SLIDE 34

End-to-End Perception and Prediction Pipeline

  • We now have only data association jointly optimized
  • What is next? Can we go further?
  • End-to-end MOT and detection?
  • End-to-end MOT and trajectory forecasting?
  • End-to-end MOT with both detection and forecasting?

34

[Diagram: Sensor data → 3D Object Detector → 3D detections → Feature Extractor → pairwise affinity matrix → Optimizer → 3D object trajectories; past object trajectories → Trajectory Forecasting (jointly optimized)]

SLIDE 35

Joint 3D MOT and Trajectory Forecasting

  • Prior work separates 3D MOT and trajectory forecasting
  • Why is it not good to separate the two?
  • Optimizing the entire pipeline end-to-end is impossible, leading to sub-optimal performance
  • Slow inference due to the separate modular design; each network takes time
  • What can we do?

35

  • X. Weng*, Y. Ye*, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

[Diagram (pipeline from prior work): 3D MOT and trajectory forecasting as separate modules, each with its own feature extraction. Detected objects in the current frame → 3D MOT head → object trajectories up to the current frame; object trajectories in past H frames → trajectory forecasting head → predicted trajectories in future T frames]

[Pipeline: Sensor Data → 3D Object Detector → Feature Extractor → Optimizer → Trajectory Forecasting]

SLIDE 36

Joint 3D MOT and Trajectory Forecasting

  • Parallelize the MOT and forecasting
  • Share the feature learning process
  • Use GNN3DMOT as part of our network for tracking
  • Add a multi-modal trajectory forecasting head

36

  • X. Weng*, Y. Ye*, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

[Diagram (joint 3D tracking and forecasting): detected objects in the current frame and object trajectories in past H frames → shared feature extraction → node and edge features → GNN for feature interaction → 3D MOT head (GNN3DMOT) and trajectory forecasting head with diversity sampling → predicted trajectories in future T frames]


SLIDE 37

Joint 3D MOT and Trajectory Forecasting

  • Is joint optimization useful?
  • Adding forecasting is useful to tracking
  • How does adding forecasting affect 3D MOT?
  • Joint optimization with forecasting improves tracking performance

37

  • X. Weng*, Y. Ye*, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

[Table: 3D MOT evaluation with vs. without the forecasting module; improvement on 5 out of 6 entries!]


SLIDE 38

Joint 3D MOT and Trajectory Forecasting

  • Is joint optimization useful?
  • Adding forecasting is useful to tracking
  • Adding MOT is useful to forecasting
  • How does adding 3D MOT affect trajectory forecasting?
  • Joint optimization with 3D MOT improves forecasting performance

38

  • X. Weng*, Y. Ye*, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

[Table: forecasting evaluation with vs. without 3D MOT; performance improved after adding MOT!]


SLIDE 39

Joint MOT and Object Detection

  • Now we have a method for joint MOT and forecasting
  • Can we do joint detection and MOT?
  • Use GNN3DMOT as part of our network for tracking
  • Add a detection head to classify and regress objects
  • Can potentially be extended to joint BEV/3D detection and MOT

39

  • Y. Wang, X. Weng, K. Kitani. Joint Detection and Multi-Object Tracking with Graph Neural Networks and Complete Feature Learning. arXiv 2020

[Diagram (joint detection and 3D MOT): anchors in the current frame and object trajectories in past H frames → feature extraction → node and edge features → GNN for feature interaction → 3D MOT head and object detection head → trajectories up to the current frame]


SLIDE 40

Moving Forward

Achieve trajectory forecasting as tracking

40

SLIDE 41

Conventional Perception and Prediction Pipeline

  • Traditional pipeline:
  • Detection -> data association -> trajectory forecasting
  • Is this pipeline the best?
  • What are other options?

Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

[Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Trajectory Forecasting]

SLIDE 42

Trajectory Forecasting as Tracking

  • Traditional pipeline:
  • Detection -> MOT -> trajectory forecasting
  • Our new pipeline
  • Sensor data forecasting -> detection -> MOT

42

Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

Switch the order

SLIDE 43

Take Home Message

  • Important to develop appropriate evaluation metrics for 3D MOT to measure progress
  • Representation of objects in 3D MOT should take into account other objects

43

[Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Evaluation]

Open Problems:
  • Many large-scale datasets, but sensor suites and annotations are not unified
  • 3D detection performance is improving, but does not take sensor physics into account
  • Does not take into account the context of the multi-level optimization problem (sensors, forecasting, control)
  • Representation does not take into account the context of other objects, the scene, and the past
  • Need better 3D MOT evaluation
  • Detection and tracking should be coupled more tightly

Dynamics models should be customized to object type

SLIDE 44

Summary of Works

  • X. Weng, J. Wang, D. Held, K. Kitani. 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics. IROS 2020
  • X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
  • X. Weng*, Y. Ye*, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020
  • Y. Wang, X. Weng, K. Kitani. Joint Detection and Multi-Object Tracking with Graph Neural Networks and Complete Feature Learning. arXiv 2020
  • X. Weng, J. Wang, S. Levine, K. Kitani, N. Rhinehart. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020
  • For other works of mine, see http://www.xinshuoweng.com/

44

SLIDE 45

3D Multi-Object Tracking for Autonomous Driving

Xinshuo Weng Robotics Institute, Carnegie Mellon University RI PhD Speaking Qualifier

September 24, 2020. Committee Members: Kris Kitani (advisor), Martial Hebert, David Held, Peiyun Hu

45