New Perspective on Perception and Prediction Pipeline for Autonomous Driving


Xinshuo Weng, Kris Kitani, Robotics Institute, Carnegie Mellon University. August 28, 2020.


SLIDE 1

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani
Robotics Institute, Carnegie Mellon University

August 28, 2020

SLIDE 2

Perception and prediction are important components in the autonomous driving stack

SLIDE 3

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

  • Perception: 3D Object Detection, 3D Multi-Object Tracking
  • Prediction: Trajectory Forecasting

SLIDE 4

Standard Perception and Prediction Pipeline

Sensor Data (LiDAR, RGB) -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

SLIDE 5

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

[Figure: detection results, the output of 3D Object Detection]

SLIDE 6

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

[Figure: tracking results, the output of 3D Multi-Object Tracking]

SLIDE 7

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

[Figure: forecasting results, the output of Trajectory Forecasting]

SLIDE 8

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

Is this really the best place to perform prediction?

SLIDE 9

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

Can we do prediction here?

SLIDE 10

Standard Perception and Prediction Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

Can we do prediction here?

SLIDE 11

What is the state of the art for trajectory forecasting?

  • 1. Datasets: bigger and multi-modal

SLIDE 12

State of the Art: Datasets

Better models from bigger datasets!

150x increase!

(Waymo)

* Mined trajectory data not counted for the Argo dataset

  • Sun et al. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. CVPR 2020

SLIDE 13

State of the Art: Datasets

Dataset with multi-modal ground truth

  • J. Liang, L. Jiang, K. Murphy, T. Yu, A. Hauptmann. The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction. CVPR 2020

[Figure legend: green is the multi-modal ground-truth future; yellow is the past observations]

  • Each modality of the future is generated by setting a different goal in the simulator
  • In contrast to prior datasets with a single ground-truth future, this makes multi-future evaluation possible
  • What are the right metrics for evaluation?
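
The slide leaves the metric question open. One option that is often used for multi-sample forecasters is a minimum-over-samples displacement error, averaged over the available ground-truth futures. The sketch below illustrates that idea under this assumption; it is not claimed to be the evaluation protocol of the cited dataset. The function names `displacement_errors` and `min_ade_fde` and the toy trajectories are hypothetical.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(samples, gt_futures):
    """For each ground-truth future keep the best sample, then average.

    samples:    K predicted trajectories (assumed, hypothetical inputs).
    gt_futures: one or more ground-truth futures for the same agent.
    """
    min_ades, min_fdes = [], []
    for gt in gt_futures:
        errors = [displacement_errors(s, gt) for s in samples]
        min_ades.append(min(ade for ade, _ in errors))
        min_fdes.append(min(fde for _, fde in errors))
    return sum(min_ades) / len(min_ades), sum(min_fdes) / len(min_fdes)


# Toy data: two predicted samples and two ground-truth futures.
samples = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
gt_futures = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 3.0)]]
min_ade, min_fde = min_ade_fde(samples, gt_futures)
```

With a single ground-truth future the same function reduces to the usual best-of-K error, which is why it is a convenient starting point when comparing single-future and multi-future datasets.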

SLIDE 14

What is the state of the art for trajectory forecasting?

  • 2. Model: more side information

SLIDE 15

State of the Art: Trajectory Forecasting Models

Multi-agent interaction modeling with Graph Neural Networks (GNNs)

  • J. Sun, Q. Jiang and C. Lu. Recursive Social Behavior Graph for Trajectory Prediction. CVPR 2020

Contextual features are encoded to take nearby agents' motion into account during prediction

SLIDE 16

State of the Art: Trajectory Forecasting Models

Road context and physical constraints help

  • T. Phan-Minh, E. Grigore, F. Boulton, O. Beijbom, E. Wolff. CoverNet: Multimodal Behavior Prediction using Trajectory Sets. CVPR 2020

Using road structure semantics as inputs eliminates physically impossible trajectories

SLIDE 17

State of the Art: Trajectory Forecasting Models

Goal-conditioned forecasting

  • N. Rhinehart, R. McAllister, K. Kitani and S. Levine. PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings. ICCV 2019

Different goals could lead to different forecasts

SLIDE 18

State of the Art: Trajectory Forecasting Models

End-to-end perception and prediction pipeline

  • M. Liang, B. Yang, W. Zeng, Y. Chen, R. Hu, S. Casas, R. Urtasun. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop. CVPR 2020

Separately optimized

  • 1. Suboptimal performance
  • 2. Slow inference speed

Jointly optimized

  • All modules are optimized for the end goal: trajectory prediction

[Figure: Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting, with gradients passed between the jointly optimized modules]

SLIDE 19

State of the Art

  • Lots of progress on (1) building better/larger datasets and (2) improving forecasting models
  • The pipeline stays the same!
  • Any possible improvement at the pipeline level?

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

SLIDE 20

Our recent work on a new perception and prediction pipeline

  • 1. Parallelized tracking and forecasting
  • 2. SPF2: Sequential Pose Forecasting by Sequential Pointcloud Forecasting

SLIDE 21

Limitation of the Standard Pipeline

Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting

  • The pipeline runs in sequential order
      • Each downstream module takes the outputs of its upstream module as inputs
  • Any limitation?
      • Errors from the upstream module cannot be corrected and will degrade the performance of the downstream module
  • Can we go beyond the sequential pipeline?

[Figure: trajectories predicted from GT past trajectories compared with trajectories predicted from tracking results, showing a data association error in tracking]
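
The error propagation named on this slide can be reproduced with a toy example. The sketch below assumes a constant-velocity forecaster and a single identity switch in the last observed frame. Both are stand-ins chosen for illustration, since the slides do not specify a forecaster, and every name and coordinate is hypothetical.

```python
def constant_velocity_forecast(past, horizon):
    """Extrapolate the last observed step of a list of (x, y) points."""
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]


def final_displacement_error(pred, gt):
    """Euclidean distance between the last predicted and last true point."""
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5


# Two objects drive in parallel along +x, one lane apart.
gt_past_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gt_past_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
gt_future_a = [(3.0, 0.0), (4.0, 0.0)]

# Simulated tracker output: in the last frame, track A is wrongly
# associated with the detection of object B (an identity switch).
tracked_past_a = gt_past_a[:2] + [gt_past_b[-1]]

clean_error = final_displacement_error(
    constant_velocity_forecast(gt_past_a, horizon=2), gt_future_a)
switched_error = final_displacement_error(
    constant_velocity_forecast(tracked_past_a, horizon=2), gt_future_a)
```

The forecaster itself is unchanged between the two calls; only its input differs, so the whole gap between `clean_error` and `switched_error` comes from the upstream association mistake.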

SLIDE 22

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

Sequential Pipeline

  • Sensor Data -> 3D Object Detection -> 3D Multi-Object Tracking -> Trajectory Forecasting
  • 3D Multi-Object Tracking: Feature Extraction, Matching
  • Trajectory Forecasting: Feature Extraction, Trajectory Decoder

Parallelized Tracking and Forecasting Pipeline

  • Sensor Data -> 3D Object Detection -> Shared Feature Learning
  • 3D Multi-Object Tracking: Matching
  • Trajectory Forecasting: Trajectory Decoder

[Figure legend]

  • Similar components, which aim to encode object features from past information
  • Module-specific components

SLIDE 23

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Advantages
      • Forecasting does not explicitly depend on the tracking results but implicitly uses the association information in the current frame
      • Computational efficiency improves because the feature learning process is shared
  • Overview

[Figure: Joint 3D Tracking and Forecasting. Inputs: detected objects in the current frame and object trajectories in the past H frames (up to the last frame). Shared feature learning: feature extraction on each input, then a GNN for feature interaction that yields node features and edge features. Outputs: a 3D MOT head and a trajectory forecasting head with diversity sampling, which gives predicted trajectories in the future T frames.]

SLIDE 24

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Shared feature learning
      • Use an LSTM/MLP to learn motion features from objects' box trajectories
      • Encode contextual / relative features from nearby objects by modeling interaction with GNNs
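
To make the interaction step more concrete, here is a minimal sketch of one round of message passing between nearby objects. It is only an illustration: the learned LSTM/MLP and GNN weights are replaced by fixed arithmetic, the neighbor rule (a distance `radius`) and the `step` size are assumptions, and the name `message_passing_round` is hypothetical rather than taken from the paper.

```python
import math


def message_passing_round(node_feats, positions, radius, step=0.5):
    """One mean-aggregation round over objects within `radius` of each other.

    node_feats: per-object motion features (lists of floats).
    positions:  per-object (x, y) centers, used only to pick neighbors.
    Returns (updated node features, edge features keyed by (i, j)).
    """
    n = len(node_feats)
    edge_feats = {}
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= radius:
                # Relative feature of neighbor j as seen from object i.
                edge_feats[(i, j)] = [
                    b - a for a, b in zip(node_feats[i], node_feats[j])]

    updated = []
    for i in range(n):
        messages = [feat for (src, _), feat in edge_feats.items() if src == i]
        if messages:
            mean = [sum(col) / len(messages) for col in zip(*messages)]
        else:
            mean = [0.0] * len(node_feats[i])  # isolated object: no update
        updated.append([f + step * m for f, m in zip(node_feats[i], mean)])
    return updated, edge_feats


# Objects 0 and 1 are close together; object 2 is far away.
node_feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
positions = [(0.0, 0.0), (1.0, 0.0), (50.0, 0.0)]
updated, edges = message_passing_round(node_feats, positions, radius=10.0)
```

The function returns both per-object (node) features and per-pair (edge) features, mirroring the two kinds of features that appear in the overview figure.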

SLIDE 25

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • 3D multi-object tracking
      • An MLP takes edge features as inputs to regress the similarity scores between every pair of objects
      • During training, the estimated affinity matrix is supervised with GT
      • During testing, the estimated affinity matrix is fed to the Hungarian algorithm
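
The test-time matching step can be sketched as follows. To stay dependency-free, the example swaps the Hungarian algorithm for an exhaustive search over assignments, which returns the same optimum on small matrices but does not scale. The `min_score` gate for rejecting weak matches is an assumption, and the function name and affinity values are hypothetical.

```python
from itertools import permutations


def match_tracks_to_detections(affinity, min_score=0.5):
    """Return {track_index: detection_index} maximizing total affinity.

    affinity[i][j] is the similarity between track i and detection j.
    Assumes there are at least as many detections as tracks.
    Exhaustive search stands in for the Hungarian algorithm here.
    """
    n_tracks, n_dets = len(affinity), len(affinity[0])
    best_total, best_perm = float("-inf"), ()
    for perm in permutations(range(n_dets), n_tracks):
        total = sum(affinity[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # Assumed gating step: drop pairs whose score is too low to trust.
    return {i: j for i, j in enumerate(best_perm)
            if affinity[i][j] >= min_score}


# Two existing tracks, three detections in the current frame.
affinity = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
]
matches = match_tracks_to_detections(affinity)
unmatched = sorted(set(range(3)) - set(matches.values()))
```

Detections left in `unmatched` would typically start new tracks, although how track birth and death are handled is outside what the slide describes.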

SLIDE 26

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Trajectory forecasting
      • A diversity sampling function maps each object feature to a set of latent codes covering various modes
      • A conditional VAE is used to predict future trajectories from the diverse latent codes
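
The sample-then-decode flow can be illustrated with a toy stand-in. In the sketch below both learned networks are replaced by hand-written functions: the latent code is a single heading offset, the spread rule is arbitrary, and the decoder is a constant-velocity rollout. None of this is the paper's model; the names `diversity_sampling` and `decode` are hypothetical.

```python
import math


def diversity_sampling(object_feature, num_codes):
    """Map one object feature (vx, vy) to `num_codes` latent codes.

    Stand-in rule (an assumption): codes are heading offsets spread
    evenly around zero, with a narrower spread for faster objects.
    """
    vx, vy = object_feature
    speed = math.hypot(vx, vy)
    spread = math.radians(45.0) / (1.0 + speed)
    if num_codes == 1:
        return [0.0]
    return [spread * (2.0 * k / (num_codes - 1) - 1.0)
            for k in range(num_codes)]


def decode(last_position, object_feature, latent_code, horizon):
    """Stand-in decoder: rotate the velocity by the code and roll it out."""
    vx, vy = object_feature
    c, s = math.cos(latent_code), math.sin(latent_code)
    rvx, rvy = c * vx - s * vy, s * vx + c * vy
    x, y = last_position
    return [(x + rvx * t, y + rvy * t) for t in range(1, horizon + 1)]


# One object at the origin moving along +x; three latent codes.
feature = (1.0, 0.0)
codes = diversity_sampling(feature, num_codes=3)
futures = [decode((0.0, 0.0), feature, z, horizon=2) for z in codes]
```

Each latent code yields one candidate future, so the set of codes controls how many modes (here: veer right, go straight, veer left) the forecaster covers.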

SLIDE 27

Quantitative Results

SLIDE 28

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Is the parallel pipeline effective? Can the two modules benefit one another?
  • How does adding 3D MOT affect the performance of forecasting?
      • Adding the 3D MOT branch improves forecasting performance

[Table: forecasting evaluation without 3D MOT compared with the full model. Performance improved after adding MOT!]

SLIDE 29

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Is the parallel pipeline effective? Can the two modules benefit one another?
      • Adding MOT is useful to forecasting
  • How does adding forecasting affect the performance of 3D MOT?
      • Adding the forecasting branch improves tracking performance

[Table: 3D MOT evaluation without the forecasting module compared with the full model. Improvement on 5 out of 6 entries!]

SLIDE 30

Parallelized Tracking and Forecasting

  • X. Weng, Y. Ye, K. Kitani. Joint 3D Tracking and Forecasting with Graph Neural Network and Diversity Sampling. arXiv 2020

  • Is the new parallel pipeline effective?
      • Yes. The two modules in the pipeline implicitly benefit each other!
  • For more details on this work
      • Scan the QR code for the paper

SLIDE 31

Our recent work on a new perception and prediction pipeline

  • 1. Parallelized tracking and forecasting
  • 2. SPF2: Sequential Pose Forecasting by Sequential Pointcloud Forecasting

SLIDE 32

Limitation of the Standard Pipeline

  • Detection -> MOT -> Trajectory Forecasting
  • Any limitation?
      • Requires labeling at two levels in 3D space
      • Requires instance-level object labels to train (a)
      • Requires sequence-level object labels to train (b)(c)
      • Expensive to obtain in 3D space

[Figure: Sensor Data -> (a) 3D Object Detection -> (b) 3D Multi-Object Tracking -> (c) Trajectory Forecasting]

SLIDE 33

Our Contributions

  • 1. A novel pipeline that inverts the order of forecasting and reduces the labeling requirement
  • 2. A new task, Sequential Pointcloud Forecasting (SPF)

SLIDE 34

SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Standard pipeline
      • Detection -> MOT -> Trajectory Forecasting
  • Our new pipeline
      • Sequential Pointcloud Forecasting -> Detection -> MOT
  • Differences
      • Invert (switch) the order of forecasting
      • Forecast at the sensor level instead of at the object level

SLIDE 35

SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Any advantage of our pipeline?
      • The forecasting module does not require human annotation
      • If S.O.T.A. filter-based MOT methods are used, the labeling requirement is reduced to the instance level
      • Sequence-level object labels are no longer required
      • Not practically feasible in the standard pipeline
      • Easy to incorporate scene context information during forecasting
  • Why not use unsupervised detection also?
      • Trade-off between accuracy and labeling requirement
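
The labeling argument can be written down as a small bookkeeping exercise. The sketch below encodes each pipeline as an ordered list of stages together with the label level the slides say that stage needs for training, and assumes (as the slide does) that a filter-based tracker is used in the new pipeline. The data structure and the name `required_label_levels` are hypothetical.

```python
# (stage name, label level needed for training, or None if no labels).
STANDARD_PIPELINE = [
    ("3D object detection", "instance"),
    ("3D multi-object tracking", "sequence"),
    ("trajectory forecasting", "sequence"),
]

SPF2_PIPELINE = [
    ("sequential pointcloud forecasting", None),      # trained without labels
    ("3D object detection", "instance"),
    ("filter-based 3D multi-object tracking", None),  # assumed: no training
]


def required_label_levels(pipeline):
    """Collect the distinct label levels a pipeline needs for training."""
    return {level for _, level in pipeline if level is not None}


standard_levels = required_label_levels(STANDARD_PIPELINE)
spf2_levels = required_label_levels(SPF2_PIPELINE)
stage_order = [name for name, _ in SPF2_PIPELINE]
```

Reading the two lists side by side also shows the inversion itself: forecasting is the last stage of the standard pipeline and the first stage of the new one.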

SLIDE 36

SPF: Sequential Pointcloud Forecasting

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Advantages
      • Removes the need for labels when training the forecasting module
      • The prediction represents the entire scene, including information in the background
      • Easier to incorporate scene context information during forecasting

SLIDE 37

SPFNet

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Four modules
      • (a) Shared point cloud encoder
      • (b) LSTM for temporal modeling
      • (c) Shared point cloud decoder
      • (d) Losses
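
The slide does not say which losses are used. A common way to compare a predicted point cloud with a target one, when the points have no fixed correspondence, is the Chamfer distance. The sketch below shows it purely as an assumed example of such a loss, in plain Python with quadratic cost; the function name and the toy points are hypothetical.

```python
import math


def chamfer_distance(pred_points, target_points):
    """Symmetric Chamfer distance between two non-empty lists of 3D points.

    Each point is matched to its nearest neighbor in the other set and the
    mean nearest-neighbor distances in both directions are summed. Some
    formulations use squared distances instead.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return (one_way(pred_points, target_points)
            + one_way(target_points, pred_points))


# Toy clouds: the second point of the target is lifted by 1 along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
loss = chamfer_distance(pred, target)
```

Because the distance is defined on unordered sets, it needs only the future sensor sweep as a target, which fits the claim that the forecasting module can be trained without human labels.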

SLIDE 38

Quantitative Results

SLIDE 39

Evaluation of the SPFNet on KITTI and nuScenes

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Is our SPFNet effective for the proposed SPF task?
      • It outperforms baselines that we devised using existing techniques

SLIDE 40

Evaluation of the SPF2 Pipeline on KITTI and nuScenes

  • Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020

  • Is our new SPF2 pipeline competitive?
      • Yes, and it also requires fewer labels for training

SLIDE 41

Take Home Message

  • Improvement is possible not only at the module level but also at the pipeline level

Open Problems

  • Many large-scale datasets exist, but sensor suites and annotations are not unified
  • Trajectory forecasting is improving but should be coupled more tightly with perception modules
  • The multi-level optimization problem (planning, control) is not yet taken into account
  • Sensor optimization and redundancy should also be taken into account

SLIDE 42

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani
Robotics Institute, Carnegie Mellon University

August 28, 2020