Autonomous Driving Xiaozhi Chen Tsinghua University Joint work - PowerPoint PPT Presentation

3D Object Detection for Autonomous Driving. Xiaozhi Chen, Tsinghua University. Joint work with Kaustav Kundu, Yukun Zhu, Ziyu Zhang, Andrew Berneshawi, Huimin Ma, Sanja Fidler and Raquel Urtasun.

Goal: 3D Object Detection
Input image:
• Where are the cars in the image?
• How far are the cars from the driver?

Goal: 3D Object Detection
• 2D boxes
• 3D poses
• 3D location
• 3D boxes

Related Work: 3D Pose Estimation
• 3D²PM, Pepik et al. CVPR’12
• Fidler et al. NIPS’12
• ALM, Xiang et al. CVPR’12
• PASCAL3D+, Xiang et al. WACV’14
• ObjectNet3D, Xiang et al. ECCV’16
• Thomas et al. CVPR’06
• Glasner et al. ICCV’11
• Hoiem et al. CVPR’07
• Hejrati et al. NIPS’12
• Yan et al. ICCV’07
• Etc.

Related Work: 3D Object Localization
• Xiang et al. CVPR’15, arXiv’16
• Zia et al. CVPR’14, IJCV’15
• Chhaya et al. ICRA’16

Related Work: 3D Object Detection (Indoor)
• (Deep) Sliding Shapes: Song & Xiao. ECCV’14, CVPR’16
• Depth R-CNN: Gupta et al. ECCV’14, CVPR’15

What’s the Best Sensor for Self-driving Cars?
• LIDAR, e.g., Google, Baidu
• Camera, e.g., Mobileye, Tesla

Outline
Sensors: Stereo, LIDAR, Monocular
1. 3D Object Detection using Stereo Images (NIPS’15)
2. Monocular 3D Object Detection (CVPR’16)

3D Object Detection using Stereo Images
• Xiaozhi Chen*, Kaustav Kundu*, Yukun Zhu, Andrew Berneshawi, Huimin Ma, Sanja Fidler, Raquel Urtasun. 3D Object Proposals for Accurate Object Class Detection. NIPS 2015.

Typical Object Detection Pipeline
• Candidate Box Selection
  • Sliding Window: exhaustive search across the entire image at multiple scales
  • Object Proposal: reduces the search space to focus on a few regions; requires high recall
• Feature Extraction: HOG, CNN, etc.
• Classification: linear classifiers

Typical Object Detection Pipeline
• R-CNN [CVPR’14]
• Fast R-CNN [ICCV’15]
• Faster R-CNN [NIPS’15]

3DOP: Overview
Stereo images → 3D Proposal Generation → 3D proposals → CNN Scoring

KITTI: Autonomous Driving Dataset
• KITTI (Geiger et al., CVPR’12)
• Categories: Car, Pedestrian, Cyclist
• Data: LIDAR point cloud, stereo images
• Annotations: 2D/3D bounding boxes, occlusion/truncation labels

2D Proposals Recall on KITTI
[Figure: recall of 2D methods (BING, SS, EB, MCG) and 3D methods (MCG-D) for Car, Cyclist, and Pedestrian]
• PASCAL: recall (1K proposals) > 95%
• KITTI: recall (1K proposals) < 75%! Why?
References:
• [BING] BING: Binarized normed gradients for objectness estimation at 300fps. Cheng et al., CVPR’14.
• [SS] Segmentation as selective search for object recognition. van de Sande et al., ICCV’11.
• [EB] Edge boxes: locating object proposals from edges. Zitnick et al., ECCV’14.
• [MCG] Multiscale combinatorial grouping. Arbeláez et al., CVPR’14.
• [MCG-D] Learning rich features from RGB-D images for object detection and segmentation. Gupta et al., ECCV’14.

Challenges on KITTI
• Strict localization metric: 0.7 IoU overlap threshold for Cars
• Cluttered scenes
• Heavy occlusion
• Small objects, high resolution (370x1240)
[Figure: example images for the Easy, Moderate, and Hard difficulty levels]
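To make the recall numbers and the 0.7 IoU threshold concrete, here is a minimal sketch of how proposal recall could be computed for 2D boxes. The function names `iou_2d` and `recall` and the `(x1, y1, x2, y2)` box layout are assumptions for illustration, not the KITTI evaluation code.

```python
def iou_2d(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(gt_boxes, proposals, iou_thresh=0.7):
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for g in gt_boxes
        if any(iou_2d(g, p) >= iou_thresh for p in proposals)
    )
    return hits / len(gt_boxes)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    props = [(0, 0, 10, 9), (22, 22, 32, 32)]
    # The second proposal overlaps its target, but not enough for 0.7.
    print(recall(gt, props, iou_thresh=0.7))
    print(recall(gt, props, iou_thresh=0.4))
```

A proposal that is shifted by only two pixels on a 10x10 object already falls below the 0.7 threshold, which illustrates why small objects under a strict metric are hard to recall.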

3DOP: Feature Computation
[Figure panels: left image, right image, bird’s eye view, height prior]
Color legend:
• Yellow: occupancy
• Green: ground plane
• Red → Blue: increasing height prior
• Purple: free space

Parameterization
• $\mathbf{x}$: point cloud of the input stereo image pair
• $\mathbf{y} = (x, y, z, \theta, c, t)$: 3D bounding box candidate
  • $(x, y, z)$: center of the 3D box
  • $\theta$: azimuth angle
  • $c \in \{\text{Car}, \text{Pedestrian}, \text{Cyclist}\}$: object category
  • $t \in \{1, \dots, T_c\}$: category-specific template
$E(\mathbf{x}, \mathbf{y}) = E_{pc}(\mathbf{x}, \mathbf{y}) + E_{fs}(\mathbf{x}, \mathbf{y}) + E_{ht}(\mathbf{x}, \mathbf{y}) + E_{ht\text{-}contr}(\mathbf{x}, \mathbf{y})$
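The parameterization above can be sketched as a small data structure, with the energy written as a plain sum of term functions. This is an illustrative sketch only: `BoxCandidate`, `total_energy`, and the two placeholder terms are hypothetical names, and the real terms would be computed from the point cloud.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class BoxCandidate:
    """One 3D box candidate: center, azimuth, category, template index."""
    x: float
    y: float
    z: float
    theta: float      # azimuth angle
    category: str     # "Car", "Pedestrian", or "Cyclist"
    template: int     # index into the category-specific templates


# Each energy term maps (point cloud, candidate) to a scalar.
EnergyTerm = Callable[[Sequence[Point], BoxCandidate], float]


def total_energy(points: Sequence[Point], box: BoxCandidate,
                 terms: Sequence[EnergyTerm]) -> float:
    """Sum of all energy terms, mirroring the additive form on the slide."""
    return sum(term(points, box) for term in terms)


if __name__ == "__main__":
    # Placeholder terms (assumptions): constants standing in for real features.
    def occupancy_term(points, box):
        return -1.0

    def free_space_term(points, box):
        return 0.25

    cand = BoxCandidate(1.0, 0.0, 12.0, 0.0, "Car", 1)
    print(total_energy([], cand, [occupancy_term, free_space_term]))
```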

Energy Terms
• $\mathbf{x}$: point cloud of the input stereo image pair
• $\mathbf{y} = (x, y, z, \theta, c, t)$: 3D bounding box candidate
$E(\mathbf{x}, \mathbf{y}) = E_{pc}(\mathbf{x}, \mathbf{y}) + E_{fs}(\mathbf{x}, \mathbf{y}) + E_{ht}(\mathbf{x}, \mathbf{y}) + E_{ht\text{-}contr}(\mathbf{x}, \mathbf{y})$
• $E_{pc}$: point cloud occupancy
• $E_{fs}$: free space
• $E_{ht}$: height prior
• $E_{ht\text{-}contr}$: height contrast

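Inference
$\mathbf{y}^* = \operatorname{argmin}_{\mathbf{y}} \; E_{pc}(\mathbf{x}, \mathbf{y}) + E_{fs}(\mathbf{x}, \mathbf{y}) + E_{ht}(\mathbf{x}, \mathbf{y}) + E_{ht\text{-}contr}(\mathbf{x}, \mathbf{y})$
• Voxelization: voxel dimension = 0.2 m
• Candidate sampling: sample cuboids close to the road plane
• Feature computation: 3D integral images
• Proposal ranking: sort all candidates according to $E(\mathbf{x}, \mathbf{y})$, then apply NMS
Inference time: ~1.2 s in a single thread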
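The voxelization and 3D integral image steps can be illustrated with a short sketch. It assumes axis-aligned cuboids on a small grid and uses hypothetical helper names (`voxelize`, `integral_image`, `box_sum`); it shows why a per-candidate occupancy count becomes a constant-time lookup, and is not the authors' implementation.

```python
import math


def voxelize(points, origin, dims, voxel=0.2):
    """Binary occupancy grid: a voxel is 1 if any point falls inside it."""
    nx, ny, nz = dims
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for px, py, pz in points:
        i = math.floor((px - origin[0]) / voxel)
        j = math.floor((py - origin[1]) / voxel)
        k = math.floor((pz - origin[2]) / voxel)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            grid[i][j][k] = 1
    return grid


def integral_image(grid):
    """S[i][j][k] = sum of grid[:i][:j][:k], padded with a zero border."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    S = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                # 3D inclusion-exclusion over the already-filled neighbors.
                S[i + 1][j + 1][k + 1] = (
                    grid[i][j][k]
                    + S[i][j + 1][k + 1] + S[i + 1][j][k + 1] + S[i + 1][j + 1][k]
                    - S[i][j][k + 1] - S[i][j + 1][k] - S[i + 1][j][k]
                    + S[i][j][k]
                )
    return S


def box_sum(S, lo, hi):
    """Occupied voxels in the half-open index range [lo, hi) using 8 lookups."""
    (i0, j0, k0), (i1, j1, k1) = lo, hi
    return (
        S[i1][j1][k1]
        - S[i0][j1][k1] - S[i1][j0][k1] - S[i1][j1][k0]
        + S[i0][j0][k1] + S[i0][j1][k0] + S[i1][j0][k0]
        - S[i0][j0][k0]
    )


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (0.5, 0.5, 0.5), (0.1, 0.1, 0.15)]
    S = integral_image(voxelize(pts, origin=(0.0, 0.0, 0.0), dims=(4, 4, 4)))
    print(box_sum(S, (0, 0, 0), (4, 4, 4)))  # occupied voxels in the whole grid
```

Dividing `box_sum` by the number of voxels in the cuboid gives an occupancy density, so thousands of sampled cuboids can be scored without revisiting the points.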

Inference Speed Comparison
Method                        Time (sec.)
BING [CVPR’14]                0.01
Selective Search [ICCV’11]    15
EdgeBoxes [ECCV’14]           1.5
MCG [CVPR’14]                 100
MCG-D [ECCV’14]               160
Ours                          1.2

Learning
• Structured SVM
• Task loss: 1 − 3D IoU
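As a rough illustration of a loss of the form 1 − 3D IoU, the sketch below computes it for axis-aligned boxes. That is a simplifying assumption: the candidates in the talk also carry an azimuth angle, which this sketch ignores. The names `iou_3d` and `task_loss` and the `(xmin, ymin, zmin, xmax, ymax, zmax)` layout are hypothetical.

```python
def iou_3d(a, b):
    """3D IoU of axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        inter *= max(0.0, hi - lo)  # zero overlap on any axis gives zero volume
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def task_loss(candidate, ground_truth):
    """Loss of the form 1 - 3D IoU: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou_3d(candidate, ground_truth)


if __name__ == "__main__":
    gt = (0, 0, 0, 2, 2, 2)
    print(task_loss((1, 0, 0, 3, 2, 2), gt))  # partial overlap
    print(task_loss(gt, gt))                  # perfect match
```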

3D Object Detection Network
• Incorporating context information
• Joint object detection and orientation estimation
[Figure: network architecture. Conv layers feed two ROI pooling branches, one for the box proposal and one for its context region; each branch passes through FCs, the outputs are concatenated, and FC heads produce softmax classification, box regression, and orientation regression.]
• Regression targets: $\mathbf{t}_{2D} = (t_x, t_y, t_w, t_h)$, $\mathbf{t}_{3D} = (t_X, t_Y, t_Z, t_L, t_W, t_H)$, $\mathbf{t}_{ort} = t_\theta$
• Multi-task loss: $L = L_{classification} + L_{box} + L_{orientation}$, with a softmax loss for classification and a smooth $\ell_1$ loss for regression
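The multi-task loss can be sketched for a single foreground proposal as follows. The smooth ℓ1 definition is the standard one from Fast R-CNN; the equal weighting of the three terms follows the formula on the slide, while the function names and the single-sample setup are assumptions for illustration.

```python
import math


def smooth_l1(diff):
    """Smooth L1: quadratic near zero, linear for |diff| >= 1."""
    d = abs(diff)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def multi_task_loss(logits, label, box_pred, box_target, ort_pred, ort_target):
    """L = L_classification + L_box + L_orientation for one proposal."""
    l_cls = softmax_cross_entropy(logits, label)
    l_box = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    l_ort = smooth_l1(ort_pred - ort_target)
    return l_cls + l_box + l_ort


if __name__ == "__main__":
    loss = multi_task_loss(
        logits=[0.0, 0.0], label=0,
        box_pred=[0.5, 2.0], box_target=[0.0, 0.0],
        ort_pred=0.3, ort_target=0.3,
    )
    print(loss)  # log(2) from classification plus 0.125 + 1.5 from the box
```

In practice the regression terms are usually applied only to proposals matched to an object, not to background samples; this sketch leaves that masking out.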
