Hands-on Intro To Developing Vision and LiDAR Classifier: Formula Student Driverless Workshop



SLIDE 1

Hands-on Intro To Developing Vision and LiDAR Classifier

Formula Student Driverless Workshop

SLIDE 2

https://fsg.one/academy 30.08.2020 Author: Sibo Zhu, Zhijian Liu, Haotian Tang

Introduction

Sibo Zhu
▪ Perception Lead at MIT Driverless
▪ Research Assistant at MIT HAN Lab

Zhijian Liu
▪ Perception Lead at MIT Driverless
▪ PhD student at MIT HAN Lab

Haotian Tang
▪ Perception Lead at MIT Driverless
▪ PhD student at MIT HAN Lab

SLIDE 3

Introduction

SLIDE 4

Vision Perception Task

[Images: Wide Angle Camera, VIO Camera, Stereovision Pair]

SLIDE 5

System Requirements

▪ Latency

▪ Maximum view-to-actuation time for emergency stop from top speed during an acceleration run

▪ Mapping Accuracy

▪ Driven by downstream mapper

▪ Horizontal Field-of-View (FOV)

▪ perceive landmarks on the inside apex of a hairpin turn

▪ Look-ahead Distance

▪ depends on the full-stack-latency and vehicle deceleration rate
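
The look-ahead requirement can be made concrete with a simple stopping-distance estimate. This is a minimal sketch, assuming constant deceleration and that the car keeps its full speed for the whole pipeline latency; the function name and the example numbers are illustrative and do not come from the workshop.

```python
def look_ahead_distance(speed_mps, latency_s, decel_mps2):
    """Distance covered during the pipeline latency plus the braking distance.

    Assumes constant speed during the latency and constant deceleration after.
    """
    if decel_mps2 <= 0:
        raise ValueError("deceleration must be positive")
    reaction = speed_mps * latency_s                 # travelled before braking starts
    braking = speed_mps ** 2 / (2.0 * decel_mps2)    # v^2 / (2a)
    return reaction + braking


# Hypothetical numbers: 20 m/s, 200 ms full-stack latency, 10 m/s^2 braking.
print(look_ahead_distance(20.0, 0.2, 10.0))
```

Lower latency or stronger braking shortens the required look-ahead, which is why the two quantities appear together in the requirement.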

SLIDE 6

Depth Estimation

Step 1: Object Detection

Mono: Step 2: Keypoints Detection, Step 3: Perspective-n-Point (PnP)

Stereo: Step 2: Stereo Matching Algorithm

  • Ankit et al., IV'19

SLIDE 7

Hands-on Tutorial To Train Your Own Cone Detection Network


colab.research.google.com

SLIDE 8

Colab Tutorial: colab.research.google.com

SLIDE 9

Software Design - Keypoints Detection

▪ Detects seven keypoints on each YOLOv3 detection

▪ A residual NN that leverages the geometric relationship between keypoints

▪ The seven detected keypoints are then used in a Perspective-n-Point (PnP) solve to get depth
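
To give a feel for how image keypoints turn into depth, here is a much simpler stand-in for the PnP step (it is not PnP itself): a pinhole-camera estimate from the pixel distance between the top and base keypoints. It assumes an upright cone of known physical height and a known focal length in pixels; the function name and all numbers are hypothetical.

```python
def depth_from_cone_height(focal_px, cone_height_m, top_y_px, base_y_px):
    """Rough range from similar triangles: depth = f * H / h_pixels.

    This ignores lens distortion and cone tilt, which a full PnP solve
    over all seven keypoints would account for.
    """
    pixel_height = abs(base_y_px - top_y_px)
    if pixel_height == 0:
        raise ValueError("keypoints must be at different image rows")
    return focal_px * cone_height_m / pixel_height


# Hypothetical camera and cone: f = 1000 px, cone height 0.325 m, 65 px tall in the image.
print(depth_from_cone_height(1000.0, 0.325, 400.0, 465.0))
```

A full PnP solve uses all keypoints and the camera intrinsics together, so it is more robust than this two-point estimate.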

SLIDE 10

Hands-on Tutorial To Train Your Own Keypoints Detection Network


colab.research.google.com

SLIDE 11

Colab Tutorial: colab.research.google.com

SLIDE 12

Validation - Accuracy


SLIDE 13

Validation - Latency

Open-sourced here: github.com/cv-core

SLIDE 14

Beyond This Computer Vision Tutorial…

▪ Change the object detection backbone from YOLOv3 to YOLOv4/EfficientNet/etc.

▪ Add temporal information for more stable and accurate detection

▪ Temporal Shift Module: hanlab.mit.edu/projects/tsm/

▪ Inference with TensorRT in C++

▪ Open-sourced here: github.com/cv-core

▪ Prune the full YOLO architecture for the cone detection task

▪ Quantization (int8) for even faster inference
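
As a rough illustration of what int8 quantization does to a weight tensor, the sketch below applies generic symmetric per-tensor quantization. It is not TensorRT's calibration procedure; the function names and sample weights are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]   # hypothetical float weights
codes, scale = quantize_int8(weights)
print(codes, dequantize(codes, scale))
```

The round trip loses at most half a quantization step per value, which is the accuracy cost traded for faster integer arithmetic.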

SLIDE 15

LiDAR Perception


SLIDE 16

Thank you for your attention


SLIDE 17

Hands-on Intro to Developing Vision and LiDAR Classifiers

Formula Student Driverless Workshop

SLIDE 18

Speaker Introduction

Sibo Zhu
▪ RA at MIT HAN Lab
▪ Perception Lead at MIT Driverless

Zhijian Liu
▪ PhD student at MIT
▪ Perception Lead at MIT Driverless

Haotian Tang
▪ PhD student at MIT
▪ Perception Lead at MIT Driverless

SLIDE 19

LiDAR Perception

Song Han Feb 22, 2018

3D LiDAR Sensor → 3D Point Cloud

500k+ points: (x, y, z, intensity)

SLIDE 20

LiDAR Perception Task

Velodyne 32C LiDAR

SLIDE 21

Autonomous Racing Vehicle: Objectives

▪ High Accuracy (Prevent Collisions)

▪ Low Latency (Drive Faster)

SLIDE 22

Autonomous Racing Vehicle: Challenges

Self-Driving Cars: a whole trunk of computers!

We need more efficient algorithms that do not require intensive computation.

SLIDE 23

Autonomous Racing Vehicle: Challenges

▪ Longer Distance: Fewer Laser Rings on Objects, Fewer Laser Points

▪ Shorter Distance: Too Many Laser Rings on Objects, Too Many Laser Points

SLIDE 24

Efficient LiDAR Perception: Bottleneck

Off-chip DRAM access is much more expensive than arithmetic operations!

Random memory access is inefficient due to potential bank conflicts!

[Chart: Bandwidth (GB/s). Mult and Add: 668, SRAM Memory: 167, DRAM Memory: 30 (20x slower)]

[Diagram: Random Memory Access vs. Sequential Memory Access]

SLIDE 25

Voxel-Based Models: Cubically-Growing Memory

VoxNet [IROS’15], 3D ShapeNets [CVPR’15], 3D U-Net [MICCAI’16]

[Chart: GPU Memory (GB) vs. Voxel Resolution]

▪ 64 x 64 x 64 resolution: 11 GB (Titan XP x 1), 42% information loss

▪ 128 x 128 x 128 resolution: 83 GB (Titan XP x 7), 7% information loss
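
The cubic growth can be checked with a few lines of arithmetic. This is a sketch for a single dense feature map only, assuming float32 values and a hypothetical channel count of 64; a real network stores many such activations, so its totals are far larger than what this prints.

```python
def dense_voxel_bytes(resolution, channels, bytes_per_value=4):
    """Memory of one dense feature map on a resolution^3 voxel grid."""
    return resolution ** 3 * channels * bytes_per_value


for r in (32, 64, 128):
    mib = dense_voxel_bytes(r, channels=64) / 2 ** 20
    print(f"{r}^3 grid, 64 channels: {mib:.0f} MiB")

# Doubling the resolution multiplies the memory by 2^3 = 8.
print(dense_voxel_bytes(128, 64) // dense_voxel_bytes(64, 64))
```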

SLIDE 26

Point-Based Models: Sparsity Overheads

Runtime (%)   Irregular Access   Dynamic Kernel   Actual Computation
DGCNN         51.8               2.9              45.3
PointCNN      36.3               51.5             12.2
SpiderCNN     57.4               27.0             15.6
Ours          4.9                0.0              95.1

PointCNN [NeurIPS’18], PointNet [CVPR’17], DGCNN [SIGGRAPH’19]

SLIDE 27

Point-Voxel Convolution (PVConv)

[Diagram: voxel branch (Normalize, Voxelize, Convolve, Devoxelize) fused with a point branch (Multi-Layer Perceptron)]

SLIDE 28

Point-Voxel Convolution (PVConv)

Point-Based Feature Transformation (Fine-Grained)

Multi-Layer Perceptron

SLIDE 29

Point-Voxel Convolution (PVConv)

[Diagram: Voxelize, Convolve, Devoxelize]

Voxel-Based Feature Aggregation (Coarse-Grained)

SLIDE 30

Point-Voxel Convolution (PVConv)

[Diagram: Normalize, Voxelize, Convolve, Devoxelize, Fuse]

Point-Based Feature Transformation (Fine-Grained) Voxel-Based Feature Aggregation (Coarse-Grained)

Multi-Layer Perceptron
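
The two-branch idea can be sketched in plain Python. This is a heavily simplified illustration, not the PVConv implementation: features are scalars, the 3D convolution on the voxel grid is left out, devoxelization is a nearest-voxel lookup, and the branches are fused by addition. Coordinates are assumed to be already normalized to [0, 1), and every name here is hypothetical.

```python
from collections import defaultdict


def voxelize(coords, feats, resolution):
    """Average the features of all points that fall into the same voxel."""
    sums = defaultdict(lambda: [0.0, 0])
    keys = []
    for point, feat in zip(coords, feats):
        key = tuple(min(int(c * resolution), resolution - 1) for c in point)
        keys.append(key)
        sums[key][0] += feat
        sums[key][1] += 1
    grid = {key: total / count for key, (total, count) in sums.items()}
    return grid, keys


def devoxelize(grid, keys):
    """Give each point the feature of the voxel it fell into."""
    return [grid[key] for key in keys]


def point_voxel_fuse(coords, feats, resolution, point_fn):
    grid, keys = voxelize(coords, feats, resolution)
    # A real voxel branch would convolve over `grid` at this point.
    coarse = devoxelize(grid, keys)          # coarse-grained neighborhood info
    fine = [point_fn(f) for f in feats]      # fine-grained per-point transform
    return [c + p for c, p in zip(coarse, fine)]


coords = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
feats = [1.0, 3.0, 10.0]
print(point_voxel_fuse(coords, feats, resolution=2, point_fn=lambda f: 2 * f))
```

The first two points share a voxel, so they receive the same coarse feature while keeping distinct per-point features.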

SLIDE 31

Low Resolution with Constrained Memory

[Images: Downsampled Scene, Original Scene]

SLIDE 32

Sparse Point-Voxel Convolution (SPVConv)

[Diagram: Sparse Convolution Branch (Voxelize, Sparse Convolution ×N, Devoxelize) fused with Point Branch (Multi-Layer Perceptron)]


SLIDE 34

Designing Efficient 3D Modules (SPVConv)

[Image labels: person, cyclist, trunk, traffic sign]

SLIDE 35

Searching Efficient 3D Architectures (3D-NAS)

[Diagram: Dynamic ResBlocks with Elastic Trans. Channel, Elastic Res. Channel, and Elastic Mid. Channel, built on the Sparse Convolution Branch (Voxelize, Sparse Convolution ×N, Devoxelize) fused with the Point Branch (Multi-Layer Perceptron)]


SLIDE 39

Searching Efficient 3D Architectures (3D-NAS)

Train Super Network

▪ Fine-Grained Channel + Elastic Depth

▪ Weight Sharing

▪ Uniform Sampling (GPU #1 … GPU #N)

▪ Stage I (Depth: 3), Stage II (Depth: 2, 3), Stage III (Depth: 1, 2, 3)

Evolutionary Architecture Search

▪ mutate + crossover
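
The mutate-and-crossover loop can be sketched generically as follows. This is a toy illustration, not the 3D-NAS search code: the channel choices, the population sizes, and especially the `fitness` function are hypothetical stand-ins. A real search would score each candidate by evaluating the sub-network with weights inherited from the super network under a resource constraint.

```python
import random

CHOICES = [16, 32, 48, 64]  # hypothetical channel options per layer


def fitness(arch):
    # Toy objective (assumption): prefer channel counts close to 48.
    return -sum((c - 48) ** 2 for c in arch)


def mutate(arch, rng, prob=0.2):
    """Resample each gene independently with probability `prob`."""
    return [rng.choice(CHOICES) if rng.random() < prob else c for c in arch]


def crossover(a, b, rng):
    """Pick each gene from one of the two parents at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def evolutionary_search(num_layers=6, population=20, parents=5, iterations=15, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(CHOICES) for _ in range(num_layers)] for _ in range(population)]
    history = []
    for _ in range(iterations):
        pop.sort(key=fitness, reverse=True)
        top = pop[:parents]                      # keep the best candidates
        history.append(fitness(top[0]))
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(top), rng))
            else:
                children.append(crossover(rng.choice(top), rng.choice(top), rng))
        pop = top + children
    pop.sort(key=fitness, reverse=True)
    return pop[0], history


best, history = evolutionary_search()
print(best, history[0], fitness(best))
```

Because the best candidates are carried over unchanged, the best score in the population never gets worse from one iteration to the next.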

SLIDE 40

Details: Small Kernel Matters

Large kernels are efficient in 2D-NAS

SLIDE 41

Details: Small Kernel Matters

[Chart: #MACs for Two 3x3x3 vs. One 5x5x5; bar labels 1x, 2x]

[Chart: Kernel Map Cost for Two 3x3x3, One 5x5x5, Hybrid; bar labels 1x, 5x, 6x]

Small Kernels vs. Large Kernels in 3D Deep Learning

The cost of large kernels in 3D deep learning is more prohibitive than in 2D.
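
The #MACs side of this comparison follows from counting kernel weights. The sketch below counts weights per input-output channel pair at one output location, assuming dense convolutions with equal channel counts; it does not model the kernel map cost of sparse convolution, and the helper name is hypothetical.

```python
def kernel_weights(kernel_size, dims, layers=1):
    """Weights per channel pair for `layers` stacked kernels of the given size."""
    return layers * kernel_size ** dims


# Two stacked 3-wide layers cover the same 5-wide receptive field as one 5-wide layer.
ratio_3d = kernel_weights(5, 3) / kernel_weights(3, 3, layers=2)   # 125 / 54
ratio_2d = kernel_weights(5, 2) / kernel_weights(3, 2, layers=2)   # 25 / 18

print(f"3D: one 5x5x5 costs {ratio_3d:.2f}x two 3x3x3")
print(f"2D: one 5x5 costs {ratio_2d:.2f}x two 3x3")
```

The penalty for a large kernel is noticeably higher in 3D than in 2D, because the extra dimension multiplies the weight count once more.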

SLIDE 42

Details: Elastic Network Depths

[Diagram: Dynamic ResBlocks with Elastic Trans. Channel, Elastic Res. Channel, and Elastic Mid. Channel]

SLIDE 43

Details: Elastic Network Depths

Scaling only the network channels does not yield efficient models.

[Diagram: Dynamic ResBlocks with Elastic Trans. Channel, Elastic Res. Channel, and Elastic Mid. Channel]

▪ FLOPs: 7.5G → 1.9G (4x)

▪ Latency: 105 ms → 96 ms (1.1x)

SLIDE 44

Challenge: Sample Efficiency

▪ Number of Training Samples: ImageNet 1,280,000 vs. SemanticKITTI 19,112 (67x less)

▪ Number of Archs Sampled: ImageNet 85,000 vs. SemanticKITTI 17,940 (4.7x less)

SLIDE 45

Solution: Distributed Sampling Across GPUs

[Chart: mIoU vs. #MACs (G). Distributed Sampling: 61.5, 62.9, 63.5, 64.7. Synchronized Sampling: 60.4, 62.0, 62.8, 63.5.]

▪ Distributed Sampling: different sub-networks on different GPUs (GPU #1, GPU #2, …, GPU #N)

▪ Synchronized Sampling: the same sub-network on different GPUs (GPU #1, GPU #2, …, GPU #N)
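
The difference between the two sampling schemes comes down to how each GPU seeds its sampler. The sketch below only illustrates that idea in plain Python, assuming a hypothetical search space of (depth, channels) choices per stage; it is not the authors' training code and involves no actual GPUs.

```python
import random

DEPTHS = [1, 2, 3]            # hypothetical elastic depth choices
CHANNELS = [16, 32, 48, 64]   # hypothetical elastic channel choices


def sample_subnet(rng, num_stages=4):
    """Uniformly sample a (depth, channels) choice for every stage."""
    return tuple((rng.choice(DEPTHS), rng.choice(CHANNELS)) for _ in range(num_stages))


def sample_for_step(step, num_gpus, distributed):
    """Return the sub-network each GPU rank would train at this step."""
    subnets = []
    for rank in range(num_gpus):
        # Distributed: a rank-dependent seed, so ranks draw independently.
        # Synchronized: the same seed on every rank, so all ranks agree.
        seed = step * num_gpus + rank if distributed else step
        subnets.append(sample_subnet(random.Random(seed)))
    return subnets


print(len(set(sample_for_step(0, 4, distributed=False))))  # one shared sub-network
print(len(set(sample_for_step(0, 4, distributed=True))))   # number of distinct sub-networks
```

With rank-dependent seeds, one training step covers several architectures, so more of the search space is visited in the same number of steps.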

SLIDE 46

Results: 3D Semantic Scene Segmentation

  • SemanticKITTI is the largest 3D point cloud semantic segmentation dataset. It is 29x larger than ScanNet and 160x larger than S3DIS.

  • SemanticKITTI is collected from real driving scenarios and provides point-level annotation for video sequences.

SLIDE 47

Results: 3D Semantic Scene Segmentation

We achieve 8x MACs reduction and 3x speedup over MinkowskiNet with SPVNAS

[Charts: MinkowskiNet vs. SPVNAS]

▪ mIoU: 63 vs. 64

▪ #MACs: 114G vs. 15G (7.6x smaller)

▪ GTX 1080Ti Latency: 294 ms vs. 110 ms (2.7x faster)

SLIDE 48

Results: 3D Semantic Scene Segmentation

[Charts: mIoU vs. #MACs (G) and mIoU vs. GPU Latency (ms)]

▪ MinkowskiNet mIoU: 57.5, 60.0, 62.8, 63.1

Both a better module (SPVConv) and 3D-NAS improve the performance of MinkowskiNet.

SLIDE 49

Results: 3D Semantic Scene Segmentation

[Charts: mIoU vs. #MACs (G) and mIoU vs. GPU Latency (ms)]

▪ SPVCNN (Ours) mIoU: 58.5, 61.6, 64.4, 65.3

▪ MinkowskiNet mIoU: 57.5, 60.0, 62.8, 63.1

Both a better module (SPVConv) and 3D-NAS improve the performance of MinkowskiNet.

SLIDE 50

Results: 3D Semantic Scene Segmentation

[Charts: mIoU vs. #MACs (G) and mIoU vs. GPU Latency (ms)]

▪ SPVNAS (Ours) mIoU: 63.6, 64.5, 65.2, 66.0, 66.4

▪ SPVCNN (Ours) mIoU: 58.5, 61.6, 64.4, 65.3

▪ MinkowskiNet mIoU: 57.5, 60.0, 62.8, 63.1

Both a better module (SPVConv) and 3D-NAS improve the performance of MinkowskiNet.

SLIDE 51

Results: 3D Semantic Scene Segmentation

We achieve up to a 25-point IoU improvement on safety-critical small objects.

[Charts: MinkowskiNet vs. SPVNAS]

▪ Person IoU: 61 vs. 66 (+4.8 IoU)

▪ Bicycle IoU: 40 vs. 52 (+11.2 IoU)

▪ Motorcyclist IoU: 19 vs. 44 (+25 IoU)
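
For readers new to the metric, per-class IoU and mIoU can be computed from predicted and ground-truth labels as in the sketch below. It is a plain-Python illustration on made-up labels, not the SemanticKITTI evaluation script, and the function names are hypothetical.

```python
def per_class_iou(pred, target, num_classes):
    """Intersection over union for each class; None if the class never appears."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious):
    """Average over the classes that appear in prediction or ground truth."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


pred = [0, 0, 1, 1, 2, 2]      # made-up per-point predictions
target = [0, 1, 1, 1, 2, 0]    # made-up per-point ground truth
ious = per_class_iou(pred, target, num_classes=3)
print(ious, mean_iou(ious))
```

Rare classes with few points weigh as much as common ones in the mean, which is why gains on small objects move mIoU noticeably.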

SLIDE 52

Results: 3D Semantic Scene Segmentation

We achieve up to 58x MACs reduction and 46x params reduction over projection-based methods.

Method         mIoU   #MACs   #Params
DarkNet        49.9   376G    50M
SqueezeSegV3   56     515G    26M
PolarNet       57     135G    14M
SPVNAS         60     9G      1M

Chart annotations: +10.4 mIoU, 58x smaller (#MACs), 46x smaller (#Params)

SLIDE 53

Results: 3D Semantic Scene Segmentation

[Images: MinkowskiNet, SPVNAS (Ours), Ground Truth; highlighted classes: traffic sign, boundary, bicycle, person, pole]

SLIDE 54

Demo: Significantly Faster than MinkowskiNets

▪ MinkowskiNet: Mean IoU 63.1, Throughput 3.4 FPS (21.7M Params, 114.0G FLOPs)

▪ SPVNAS (Ours): Mean IoU 63.6, Throughput 9.1 FPS (2.6M Params, 15.0G FLOPs)

SPVNAS outperforms the state-of-the-art MinkowskiNet (with 3x measured speedup and 8x model size reduction).

SLIDE 55

Demo: Faster than 2D, Accuracy of 3D

▪ DarkNet53Seg: Mean IoU 49.9, Throughput 9.7 FPS, 50.4M Params, 376.3G FLOPs

▪ SPVNAS (Ours): Mean IoU 60.3 (> KPConv), Throughput 11.2 FPS, 1.1M Params, 8.9G FLOPs

SLIDE 56

Results: Evolutionary Architecture Search

[Chart: Evolutionary Search vs. Random Search, plotted over Search Iteration]

[Chart: results by Network Index, with labeled values 61.5, 60.7, 60.0, 61.1]

SLIDE 57

Results: Evolutionary Architecture Search

[Pie charts: Encoder vs. Decoder computation]

▪ MinkowskiNet: 23% / 77%

▪ SPVNAS - 20G: 47% / 53%

SPVNAS balances the encoder / decoder computation ratio.

SLIDE 58

Results: 3D Object Detection

[Images: SECOND, SPVCNN, Ground Truth; annotations: missing prediction, duplicate predictions, correct prediction]

SLIDE 59

▪ Original Solution: Accuracy 95%, Range 8 meters, Latency 2 ms/object

▪ PVCNN: Accuracy 99.93%, Range 12 meters, Latency 1.25 ms/object

SLIDE 60

SLIDE 61

Thank you!