SLIDE 1

Scaling-Up Deep Learning For Autonomous Vehicles

JOSE M. ALVAREZ | San Jose 2019

SLIDE 2

NVIDIA AI-Infra

SLIDE 3

AI-Infra Team

One of our top goals

Industry-grade deep learning: take AV perception DNNs into production, tested in multiple locations and conditions.

SLIDE 4

SLIDE 5

DL For Autonomous Vehicles

[Pipeline diagram: PBs of data, large-scale labeling, large-scale training, etc. Manually selected data → Labeling (labels) → Datasets (train/test data; POST /datasets/{id}) → Deep Learning → Metrics (simulation, verification results) → inference-optimized DNN (TensorRT).]

SLIDE 6

AI-Infra Team

One of our top goals

Industry-grade deep learning to take AV perception DNNs into production, tested in multiple locations and conditions: a high-quality system with no failures in millions of miles; quality-driven AV perception.

The Challenge of Scale

SLIDE 7

Self-driving cars

require tremendously large datasets for training and testing

SLIDE 8

DL for Autonomous Driving

  • Data collection fleet => 100 cars
  • 2,000 h of data collected per car, per year
  • Assuming 5 2MP cameras per car, radar data, etc. => ~1 TB / h / car
  • Grand total of 200 PB collected per year!
  • Only ~1/1000 is likely to be used for training (curated, labeled data)

The Challenge of Scale
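As a quick sanity check on those numbers, a back-of-envelope calculation (my own, using only the fleet figures from the slide above):

```python
# Back-of-envelope check of the data volumes quoted above.
cars = 100            # data-collection fleet
hours_per_car = 2000  # hours recorded per car, per year
tb_per_hour = 1.0     # ~1 TB/h/car (5x 2MP cameras, radar, etc.)

total_tb = cars * hours_per_car * tb_per_hour
print(f"collected per year: {total_tb / 1000:.0f} PB")                 # -> 200 PB
print(f"likely curated for training (1/1000): {total_tb / 1000:.0f} TB")  # -> 200 TB
```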

SLIDE 9

DL for Autonomous Vehicles

[Pipeline diagram, scaled up: manually selected data → Labeling → SCALED-UP dataset → Deep Learning → Metrics → inference-optimized DNN (TensorRT); trained models mine highly confused / most informative data back into labeling.]

Active Learning

SLIDE 10

DL for Autonomous Vehicles

Large datasets: 12.1 years to train a ResNet50-like network on Pascal; 1.5 years on a DGX-1 with Volta. With 8 DGX-1s and 1/10th of that training data, training takes about 1 week (1.5 years ÷ 8 ÷ 10 ≈ 7 days).

The Challenge of Scale

SLIDE 11

DL for Autonomous Vehicles

[Pipeline diagram: manually selected data → Labeling → Datasets → Deep Learning → Metrics → inference-optimized DNN (TensorRT); trained models mine highly confused / most informative data.]

Accuracy / Efficiency DL

SLIDE 12

DL for Autonomous Driving

Robustness / reliability: tested around the world under multiple conditions

The Challenge of Scale

Need to show 0 failures in > 1M miles, covering 1000s of conditions…

SLIDE 13

DL for Autonomous Vehicles

[Pipeline diagram: manually selected data → Labeling → Datasets → Deep Learning → Metrics → inference-optimized DNN (TensorRT); trained models mine highly confused / most informative data.]

Robustness: (Domain Adaptation,…)

SLIDE 14

Talk Road Map

  • Creating the Right Datasets
    • Active Learning
    • Domain Adaptation
  • Improving Network Accuracy / Efficiency via Over-parameterization
    • Joint Training and Pruning
    • Exploiting Linear Redundancies to Train Small Networks

SLIDE 15

Creating the right datasets

is the cornerstone of (supervised) machine learning.

SLIDE 16

Creating the Right Datasets

Some Samples Are Much More Informative Than Others

[Image: two example samples compared side by side ("vs").]

SLIDE 17
  • 1. How do we find the most informative unlabeled data to build the right datasets the fastest?
  • 2. How do we build training datasets that are 1/1000 the size for the same result?

SLIDE 18

Active Learning

SLIDE 19

Active Learning

[Diagram: a loop between collecting data and training models, driven by model uncertainty.]

SLIDE 20

Active Learning needs uncertainty

Bayesian Deep Networks (BNN)

Bayesian neural networks are the principled way to model uncertainty. However, they are computationally demanding:

  • Training: intractable without approximations.
  • Testing: predictive distributions need ~100 forward passes (varying the model).
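To make the ~100-forward-pass cost concrete, a minimal sketch (my illustration, not the talk's code) of estimating predictive uncertainty by sampling the model, MC-dropout style; `model` and `x` are assumed placeholders:

```python
import torch

def predictive_entropy(model, x, n_samples=100):
    """Average softmax over stochastic passes; entropy = uncertainty score."""
    model.train()  # keep dropout active: each pass samples a different model
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)
    # High entropy -> the model is uncertain -> good candidate for labeling.
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
```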

SLIDE 21

Active Learning

Bayesian Deep Networks (BNN)

A common (cheaper) approach is to use ensembles of networks:

  • Samples from the same distribution as the training set will have consensus across members, while other samples will not.
  • Ensembles do not approximate uncertainty in the same manner as a BNN; i.e., parameters in different members serve different purposes.
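A hedged sketch of turning ensemble disagreement into an acquisition score (my own illustration; `models` is an assumed list of trained members):

```python
import torch

def ensemble_disagreement(models, x):
    # Variance of the softmax outputs across members: low variance means
    # consensus (in-distribution); high variance flags informative samples.
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.var(dim=0).sum(dim=-1)  # one score per input
```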

SLIDE 22

Active Learning

We propose an approximation to BNNs that trains a network using ensembles:

  • We regularize the weights in the ensemble to approximate probability distributions.

Bayesian Deep Networks (BNN)

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018
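A rough sketch of that idea (my own simplification, not the paper's code): treat the corresponding weights of the E ensemble members as samples of a Gaussian, and penalize that implied distribution's divergence from a prior, as the KL term in variational inference would. `lam` and the prior are assumed hyper-parameters:

```python
import torch

def dpe_style_regularizer(member_weights, prior_var=1.0, lam=1e-4):
    # member_weights: (E, P) -- the same P parameters taken from E members.
    mu = member_weights.mean(dim=0)
    var = member_weights.var(dim=0) + 1e-8
    # KL( N(mu, var) || N(0, prior_var) ), summed over all P parameters.
    kl = 0.5 * (var / prior_var + mu.pow(2) / prior_var
                - 1.0 - torch.log(var / prior_var)).sum()
    return lam * kl
```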

SLIDE 23

Active Learning

Given this network design, we can sample from it and quantify the model's uncertainty on new (unlabeled) samples, then label those where the model is most uncertain.

Bayesian Deep Networks (BNN)

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018
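Selecting what to label then reduces to ranking the unlabeled pool by the uncertainty score; a minimal sketch of one round (my illustration; `uncertainty`, `pool`, and `budget` are assumed placeholders):

```python
def select_for_labeling(uncertainty, pool, budget):
    """Pick the `budget` pool items the model is most uncertain about."""
    scores = sorted(((float(uncertainty(x)), i) for i, x in enumerate(pool)),
                    reverse=True)           # most uncertain first
    return [i for _, i in scores[:budget]]  # indices to send for labeling
```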

SLIDE 24

Classification Results

SLIDE 25

Active Learning

Quantitative Results

Image classification on CIFAR-10:

  • up to 50k training images
  • 10K validation images
  • ResNet-18

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018

SLIDE 26

Active Learning

Quantitative Results

Competitive results using ~1/4th of the training data

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018

SLIDE 27

Active Learning

Quantitative Results

[Charts: our approach ("Ours") compared against baselines.]

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018

SLIDE 28

Active Learning

Quantitative Results

[Chart: CIFAR-10 results.]

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018

SLIDE 29

Active Learning

Quantitative Results

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Arxiv 2018

How much data do we need to outperform training on the entire dataset?

Dataset     % of data
CIFAR-10    ~50
CIFAR-100   ~80
SVHN        ~25

SLIDE 30

Beyond Classification

SLIDE 31

Active Semantic Segmentation

Framework

[Chitta, Alvarez, Lesnikowski], Large-Scale Visual Active Learning with Deep Probabilistic Ensembles. Under review

SLIDE 32

Domain Adaptation

(Beyond a single domain / location)

SLIDE 33

Domain Adaptation

Conditions: backlit, snow, day, clear, fog, rain, cloudy, artificial light, night, twilight. Roads: urban, freeway, unmarked street.

Geographic Locations

SLIDE 34

Domain Adaptation

SLIDE 35

Domain Adaptation

At train time, use only (synthetic) source images and annotations.

Domain   Images   Annotations
Source   ☺        ☺
Target   ☹        ☹

Synthetic data can be obtained in large amounts and is labeled automatically.

SLIDE 36

Domain Adaptation

At train time, use only (synthetic) source images and annotations.

Domain   Images   Annotations
Source   ☺        ☺
Target   ☹        ☹

Unfortunately, in general, a network trained on synthetic data performs relatively poorly on real images. Most methods require access to real images, albeit unsupervised, during training.

SLIDE 37

Domain Adaptation

Efficient use of Synthetic Data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

Our approach uses synthetic images and does not require seeing any real images at training time.

Domain   Images   Annotations
Source   ☺        ☺
Target   ☹        ☹

SLIDE 38

Domain Adaptation

Efficient use of Synthetic Data

Our approach uses synthetic images and does not require seeing any real images at training time.

Key observation: Foreground and background classes are not affected in the same manner by the domain shift.

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 39

  • 1. Texture of background classes is realistic -> semantic segmentation.

Domain Adaptation

Efficient use of Synthetic Data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 40

  • 1. Texture of background classes is realistic -> semantic segmentation.
  • 2. Texture of foreground classes is not photo-realistic, but their shape looks natural -> detection-based.

Domain Adaptation

Efficient use of Synthetic Data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 41

Inference on real data

Domain Adaptation

Efficient use of Synthetic Data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 42

Domain Adaptation

Efficient use of Synthetic Data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 43

Domain Adaptation

Efficient use of Synthetic Data

Adding Pseudo-labels:

(unsupervised real training data)

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018
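A rough sketch of the pseudo-labeling step (my illustration, not the paper's exact procedure): run the synthetic-trained model on unlabeled real images and keep only its confident per-pixel predictions as training targets. `model` and `real_images` are assumed placeholders:

```python
import torch

def make_pseudo_labels(model, real_images, threshold=0.9):
    """Confident predictions on unlabeled real data become training labels."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(real_images), dim=1)  # (N, C, H, W)
    conf, labels = probs.max(dim=1)
    labels[conf < threshold] = -1  # ignore_index: low-confidence pixels unused
    return labels
```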

SLIDE 44

Domain Adaptation

Efficient use of Synthetic Data

Adding pseudo-labels: comparison of models trained on synthetic data

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 45

Domain Adaptation

Efficient use of Synthetic Data

Adding pseudo-labels: comparison to domain adaptation and weakly-supervised methods

[Saleh, Salzmann, Alvarez et al. 2018], Efficient use of Synthetic data for Semantic Segmentation, ECCV2018

SLIDE 46

Accuracy vs Efficiency (for Large datasets)

SLIDE 47

Accuracy vs Efficiency


SLIDE 48

Accuracy vs Efficiency

Efficient Training of DNN

Goal: maximize training resources while obtaining a deployment-friendly network.

SLIDE 49

Over-parameterization

SLIDE 50

Accuracy vs Efficiency

[Diagram: one 5×5 convolution vs. two stacked 3×3 convolutions — same receptive field, added non-linearity; capacity and number of parameters change.]
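A quick check of the parameter trade-off (my arithmetic, assuming C input and output channels and ignoring biases):

```python
C = 64
p_5x5 = 5 * 5 * C * C        # one 5x5 conv:  25 C^2 = 102,400 parameters
p_3x3 = 2 * (3 * 3 * C * C)  # two 3x3 convs: 18 C^2 =  73,728 parameters
print(p_3x3 / p_5x5)         # 0.72 -> ~28% fewer parameters,
                             # same 5x5 receptive field, one extra ReLU
```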

SLIDE 51

Accuracy vs Efficiency

Validation Accuracy on a 3x3-based Convnet (orange) and the equivalent 5x5-based Convnet (blue)

https://blog.sicara.com/about-convolutional-layer-convolution-kernel-9a7325d34f7d

SLIDE 52

Accuracy vs Efficiency

[Diagram: decomposing an n×n filter into a [1×n] followed by an [n×1] filter — same receptive field, added non-linearity; capacity, number of parameters, and FLOPS change.]
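A minimal sketch of such a filter decomposition (my illustration in the spirit of DecomposeMe / the Efficient ConvNet, not the papers' code; the channel count is an assumption):

```python
import torch.nn as nn

def decomposed_conv(channels, n=3):
    """n x n conv as 1 x n followed by n x 1, with a non-linearity between.

    Same receptive field as a single n x n conv, but 2n instead of n^2
    weights per channel pair, plus one extra ReLU.
    """
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(1, n), padding=(0, n // 2)),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=(n, 1), padding=(n // 2, 0)),
        nn.ReLU(inplace=True),
    )
```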

SLIDE 53

Accuracy vs Efficiency

Filter Decompositions for Real-time Semantic Segmentation

[Romera, Alvarez et al.] , Efficient ConvNet for Real-Time Semantic Segmentation. IEEE-IV 2017, T-ITS 2018 [Alvarez and Petersson], DecomposeMe: Simplifying ConvNets for End-to-End Learning. Arxiv 2016

SLIDE 54

Accuracy vs Efficiency

Filter Decompositions for Real-time Semantic Segmentation

Cityscapes dataset (19 classes, 7 categories):

Train mode    Pixel accuracy   Class IoU   Category IoU
Scratch       94.7 %           70.0 %      86.0 %
Pre-trained   95.1 %           71.5 %      86.9 %

Forward-pass time (Cityscapes, 19 classes):

            TEGRA-TX1                        TITAN-X
Fwd Pass    512x256  1024x512  2048x1024    512x256  1024x512  2048x1024
Time        85 ms    310 ms    1240 ms      8 ms     24 ms     89 ms
FPS         11.8     3.2       0.8          125.0    41.7      11.2

[Romera, Alvarez et al.] , Efficient ConvNet for Real-Time Semantic Segmentation. IEEE-IV 2017, T-ITS 2018

SLIDE 55

Accuracy vs Efficiency

[Romera, Alvarez et al.] , Efficient ConvNet for Real-Time Semantic Segmentation. IEEE-IV 2017, T-ITS 2018

SLIDE 56

Accuracy vs Efficiency

Efficient Training of DNN

Goal: maximize training resources while obtaining a deployment-friendly network.

SLIDE 57

Accuracy vs Efficiency

Efficient Training of DNN

Goal: maximize training resources while obtaining a deployment-friendly network.

SLIDE 58

Accuracy vs Efficiency

Common Approach

TRAIN: a promising, large model (trading off accuracy / computational cost), with regularization at the parameter level.

PRUNE / OPTIMIZE: for a specific application.

DEPLOY: optimize for specific hardware.

SLIDE 59

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

JOINT TRAIN / PRUNE: train a large model (trading off accuracy / computational cost) while pruning it.

DEPLOY: optimize for specific hardware.

SLIDE 60

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Diagram: a convolutional layer with five 1×3×3 filters; whole filters are marked "removed" vs. "to be kept".]

SLIDE 61

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Common approach:

SLIDE 62

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Our Approach:
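The NIPS 2016 paper learns the number of neurons by adding a group-sparsity term that can zero out entire neurons during training; a minimal sketch of such a group-lasso penalty (my simplification; `lam` is a hypothetical hyper-parameter):

```python
def group_lasso(conv_weight, lam=1e-4):
    """Sum of L2 norms, one group per output neuron (filter).

    The L2,1 penalty drives whole rows to exactly zero, so the neurons
    that survive training define the learned layer width.
    """
    groups = conv_weight.flatten(1)        # (out_channels, in*kH*kW)
    return lam * groups.norm(dim=1).sum()
```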

SLIDE 63

Classification Results

SLIDE 64

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative results on the ImageNet dataset:

1.2 million training images and 50,000 validation images split into 1,000 categories (roughly 700–1,300 training images per class). No data augmentation beyond random flips.

SLIDE 65

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ImageNet

Train an over-parameterized architecture up to 768 neurons per layer (Dec8-768)

SLIDE 66

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ImageNet

SLIDE 67

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ICDAR character recognition dataset

SLIDE 68

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ICDAR character recognition dataset

Train an over-parameterized architecture up to 512 neurons per layer (Dec3-512)

SLIDE 69

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ICDAR character recognition dataset

SLIDE 70

Accuracy vs Efficiency

Joint Training and Pruning Deep Networks

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

[Architecture diagram: layers Dec1–Dec8 (including Dec7-1/7-2 and Dec8-1/8-2) with skip connections, followed by the FC layer.]

SLIDE 71

Accuracy vs Efficiency

[Bar chart: initial vs. learned number of neurons for each decomposed layer (L1v, L1h, …, L8-2v, L8-2h); y-axis from 100 to 600 neurons.]

[Architecture diagram: layers Dec1–Dec8 (including Dec7-1/7-2 and Dec8-1/8-2) with skip connections, followed by the FC layer.]

SLIDE 72

Accuracy vs Efficiency

[Bar chart: initial vs. learned number of neurons for each decomposed layer (L1v, L1h, …, L8-2v, L8-2h).]

(No drop in accuracy)

SLIDE 73

Object Detection Results

KITTI

SLIDE 74

Accuracy vs Efficiency

Object Detection (KITTI)

TRAIN: a promising model.

PRUNE / OPTIMIZE: for a specific application.

SLIDE 75

Accuracy vs Efficiency

Object Detection (KITTI): joint training / pruning replaces the separate train-then-prune/optimize steps.

SLIDE 76

Accuracy vs Efficiency

Compression-aware Training of DNN

[Diagram: a convolutional layer with five 1×3×3 filters; whole filters are marked "removed" vs. "to be kept".]

[Alvarez and Salzmann], Learning the number of neurons in Neural Nets, NIPS 2016 [Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

SLIDE 77

Accuracy vs Efficiency

Compression-aware Training of DNN

Uncorrelated filters should maximize the use of each parameter / kernel.

[Figure: cross-correlation of Gabor filters.]

SLIDE 78

Accuracy vs Efficiency

Compression-aware Training of DNN

[P Rodríguez, J Gonzàlez, G Cucurull, J. M. Gonfaus, X. Roca] Regularizing CNNs with Locally Constrained Decorrelations. ICLR 2017

Weak points:

  • Significantly larger training time (prohibitive at large scale).
  • Usually drops in accuracy.
  • Orthogonal filters are difficult to compress (post-processing).

SLIDE 79

Accuracy vs Efficiency

Compression-aware Training of DNN

[Diagram: a convolutional layer with five 1×3×3 filters; whole filters are marked "removed" vs. "to be kept".]

SLIDE 80

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

SLIDE 81

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Our Approach:
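The NIPS 2017 paper makes training compression-aware by encouraging low-rank weight matrices, so that post-hoc low-rank compression loses little accuracy; a minimal sketch of such a nuclear-norm penalty (my simplification; `lam` is a hypothetical hyper-parameter):

```python
import torch

def nuclear_norm_penalty(weight, lam=1e-4):
    """Sum of singular values of the (reshaped) layer weights.

    Penalizing the nuclear norm pushes the layer toward low rank, so an
    SVD-based compression step after training discards little energy.
    """
    W = weight.flatten(1)  # (out_channels, rest)
    return lam * torch.linalg.svdvals(W).sum()
```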

SLIDE 82

Classification Results

SLIDE 83

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Quantitative Results on ImageNet using ResNet50*

[Diagram: modified ResNet-50 bottleneck block — 256-d input → 1×1, 64 → 3×1, 64 → 1×3, 64 → 1×1, 256, with ReLUs in between.]

SLIDE 84

Training Efficiency

(side benefit)

SLIDE 85

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

SLIDE 86

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

SLIDE 87

Accuracy vs Efficiency

Compression-aware Training of DNN

Up to 70% training speed-up (similar accuracy)

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

SLIDE 88

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

Is over-parameterization needed? Observations:

  • Additional training parameters are initially needed to help the optimizer.
  • Small models are explicitly constrained; the same training regime may not be a fair comparison.
  • Other optimizers lead to slightly better results when optimizing compact networks from scratch.

SLIDE 89

Accuracy vs Efficiency

Compression-aware Training of DNN

[Alvarez and Salzmann], Compression-aware training of DNN, NIPS 2017

As the number of parameters decreases, the number of layers increases.

Data movement may become more significant than the compute savings.

SLIDE 90

Accuracy vs Efficiency (more on over-parameterization)

SLIDE 91

Accuracy vs Efficiency

[Diagram: stacking small filters vs. one large filter — same receptive field, added non-linearity; capacity, number of parameters, and number of layers change.]

SLIDE 92

ExpandNets: Exploiting Linear Redundancies

SLIDE 93

[Diagram: a small network (input 224×224; 11×11 conv, 64; 5×5 conv, 192; …) and its expanded counterpart, where the 11×11 convolution is replaced by a stack of linear 3×3, 64 convolutions with the same receptive field.]

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets
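The core trick: expand a convolution into a sequence of purely linear layers for training, then collapse them back into the original small layer for deployment. A minimal sketch of one expansion variant (my illustration, not the paper's code; layer sizes are assumptions):

```python
import torch.nn as nn

# Deployment layer: one small 3x3 convolution.
small = nn.Conv2d(3, 64, kernel_size=3, padding=1)

# Training-time expansion: 1x1 -> 3x3 -> 1x1 with NO non-linearity between,
# so the composition remains a single linear map over the same receptive
# field and can be contracted back into `small` after training.
expanded = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=1),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.Conv2d(32, 64, kernel_size=1),
)
```

Because the expanded layers are linear, their kernels compose algebraically into a single 3×3 kernel after training, so inference cost is exactly that of the small network.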

SLIDE 94

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets

SLIDE 95

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets

SLIDE 96

Classification Results

SLIDE 97

[Diagram: the small convolutional network used (input → Conv1 … Conv5; first-layer width N, later layers including 128 @ 3×3, 128 @ 3×3, and 64 @ 3×3 filters).]

ImageNet   Baseline   Expanded
N=128      46.72%     49.66%
N=256      54.08%     55.46%
N=512      58.35%     58.75%

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets

SLIDE 98

Model                  Top-1    Top-5
MobileNetV2            70.78%   91.47%
MobileNetV2-expanded   74.85%   92.15%

MobileNetV2: The Next Generation of On-Device Computer Vision Networks

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets

SLIDE 99

Model                                    Top-1    Top-5
MobileNetV2                              70.78%   91.47%
MobileNetV2-expanded                     74.85%   92.15%
MobileNetV2-expanded-nonlinear           74.17%   91.61%
MobileNetV2-expanded (nonlinear init)    75.46%   92.58%

MobileNetV2: The Next Generation of On-Device Computer Vision Networks

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets


SLIDE 100

ExpandNets beyond classification

SLIDE 101

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018

ExpandNets on Semantic Segmentation

Relative ~2.2% improvement on mIoU

CITYSCAPES

SLIDE 102

[Guo, Alvarez, Salzmann], ExpandNets: Exploiting Linear Redundancy to Train Small Networks. Arxiv 2018 Thanks Ian Ivanecky!

ExpandNets on Traffic Sign Recognition

Internal Dataset

Relative ~2.34% improvement on F-score

SLIDE 103

Summary

SLIDE 104

Summary

Creating the right datasets

  • Active Learning: Our Deep Probabilistic Ensembles achieve competitive performance using 1/4th of the training data (progressively selected).

SLIDE 105

Summary

Creating the right datasets

  • Synthetic to real

SLIDE 106

Summary

Creating the right datasets
Accuracy vs Efficiency (aka, the use of over-parameterization)

  • Joint train and prune

[Bar chart: initial vs. learned number of neurons for each decomposed layer.]

SLIDE 107

Summary

Creating the right datasets
Accuracy vs Efficiency (aka, the use of over-parameterization)

  • ExpandNets: Exploiting linear redundancy to Train Small Nets

SLIDE 108

Scaling-Up Deep Learning For Autonomous Vehicles

JOSE M. ALVAREZ | San Jose 2019