Squeezing down the computing requirements of deep neural networks
Albert Shaw, Daniel Hunter, Sammy Sidhu, and Forrest Iandola
LEVELS OF AUTOMATED DRIVING
Level 1: Driver Assistance
Level 2: Partial Automation
Level 3: Conditional Automation
Level 4: High Automation
Level 5: Full Automation
Advanced Driver Assistance (e.g. Tesla Autopilot) sits at the lower levels; robo-taxis, robo-delivery, and the like require the higher ones.
IMPLEMENTING AUTOMATED DRIVING
SENSORS: LIDAR, ULTRASONIC, CAMERA, RADAR
Pipeline: offline maps → real-time perception → path planning & actuation
Deep learning has become the go-to approach:

Chris Urmson, CEO of Aurora: with deep learning, an engineer can accomplish in one day what would take 6 months of engineering effort with traditional algorithms [1] — roughly 180x higher productivity with deep learning.

Dmitri Dolgov, CTO of Waymo: "Shortly after we started using deep learning, we reduced our error-rate…" [3] — roughly 100x fewer errors with deep learning.

Andrej Karpathy, Sr Director of AI at Tesla: "A neural network is a better piece of code than anything you or I could create for interpreting images and video." [2]

[1] https://www.nytimes.com/2018/01/04/technology/self-driving-cars-aurora.html
[2] https://medium.com/@karpathy/software-2-0-a64152b37c35
[3] https://medium.com/waymo/google-i-o-recap-turning-self-driving-cars-from-science-fiction-into-reality-with-the-help-of-ai-89dded40c63
[1] O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
[2] M. Cordts et al. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR, 2016.
[3] V. Casser et al. Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos. AAAI, 2018.
[4] M. Liang et al. Multi-Task Multi-Sensor Fusion for 3D Object Detection. CVPR, 2019.
[5] E. Ilg et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. CVPR, 2017.
[6] A. Bewley et al. Simple Online and Realtime Tracking. IEEE ICIP, 2016.
Image → Scalar or Vector: Image Classification [1]
Image → Image: Semantic Segmentation [2], Depth Prediction [3]
Image → Boxes: 2D Object Detection [4], 3D Object Detection [4]
Video: Optical Flow [5], Object Tracking [6]
Audi: https://www.slashgear.com/man-vs-machine-my-rematch-against-audis-new-self-driving-rs-7-21415540/
BMW + Intel: https://newsroom.intel.com/news-releases/bmw-group-intel-mobileye-will-autonomous-test-vehicles-roads-second-half-2017/
Waymo
Trunkloads of servers cause problems.
What automotive deep learning practitioners want: low development cost, low compute resource usage, and low error.
[Figure: a design-space chart over these three goals. "Benchmark-winning" DNNs achieve low error but heavy compute; under-provisioned, less-accurate DNNs run cheaply; manually designing a new DNN from scratch can hit both targets, at high development cost.]
NAS can co-optimize resource-efficiency and accuracy
[Figure: the same design-space chart. Neural Architecture Search (NAS) reaches low error and low compute resource usage without the development cost of manually designing a new DNN from scratch, and without settling for under-provisioned, less-accurate DNNs.]
IMPORTANT TO KNOW: MULTIPLE CHANNELS AND MULTIPLE FILTERS
[Figure: an input activation tensor of size dataH x dataW x channels (x batch size) convolved with numFilt filters, each of size filterH x filterW x channels.]
The number of channels in the current layer is determined by the number of filters (numFilt) in the previous layer.
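The bookkeeping above can be sketched in a few lines of Python (hypothetical helpers, not from the talk; the 64/128 sizes are illustrative):

```python
def conv_weight_shape(num_filt, channels, filter_h, filter_w):
    # One filter spans all input channels; there are num_filt filters.
    return (num_filt, channels, filter_h, filter_w)

def conv_param_count(num_filt, channels, filter_h, filter_w):
    # Weight count only, ignoring biases.
    return num_filt * channels * filter_h * filter_w

# A layer with 64 input channels and 128 filters of size 3x3
# produces 128 output channels for the next layer to consume.
shape = conv_weight_shape(128, 64, 3, 3)
params = conv_param_count(128, 64, 3, 3)  # 128 * 64 * 3 * 3 = 73728
```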
* Top-1 single-model, single-crop accuracy
DNN | Year | Accuracy* (ImageNet-1k) | Parameters (MB) | Computation (GFLOPS per frame) | Key Techniques
AlexNet | 2012 | 57.2% | 240 | 1.4 | Applying a DNN to a hard problem; ReLU; more depth (8 layers)
VGG-19 | 2014 | 75.2% | 490 | 19.6 | More depth (19 layers)
ResNet-152 | 2015 | 77.0% | 230 | 22.6 | More depth & residual connections
SqueezeNet | 2016 | 57.5% | 4.8 | 0.72 | Judicious use of filters and channels
MobileNet-v1 | 2017 | 70.6% | 16.8 | 0.60 | 1-channel (depthwise) 3x3 convolutions
ShuffleNet-v1 | 2017 | 73.7% | 21.6 | 1.05 | Shuffle layers
ShiftNet | 2017 | 70.1% | 16.4 | … | Shift layers
SqueezeNext | 2018 | 67.4% | 12.8 | 1.42 | Oblong convolution filters
mNasNet-A3 | 2018 | 76.1% | 20.4 | 0.78 | Neural architecture search
FBNet-C | 2018 | 74.9% | 22.0 | 0.75 | Really fast neural architecture search
REDUCING THE HEIGHT AND WIDTH OF FILTERS
While 1x1 filters cannot see outside of a 1-pixel radius, they retain the ability to combine and reorganize information across channels. In our design space exploration that led up to SqueezeNet, we found that we could replace half of the 3x3 filters with 1x1s without diminishing accuracy. A "saturation point" is when adding more parameters doesn't improve accuracy.
[Figure: a 3x3 x channels x numFilt filter bank next to a 1x1 x channels x numFilt filter bank; each 1x1 filter has 9x fewer weights.]
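The savings are easy to check numerically. A small sketch (the 256-channel, 256-filter layer size is an illustrative assumption, not SqueezeNet's actual dimensions):

```python
def conv_params(kernel, channels, num_filt):
    # kernel x kernel filters, each spanning all input channels
    return kernel * kernel * channels * num_filt

channels, num_filt = 256, 256                       # illustrative layer size
all_3x3 = conv_params(3, channels, num_filt)
mixed = (conv_params(3, channels, num_filt // 2)    # half stay 3x3
         + conv_params(1, channels, num_filt // 2)) # half become 1x1
reduction = all_3x3 / mixed                         # 1.8x fewer weights in this layer
```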
REDUCING THE NUMBER OF FILTERS AND CHANNELS
If we halve the number of filters in layer Li, this halves the number of input channels in layer Li+1. Halving the filter count throughout the network therefore halves both the channels and the filters of each layer, giving a 4x reduction in the number of parameters.
[Figure: OLD layer Li+1 with 3x3 x 256 x numFilt weights vs. NEW layer Li+1 with 3x3 x 128 x numFilt weights.]
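The 4x comes from halving twice: layer Li+1 loses half its input channels (inherited from Li) and half its own filters. A small sketch with illustrative sizes:

```python
def conv_params(kernel, channels, num_filt):
    # weight count of a standard convolution layer
    return kernel * kernel * channels * num_filt

old = conv_params(3, 256, 256)  # original layer Li+1
new = conv_params(3, 128, 128)  # half the input channels AND half the filters
assert old == 4 * new           # 4x fewer parameters
```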
DEPTHWISE CONVOLUTIONS, ALSO CALLED "GROUP CONVOLUTIONS" or "CARDINALITY"
Popularized by MobileNet and ResNeXt. Each 3x3 filter has 1 channel, and each filter gets applied to a different channel of the input.
[Figure: a standard 3x3 x 256 x numFilt filter bank vs. a depthwise bank of 3x3 x 1 filters, one per channel.]
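Because each filter touches only one channel, the weight count drops by a factor of the channel count. A hedged sketch with illustrative sizes:

```python
def standard_conv_params(kernel, channels, num_filt):
    # every filter spans all input channels
    return kernel * kernel * channels * num_filt

def depthwise_conv_params(kernel, channels):
    # one single-channel kernel per input channel (groups == channels)
    return kernel * kernel * channels

dense = standard_conv_params(3, 256, 256)  # 589824 weights
depth = depthwise_conv_params(3, 256)      # 2304 weights, 256x fewer
```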
After applying aggressive kernel reduction, we may have 50-90% of the parameters in 1x1 convolutions. Grouped 1x1 convolutions alone would split the model into multiple DNNs that don't communicate. Solution: a "shuffle" layer after the separable (grouped) 1x1 convolutions.
Zhang, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv, 2017.
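A shuffle layer is just a fixed reshape-transpose permutation of the channels; it has no weights. A minimal sketch (channel indices stand in for feature maps):

```python
def channel_shuffle(channels, groups):
    """Interleave channels laid out group-by-group, so each group of the
    next grouped convolution sees channels from every previous group."""
    per_group = len(channels) // groups
    # equivalent to reshape(groups, per_group) -> transpose -> flatten
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)  # -> [0, 3, 1, 4, 2, 5]
```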
Shift each channel's activation grid by one cell: a "shift" layer. This allows all of the filters to be 1x1 x channels (rather than 3x3).
[1] B. Wu, et al. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions. CVPR, 2018.
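A shift layer moves each channel's grid by a fixed per-channel offset with zero weights; the 1x1 convolutions that follow then mix the spatially offset channels. A minimal sketch of shifting one channel (pure Python, illustrative only):

```python
def shift_channel(grid, dy, dx):
    """Shift one H x W activation grid by (dy, dx), zero-padding the edge."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out

shift_channel([[1, 2], [3, 4]], dy=0, dx=1)  # -> [[0, 1], [0, 3]]
```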
[1] https://www.nvidia.com/content/PDF/kepler/Tesla-K20-Passive-BD-06455-001-v05.pdf [2] http://www.nvidia.com/content/PDF/Volta-Datasheet.pdf (PCIe version)
THE SERVER SIDE
Uh-oh… Processors are improving much faster than Memory.
Platform | Computation (GFLOP/s) | Memory Bandwidth (GB/s) | Computation-to-bandwidth ratio | Power (TDP Watts) | Year
NVIDIA K20 [1] | 3500 (32-bit float) | 208 (GDDR5) | 17 | 225 | 2012
NVIDIA V100 [2] | 112000 (16-bit float) | 900 (HBM2) | 124 (yikes!) | 250 | 2018
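The computation-to-bandwidth ratio is just peak FLOP/s divided by peak bytes/s; a DNN layer whose arithmetic intensity (FLOPs per byte moved) falls below this ratio is memory-bound on that platform. A quick check of the table's numbers:

```python
def compute_to_bandwidth(gflops_per_s, gb_per_s):
    # peak FLOPs the chip can issue per byte it can move from DRAM
    return gflops_per_s / gb_per_s

k20_ratio = compute_to_bandwidth(3500, 208)     # ~17
v100_ratio = compute_to_bandwidth(112000, 900)  # ~124: far more compute-heavy
```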
[1] https://indico.cern.ch/event/319744/contributions/1698147/attachments/616065/847693/gdb_110215_cesini.pdf [2] https://www.androidauthority.com/huawei-announces-kirin-970-797788 [3] https://blogs.nvidia.com/blog/2018/01/07/drive-xavier-processor/ [4] https://developer.nvidia.com/jetson-xavier
MOBILE PLATFORMS
Device | Cores | Computation (GFLOP/s) | Memory Bandwidth (GB/s) | Computation-to-bandwidth ratio | System Power (TDP Watts) | Year
Samsung Galaxy Note 3 | Arm Mali T-628 GPU [1] | 120 (32-bit float) | 12.8 (LPDDR3) | 9.3 | ~10 | 2013
Huawei P20 | Kirin 970 NPU [2] | 1920 (16-bit float) | 30 (LPDDR4X) | 64 (ouch!) | ~10 | 2018
NVIDIA Jetson Xavier [3,4] | NVIDIA Tensor Cores | 30000 (8-bit int, 32-bit accumulate) | 137 | 218 (yikes!) | 10 to 30 (multiple modes) | 2018
https://medium.com/@shan.tang.g/a-list-of-chip-ip-for-deep-learning-48d05f1759ae
20 TOP/W COMPUTATION
[1] https://www.nvidia.com/content/PDF/kepler/Tesla-K20-Passive-BD-06455-001-v05.pdf [2] http://www.nvidia.com/content/PDF/Volta-Datasheet.pdf (PCIe version) [3] https://www.eteknix.com/gddr6-hbm3-details-emerge/
* Assuming half the power is spent on computation, and the other half is spent on memory and other devices: 20 TOP/s/W * 250 W * 0.5 = 2500 TOP/s.
Platform | Efficiency (TOP/s/W) | Computation (TOP/s) | Memory Bandwidth (TB/s) | Computation-to-bandwidth ratio | Power (TDP Watts) | Year
NVIDIA K20 [1] | 0.015 | 3.50 (32-bit float) | 0.208 (GDDR5) | 17 | 225 | 2012
NVIDIA V100 [2] | 0.45 | 112 (16-bit float) | 0.900 (HBM2) | 124 | 250 | 2018
Next-gen: 20 TOP/W | 20 | 2500* | 1.800 (HBM3) [3] | 1389 (oh no!) | 250 | 2020 (est.)
A paper from 2008 gives an overview of work on evolving neural network architectures, design, and initialization: "In order to design a neural network for a particular task, the choice of an architecture (including the choice of a neuron model), and the choice of a learning algorithm have to be addressed." "This paper gives an overview of the most prominent methods for evolving NNs with a special focus on recent advances in the synthesis of learning architectures."
[1] Floreano, D., Dürr, P., & Mattiussi, C. (2008). Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1), 47-62.
Block-level search [1]
A reinforcement learning loop generates entire child networks for the CIFAR-10 dataset, updating the controller after each model has trained. The best discovered network matched the state of the art while being 1.05x faster on CIFAR-10. Search cost: 22,400 GPU days, too much compute to be practical.
[1] B. Zoph, Q. Le. Neural Architecture Search with Reinforcement Learning. ICLR, 2017.
Cell-level search [2]
A reinforcement learning loop generates cells using CIFAR-10 as a proxy task; the cells are then adapted to ImageNet. The result improved accuracy while being 28% faster on ImageNet-1000. Search cost: 2,000 GPU days, an order of magnitude cheaper, but still expensive.
[1] B. Zoph, Q. Le. Neural Architecture Search with Reinforcement Learning. ICLR, 2017.
[2] B. Zoph et al. Learning Transferable Architectures for Scalable Image Recognition. CVPR, 2018.
[1] E. Real et al. Regularized Evolution for Image Classifier Architecture Search. AAAI, 2019. [2] M. Tan et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile. CVPR, 2019. [3] H. Liu et al. DARTS: Differentiable Architecture Search. ICLR, 2019.
Stochastic Supernet Optimization: FBNet [3]
A stochastic supernetwork contains the entire architecture search space, so only this one meta-network has to be trained instead of many child networks. Layer choices are drawn from a categorical distribution weighted by learnable parameters, and network latency is estimated and optimized during the search. The result achieves comparable accuracy while being 1.5x lower latency.
[3] Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., ... & Keutzer, K. (2019). FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. CVPR, 2019.
Examples of image classification (ImageNet [1]). Example of semantic segmentation (Cityscapes [2]).
[1] O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015. [2] M. Cordts et al. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR, 2016.
Example DNN for image classification. Example DNN for semantic segmentation (DeepLabV3 [1]).
[1] L.C. Chen et al. Rethinking Atrous Convolution for Semantic Image Segmentation, 2017.
Goal: a DNN for our target platform that gets as high of a performance as we can on our target task: semantic segmentation.
Layer choices are sampled from a Gumbel-Softmax distribution over the learnable architecture parameters: each architecture parameter plus a random (Gumbel) variable. The weight parameters and architecture parameters are trained simultaneously, until the architecture parameters converge.
Figure courtesy of Bichen Wu, et al.
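The sampling step can be sketched in pure Python (a simplified illustration of Gumbel-Softmax, not the authors' code; `tau` is the softmax temperature):

```python
import math
import random

def gumbel_softmax_weights(logits, tau=1.0):
    """Soft sample over candidate ops: add Gumbel noise to each learnable
    architecture logit, then take a temperature-scaled softmax."""
    noisy = []
    for logit in logits:
        u = min(max(random.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        noisy.append(logit - math.log(-math.log(u)))       # Gumbel noise
    m = max(n / tau for n in noisy)                        # numeric stability
    exps = [math.exp(n / tau - m) for n in noisy]
    total = sum(exps)
    return [e / total for e in exps]

weights = gumbel_softmax_weights([0.5, 1.0, -0.3], tau=0.5)
# weights is a random, near-one-hot distribution over the candidate ops
```

Lowering `tau` over training sharpens the distribution, so the supernetwork gradually commits to one op per layer.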
FBNet training flow:
1. SuperNetwork training on ImageNet-100 (classification)
2. Sample candidate networks from the SuperNetwork
3. Evaluate candidates on the ImageNet-100 validation set
4. Select the best DNNs; train them on ImageNet-1k (classification)

SqueezeNAS training flow:
1. SuperNetwork training on Cityscapes Fine (segmentation)
2. Sample candidate networks from the SuperNetwork
3. Evaluate candidates on the Cityscapes Fine validation set
4. Select the best DNNs; train them on ImageNet-1k (classification)
5. Finetune on COCO (segmentation)
6. Finetune on Cityscapes Coarse (segmentation)
7. Finetune on Cityscapes Fine (segmentation)
Name | MACs (Billions) | Class mIOU on Cityscapes
SqueezeNAS-3 | 3.0 | 66.7
SqueezeNAS-9 | 9.4 | 72.4
SqueezeNAS-22 | 21.8 | 74.5
ENet [1] | 4.4 | 58.3
CCC2 [2] | 6.3 | 62.0
EDANet [3] | 9.0 | 65.1
MobileNetV2 OS=16 [4] | 21.3 [5] | 70.7 [5]
CCC DRN A50 [6] | 68.7 | 67.6
[1] Paszke, Adam et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, 2016.
[2] Park, Hyojin et al. Concentrated-Comprehensive Convolutions for Lightweight Semantic Segmentation, 2018.
[3] Lo, Shao-Yuan et al. Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation, 2018.
[4] Sandler, Mark et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. CVPR, 2018.
[5] https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
[6] Yu, Fisher et al. Dilated Residual Networks. CVPR, 2017.
Name | Search Goal | MACs (Billions) | Latency (ms) on Xavier | Class mIOU on Cityscapes
SqueezeNAS-3 | MACs | 3.0 | 46.0 | 66.7
SqueezeNAS-9 | MACs | 9.4 | 103 | 72.4
SqueezeNAS-22 | MACs | 21.8 | 156 | 74.5
SqueezeNAS-4.5 v2 | Latency | 4.5 | 34.6 | 68.0
SqueezeNAS-20 v2 | Latency | 19.6 | 98.3 | 73.6
SqueezeNAS-33 v2 | Latency | 32.7 | 153 | 75.1
SqueezeNAS
We employ the encoder-decoder depthwise head from DeepLabV3+ [1], while allowing the base network to be completely learned.
[1] Chen et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, ECCV 2018
Search space per layer: expansion ratios {6, 3, 1, 1 (grouped conv)} combined with kernel choices {3x3, 3x3 dilated, 5x5, skip}.
DILATED CONVOLUTIONS (also known as Atrous Convolution)
[Figure: normal 3x3 convolution vs. dilated 3x3 convolution. Graphic taken from Sik-Ho Tsang's article: https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5]
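A dilated kernel inserts gaps between its taps, so the receptive field grows without adding any weights; the effective footprint is easy to compute (a small sketch):

```python
def effective_kernel_size(k, dilation):
    # a k-tap kernel with (dilation - 1) zeros between taps covers
    # k + (k - 1) * (dilation - 1) input positions per side
    return k + (k - 1) * (dilation - 1)

effective_kernel_size(3, 1)  # 3: a normal 3x3 convolution
effective_kernel_size(3, 2)  # 5: a dilated 3x3 sees a 5x5 window
```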
[Figure: layer-by-layer architecture visualizations. Legend (unit type): each block is 1x1 -> unit -> 1x1, where the unit is 3x3, 3x3 dilated, 5x5, 3x3 downsample, 5x5 downsample, or skip; box width represents channel expansion. Architectures shown, with MACs (Giga) and mIOU %: MobileNetV2 Classification; MobileNetV2 DeepLabV3 (21.3, 70.7); SqueezeNAS-3, MAC-optimized (3.0, 66.7); SqueezeNAS-4.5 v2, latency-searched (4.5, 68.0); SqueezeNAS-22, MAC-optimized (21.8, 74.5); SqueezeNAS-33 v2, latency-optimized (32.7, 75.1).]
Name | NAS Method | Search Time (GPU Days) | Dataset Searched On
SqueezeNAS-3 | gradient | 7 | Cityscapes
SqueezeNAS-9 | gradient | 11 | Cityscapes
SqueezeNAS-23 | gradient | 14 | Cityscapes
Neural Architecture Search with Reinforcement Learning | RL | 22,400 | CIFAR-10
NASNet | RL | 2,000 | CIFAR-10
mNasNet | RL | 2,000* | Proxy ImageNet
AmoebaNet | genetic | 3,150 | CIFAR-10
FBNet | gradient | 9 | Proxy ImageNet
DARTS | gradient | 4 | CIFAR-10
* Approximated from TPUv2 hours
Takeaways:
Deep learning is being deployed more broadly than ever, necessitating the design of many new DNNs.
Neural architecture search is far cheaper to run than it was 2 years ago.
SqueezeNAS delivers efficient semantic segmentation on an automotive-grade platform.
Looking ahead: design architecture search spaces instead of individual networks.