SLIDE 1

SCALEDEEP

Bryce Paputa and Royce Hwang

SLIDE 2

Motivation: DNN Applications

  • Google image search, Apple Siri
  • Self-driving cars, Education, Healthcare

Source: https://deepmind.com/
Source: http://fortune.com/2017/03/27/waymo-self-driving-minivans-snow/
Source: https://www.verizonwireless.com/od/smartphones/apple-iphone-x/

SLIDE 3

Simple Neural Network

Source: https://www.dtreg.com/solution/view/22

SLIDE 4

3 Stages of Training

  • Forward propagation: evaluates the network
  • Back propagation: calculates the error and propagates it from the output stages to the input stages
  • Weight gradient and update: calculates the gradient of the error and updates the weights to reduce the error
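
A minimal NumPy sketch of one pass through these three stages for a single fully connected layer (a toy illustration, not the paper's code; the layer size, loss, and learning rate are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))            # weights of a tiny layer
    x = rng.normal(size=(4,))              # input features
    target = np.array([0.0, 1.0, 0.0])     # desired output
    lr = 0.1                               # learning rate

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # 1. Forward propagation: evaluate the network
    y = sigmoid(W @ x)

    # 2. Back propagation: compute the output error and push it back toward the inputs
    err = y - target                       # gradient of 1/2 * squared error w.r.t. y
    delta = err * y * (1.0 - y)            # propagated through the sigmoid

    # 3. Weight gradient and update: adjust the weights to reduce the error
    grad_W = np.outer(delta, x)
    W = W - lr * grad_W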

SLIDE 5

From Simple to Deep NN

Source: https://hackernoon.com/log-analytics-with-deep-learning-and-machine-learning-20a1891ff70e

SLIDE 6

Convolutional Neural Network

Source: http://cs231n.github.io/convolutional-networks/

SLIDE 7

Implementation Challenges

  • Training and inference steps are extremely compute and data intensive
  • Example: OverFeat DNN – 820K neurons, 145M parameters, ~3.3 x 10^9 operations for a single 231 x 231 image
  • To process the ImageNet dataset (1.2 million images), it needs ~15 x 10^15 operations for a single training pass
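
A back-of-the-envelope check of those numbers (illustrative Python; the 3x factor for back propagation and the weight update is an assumption, not a figure from the slides):

    ops_per_image = 3.3e9        # forward pass for one 231 x 231 image (OverFeat)
    num_images = 1.2e6           # ImageNet training set size

    forward_ops = ops_per_image * num_images       # ~4.0e15 operations, forward only
    # Back propagation and the weight gradient/update stages are assumed here
    # to cost roughly 3x the forward pass (a rule of thumb, not a slide figure).
    training_ops = forward_ops * (1 + 3)            # ~1.6e16, close to ~15 x 10^15 above
    print(f"forward: {forward_ops:.1e}  full training pass: {training_ops:.1e}")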

SLIDE 8

Escalating Computational Requirements

Unless otherwise noted, all figures are from SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks, Venkataramani et al., ISCA 2017.

SLIDE 9

Ways to Speed This Up

SLIDE 10

System Architecture

SLIDE 11

Convolutional DNN

Source: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

SLIDE 12

3 Main Layers: Convolution

  • Convolution (CONV) Layer
    – Takes in inputs and applies the convolution operation with weights
    – Outputs values (features) to the next layers
    – Computationally intensive
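
A minimal sketch of what one CONV layer computes – each output value is the dot product of an input patch with a small weight kernel (toy NumPy code; the image and kernel sizes are assumptions):

    import numpy as np

    def conv2d(image, kernel):
        """Valid 2D convolution (cross-correlation) of a single channel."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(231, 231)       # one input channel
    kernel = np.random.rand(11, 11)        # one learned filter (weights)
    features = conv2d(image, kernel)       # feature map passed to the next layer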

SLIDE 13

3 Main Layers: Sampling

  • Sampling (SAMP) Layer
    – Also known as the pooling layer
    – Performs up/down sampling on features
    – Example: decreasing image resolution
    – Data intensive
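
A minimal sketch of down-sampling with 2x2 max pooling, which quarters the number of values per feature map (toy NumPy code; the window size is an assumption):

    import numpy as np

    def max_pool_2x2(fm):
        """Halve each dimension by keeping the max of every 2x2 block."""
        h, w = fm.shape
        fm = fm[:h - h % 2, :w - w % 2]                        # drop odd edges
        blocks = fm.reshape(fm.shape[0] // 2, 2, fm.shape[1] // 2, 2)
        return blocks.max(axis=(1, 3))

    feature_map = np.random.rand(56, 56)
    pooled = max_pool_2x2(feature_map)     # shape (28, 28): lower resolution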

SLIDE 14

3 Main Layers: Fully Connected

  • Fully Connected (FC) Layer
    – Composes features from the CONV layers into an output (classification, etc.)
    – Data intensive
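
A minimal sketch of an FC layer: every output is a weighted sum over all inputs, so the weight matrix is large and each weight is used only once per input – little compute, lots of data (toy NumPy code; the sizes are assumptions):

    import numpy as np

    features = np.random.rand(4096)        # flattened features from CONV/SAMP layers
    W = np.random.rand(1000, 4096)         # one weight per (output, input) pair
    b = np.zeros(1000)

    logits = W @ features + b              # class scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax over the 1000 classes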

SLIDE 15

Computation Heavy Layers

Initial CONV layers

  • Fewer, but larger features
  • 16% of Flops
  • Very high reuse of weights

Middle CONV layers

  • Smaller features, but more numerous
  • 80% of Flops

SLIDE 16

Memory Heavy Layers

Fully connected layers

  • Fewer Flops (4%)
  • No weight reuse

Sampling Layers

  • Even fewer Flops (0.1%)
  • No training step/weights
  • Very high Bytes/FLOP
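
One way to see the compute-heavy vs. memory-heavy split of the last two slides is to compare Bytes/FLOP for a typical CONV layer and a typical FC layer (rough sketch; the layer sizes are made up and 4-byte floats are assumed):

    # CONV layer: a small kernel is reused at every output position
    k, c_in, c_out, h, w = 3, 64, 64, 56, 56
    conv_flops = 2 * h * w * c_out * c_in * k * k
    conv_bytes = 4 * (c_in * c_out * k * k + c_in * h * w + c_out * h * w)
    print("CONV Bytes/FLOP ~", conv_bytes / conv_flops)    # << 1: compute bound

    # FC layer: every weight is loaded once and used once
    n_in, n_out = 4096, 4096
    fc_flops = 2 * n_in * n_out
    fc_bytes = 4 * (n_in * n_out + n_in + n_out)
    print("FC Bytes/FLOP ~", fc_bytes / fc_flops)          # ~2: memory bound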

SLIDE 17

Summary of Characteristics

SLIDE 18

CompHeavy Tile

  • Used for low Bytes/FLOP stages
  • 2D-PE computes the dot product of an input and kernel
  • Computes many kernels convolved with the input
  • Statically controlled
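
A rough sketch of the work a CompHeavy tile's 2D-PE array does: many kernels share one input patch, and each "PE row" accumulates one dot product with no data-dependent control flow (illustrative Python, not the actual microarchitecture; the grid sizes are assumptions):

    import numpy as np

    def compheavy_tile(input_patch, kernels):
        x = input_patch.ravel()
        outputs = np.zeros(len(kernels))
        for row, kern in enumerate(kernels):       # one PE row per kernel
            acc = 0.0
            for a, w in zip(x, kern.ravel()):      # one multiply-accumulate per PE
                acc += a * w
            outputs[row] = acc
        return outputs

    patch = np.random.rand(3, 3)
    kernels = [np.random.rand(3, 3) for _ in range(8)]   # 8 kernels reuse the same input
    out = compheavy_tile(patch, kernels)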

SLIDE 19

MemHeavy Tile

  • Stores features, weights, errors, and error gradients in scratchpad memory
  • Special Function Units (SFUs) implement activation functions like ReLU, tanh, and sigmoid

SLIDE 20

SCALEDEEP Chip

SLIDE 21

Heterogeneous Chips

CONV Layer Chip
FC Layer Chip

SLIDE 22

Node Architecture

  • All memory is on chip or directly connected
  • Wheel configuration allows for high memory bandwidth and for layers to be split between chips
  • Ring configuration allows for high model parallelism

SLIDE 23

Intra-layer Parallelism

SLIDE 24

Inter-Layer Parallelism

  • Pipeline depth is equal to twice the number of layers used during training (each layer contributes a forward and a backward stage)
  • Depth is equal to the number of layers during evaluation
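
A quick worked example of those depths (the layer count here is arbitrary):

    layers = 5                       # DNN layers mapped onto the pipeline
    training_depth = 2 * layers      # forward + backward stage per layer -> 10
    inference_depth = layers         # forward stages only -> 5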

SLIDE 25

Experimental Results

  • System tested using 7032 Processing Elements
  • Single precision - 680 TFLOPS
  • Half precision - 1.35 PFLOPS
  • 6-28x speedup compared to a TitanX GPU

SLIDE 26

Power Usage

SLIDE 27

Hardware (PE) Utilization

  • 1. The granularity at which PEs can be allocated is coarser than ideal:
    • a. Layer distribution to columns
    • b. Feature distribution to MemHeavy Tiles
    • c. Feature sizes are not a multiple of 2D-PE rows
  • 2. Control logic and data transfer also lower utilization

Total utilization is 35%

SLIDE 28

Key Features of SCALEDEEP

  • Heterogeneous processing tiles and compute chips
  • System design matches the memory access structure of DNNs
  • Nested pipelining to minimize data movement and improve core utilization

SLIDE 29

Discussion

  • Since DNN design is still more of an art than a science at this point, does it make sense to build an ASIC, given the high cost of developing hardware?
  • How does ScaleDeep compare to other systems like Google’s TPU and TABLA? In what situations is it better and worse?

  • What are some pitfalls of this design?
