CNVLUTIN: Ineffectual-neuron-free DNN computing
- J. Albericio, P. Judd, T. Hetherington*,
- T. Aamodt*, N. E. Jerger, A. Moshovos
Please cite the original source.
DNNs = SIMD Heaven
[figure: 100's to 1000's of parallel multiplications (x) per layer]
CNVLUTIN: Smarter SIMD
52% performance, 2x ED2P, on out-of-the-box networks
Outline
Design
What’s a CNN?
10’s of layers
[example input image: a Korean mask!]
Neurons (Input) → Synapses (Filters) → Neurons (Output)
A typical CNN layer: Convolution (inner products) → ReLU (negatives to 0) → Pool (data size reduction)
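As an illustrative aside (not from the slides): a toy 1-D sketch of the Convolution → ReLU stages above, showing how ReLU manufactures the runtime zeroes the next slides measure. Function names, sizes, and values are hypothetical, not the paper's dataflow.

```python
# Toy 1-D sketch of a typical CNN layer stage: convolution (inner
# products) followed by ReLU (negatives to 0). Illustrative only.

def conv1d(neurons, synapses):
    """Each output neuron is an inner product of the filter with a
    window of the input."""
    k = len(synapses)
    return [sum(neurons[i + j] * synapses[j] for j in range(k))
            for i in range(len(neurons) - k + 1)]

def relu(values):
    """ReLU clamps negatives to 0 -- the source of the runtime zeroes."""
    return [max(0, v) for v in values]

def zero_fraction(values):
    """Fraction of zero neurons fed to the next layer's multipliers."""
    return sum(1 for v in values if v == 0) / len(values)

neurons = [1, -2, 3, 0, -1, 2]
synapses = [1, 0, -1]                  # a tiny hypothetical filter
out = relu(conv1d(neurons, synapses))
print(out, zero_fraction(out))         # [0, 0, 4, 0] 0.75
```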
Time spent in convolutions

Lots of Runtime Zeroes
[chart: fraction of zero neurons in multiplications for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, AVG; y-axis 0 to 0.6]
Waste of time and energy!
Dynamically generated = not predictable

How to compute DNNs: DaDianNao*
*Chen et al., MICRO 2014
Neurons stream into Filter 0 … Filter 15.

Processing in DaDianNao
16 Neuron Lanes, and 16 Synapse Lanes per filter (Filter 0 … Filter 15).
Each cycle multiplies corresponding neuron and synapse elements, all lanes in lock-step.

Zero-skipping in DaDianNao?
Zero removal compacts each neuron stream, but then the lanes can no longer proceed in lock-step!

CNVLUTIN: Decoupling Lanes
DaDianNao ties all 16 Neuron Lanes to one shared stream; CNVLUTIN splits the unit into Subunit 0 … Subunit 15, each with its own Neuron Lane and one Synapse Lane per filter (Filter 0, Filter 1, …, Filter 15).
Each non-zero neuron is paired with an offset; the offset selects the matching synapse in every filter, so each lane skips its zeroes independently.
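The decoupled-lane idea can be sketched as follows (illustrative only; toy sizes, hypothetical helper names, not the hardware):

```python
# Sketch of CNVLUTIN's decoupled lanes: each neuron lane works on a
# zero-free stream of (value, offset) pairs, and the offset picks the
# matching synapse in every filter. Streams have different lengths,
# so lanes advance independently instead of in lock-step.

def zero_free(neurons):
    """Drop zeroes, keeping (value, original_position) pairs."""
    return [(v, i) for i, v in enumerate(neurons) if v != 0]

def lane_products(stream, filters):
    """One subunit: each surviving neuron multiplies the synapse at
    its offset in every filter."""
    return [[v * f[off] for f in filters] for v, off in stream]

neurons = [0, 3, 0, 2]                  # one lane's slice of the input
filters = [[1, 1, 1, 1], [2, 2, 2, 2]]  # two tiny hypothetical filters
stream = zero_free(neurons)             # 2 work items instead of 4
print(lane_products(stream, filters))   # [[3, 6], [2, 4]]
```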
CNVLUTIN: Ineffectual-neuron Filtering
Between Layer i and Layer i+1: eDRAM → Encoder → eDRAM Neurons → Dispatcher.
The Encoder produces the ZF (zero-free) format: neurons are grouped into bricks (Brick 0, Brick 1, Brick 2, …); the Offset Unit packs only each brick's non-zero neurons into buffers, each paired with the offset of its original position (e.g., packed neurons 7, 6, 5, 2, 1 plus their offsets); zeroes are dropped.
CNVLUTIN: Computation Slicing
Packed neurons are dispatched to Neuron Lane 0, Neuron Lane 1, …, Neuron Lane 15.

Methodology

Area
Only +4.5% area overhead
Speedup: ineffectual = 0
[chart: speedup over DaDianNao for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, Geo; higher is better]
1.37x performance on average
Loosening the Ineffectual Neuron Criterion
“If all you have is a hammer, everything looks like a nail” (Maslow’s hammer)
So far CNVLUTIN skips only exact zeroes.
Example neuron values: 37 13 10 15 1 123 7 1 3 1 20 18 31 33
Example criterion: consider a neuron ineffectual if value < 2
Speedup: ineffectual ≥ 0
[chart: speedup for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, Geo; higher is better]
1.52x performance, no accuracy loss
Loosening the Ineffectual Neuron Criterion
Example neuron values: 37 13 10 15 1 123 7 1 3 1 20 18 31 33
ineffectual if value < 2: skips the three 1’s
ineffectual if value < 8: also skips the 7 and the 3
ineffectual if value < 32: also skips 13, 10, 15, 20, 18, and 31
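The thresholded criterion is simple enough to sketch directly; the values are the slide's example activations, the function name is a hypothetical label:

```python
# Sketch of the loosened criterion: treat |value| < threshold as
# ineffectual. Raising the threshold skips more multiplications,
# at some potential accuracy cost.

values = [37, 13, 10, 15, 1, 123, 7, 1, 3, 1, 20, 18, 31, 33]

def effectual(vals, threshold):
    """Neurons that still reach the multipliers under this criterion."""
    return [v for v in vals if abs(v) >= threshold]

for t in (2, 8, 32):
    kept = effectual(values, t)
    print(f"threshold {t:2}: {len(kept)}/{len(values)} neurons computed")
```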
Trading accuracy for performance
Up to +60% performance by loosening the criterion
CNVLUTIN: Smarter SIMD
52% performance, 2x ED2P, no accuracy loss
Our Approach
Value-Aware Deep Learning Acceleration:
arXiv (a while ago): Reduced-Precision Strategies for Bounded Memory in DNNs
ICS 2016: Proteus: Exploiting Numerical Precision Variability in DNNs
Today: CNVLUTIN: Ineffectual-Neuron-Free DNN Computing
CAL (to appear): performance that scales linearly with the required numerical precision
More things coming soon :-)
Additional Material
Execution Activity Breakdown

Cnvlutin: Maintaining Wide NM Accesses
Dispatcher: reads neuron bricks (up to 16 neurons each) and maintains one per Neuron Lane.
Brick buffers are filled from NM Bank 0 … NM Bank 15, and neuron blocks are broadcast to the lanes.
#1: Partition NM into 16 slices over 16 banks
#2: Fetch and maintain one container per slice (a container holds up to 16 non-zero neurons)
#3: Keep the neuron lanes (Neuron Lane 0 … Neuron Lane 15) supplied with one neuron per cycle
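Steps #1 to #3 above amount to per-lane queues drained one entry per cycle; a toy sketch with 4 lanes and hypothetical (value, offset) contents:

```python
# Sketch of the dispatcher's job: one container of packed neurons is
# kept per NM slice, and each cycle every lane that still has work
# pops one neuron. Lanes drain at different rates because the
# zero-free streams have different lengths. Toy 4-lane version.
from collections import deque

containers = [deque([(3, 0), (2, 5)]),            # lane 0
              deque([(1, 2)]),                    # lane 1
              deque([(4, 1), (6, 3), (9, 7)]),    # lane 2
              deque([(5, 4)])]                    # lane 3

def step(containers):
    """One cycle: each non-empty lane consumes one (value, offset)."""
    return [c.popleft() if c else None for c in containers]

print(step(containers))  # [(3, 0), (1, 2), (4, 1), (5, 4)]
print(step(containers))  # [(2, 5), None, (6, 3), None]
```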
Cnvlutin: Summary
[diagram: Subunit 0 … Subunit 15; each subunit has an SB (eDRAM) with one Synapse Lane per filter, an NBin with Neuron Lanes and Offsets, multipliers and an adder, and an NBout plus encoder connected to the central eDRAM]
Decoupled Neuron Lanes: neuron + coordinate, proceeding independently
1-wide Neuron Lanes; 16-synapse Synapse Lanes
Partitioned SB: 16-wide accesses, 1 synapse per filter