Reduce Number of Ops and Weights


SLIDE 1 (26)

Reduce Number of Ops and Weights

  • Exploit Activation Statistics
  • Network Pruning
  • Compact Network Architectures
  • Knowledge Distillation
SLIDE 2 (27)

Sparsity in Fmaps

Many zeros in output fmaps after ReLU. Example:

  [  9  -1  -3 ]            [ 9  0  0 ]
  [  1  -5   5 ]  → ReLU →  [ 1  0  5 ]
  [ -2   6  -1 ]            [ 0  6  0 ]

[Figure: # of activations vs. # of non-zero activations (normalized) for CONV layers 1-5.]
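The effect is easy to reproduce; a minimal numpy sketch of the ReLU example above:

```python
import numpy as np

# Output fmap values from the example above, before the non-linearity.
fmap = np.array([[ 9, -1, -3],
                 [ 1, -5,  5],
                 [-2,  6, -1]])

relu_out = np.maximum(fmap, 0)      # ReLU zeroes all negative values
sparsity = np.mean(relu_out == 0)   # fraction of zero activations

print(relu_out)   # [[9 0 0] [1 0 5] [0 6 0]]
print(sparsity)   # 0.555... -> 5 of 9 activations are zero
```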

SLIDE 3 (28)

I/O Compression in Eyeriss

[Figure: Eyeriss DCNN accelerator block diagram. Off-chip DRAM connects over a 64-bit link, through compression (Comp) and decompression (Decomp) units, to a 108KB on-chip buffer SRAM and a 14×12 PE array; the link clock and core clock are separate domains. Filters (Filt), images (Img), and partial sums (Psum) move between the buffer and the array, with ReLU applied to the output image.]

Run-Length Compression (RLC) Example:

Input: 0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22, …
Output (64b): Run=2 (5b), Level=12 (16b) | Run=4 (5b), Level=53 (16b) | Run=2 (5b), Level=22 (16b) | Term=0 (1b)

Each 64-bit output word packs up to three (run, level) pairs, i.e. a 5-bit count of zeros followed by a 16-bit non-zero value, plus a 1-bit termination flag.

[Chen et al., ISSCC 2016]
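The encoding above is simple enough to sketch directly; a minimal encoder (5-bit runs capped at 31; the 64-bit packing and termination bit are omitted for clarity):

```python
def rlc_encode(values, max_run=31):
    """Encode a sequence as (zero_run, level) pairs, in the style of Eyeriss RLC."""
    pairs, run = [], 0
    for v in values:
        if v == 0 and run < max_run:   # count zeros, up to the 5-bit limit
            run += 1
        else:
            pairs.append((run, v))     # emit the run of zeros and this value
            run = 0
    return pairs

print(rlc_encode([0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22]))
# [(2, 12), (4, 53), (2, 22)]
```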

SLIDE 4 (29)

Compression Reduces DRAM BW

[Figure: DRAM access (MB) for AlexNet CONV layers 1-5, uncompressed fmaps + weights vs. RLC-compressed fmaps + weights; compression reduces accesses by 1.2×, 1.4×, 1.7×, 1.8×, and 1.9× for layers 1-5.]

[Chen et al., ISSCC 2016]

Simple RLC comes within 5-10% of the theoretical entropy limit

SLIDE 5 (30)

Data Gating / Zero Skipping in Eyeriss

[Figure: Eyeriss PE datapath. A filter scratch pad (225×16b SRAM), an image scratch pad (12×16b REG), and a partial-sum scratch pad (24×16b REG) feed a 2-stage pipelined multiplier whose output is accumulated with the input psum. A zero buffer records which image values are zero; the '== 0' check gates the scratch-pad read enables and resets the multiplier input when the image data is zero.]

Skip the MAC and memory reads when the image data is zero; this reduces PE power by 45%.

[Chen et al., ISSCC 2016]
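The gating itself is hardware, but the control idea can be sketched in software; a hypothetical inner loop where a zero activation skips the multiply, the accumulate, and (in the real PE) the scratch-pad reads:

```python
def gated_dot(activations, weights):
    """Accumulate products, skipping all work when the activation is zero."""
    psum = 0
    for a, w in zip(activations, weights):
        if a == 0:
            continue      # zero-gating: no multiply, no weight read, no accumulate
        psum += a * w
    return psum

print(gated_dot([9, 0, 0, 1, 0, 5], [2, 7, -3, 4, 1, 2]))  # 9*2 + 1*4 + 5*2 = 32
```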

SLIDE 6 (31)

Cnvlutin

  • Processes convolution layers
  • Built on top of DaDianNao (4.49% area overhead)
  • Speedup of 1.37× (1.52× with activation pruning)

[Albericio et al., ISCA 2016]

SLIDE 7 (32)

Pruning Activations

Remove small activation values:

  • Minerva [Reagen et al., ISCA 2016]: reduces power 2× (MNIST)
  • Cnvlutin [Albericio et al., ISCA 2016]: speedup of 11% (ImageNet)

SLIDE 8 (33)

Pruning – Make Weights Sparse

  • Optimal Brain Damage [LeCun et al., NIPS 1989]

1. Choose a reasonable network architecture
2. Train the network until a reasonable solution is obtained
3. Compute the second derivative for each weight
4. Compute saliencies (i.e., impact on training error) for each weight
5. Sort weights by saliency and delete the low-saliency weights
6. Iterate to step 2 (retraining)

SLIDE 9 (34)

Pruning – Make Weights Sparse

  • Prune based on magnitude of weights
  • Example: AlexNet

Weight reduction: 2.7× for CONV layers, 9.9× for FC layers (most of the reduction is in the fully connected layers). Overall: 9× fewer weights, 3× fewer MACs. [Han et al., NIPS 2015]
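A minimal numpy sketch of magnitude-based pruning (a single global threshold for illustration; Han et al. prune per layer and retrain between pruning iterations):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    mask = np.abs(weights) > threshold   # 1 = keep, 0 = prune
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1 - mask.mean())   # ~0.9 of the weights are now zero
```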

SLIDE 10 (35)

Speed up of Weight Pruning on CPU/GPU

Platforms: Intel Core i7 5930K (MKL CBLAS GEMV, MKL SPBLAS CSRMV); NVIDIA GeForce GTX Titan X (cuBLAS GEMV, cuSPARSE CSRMV); NVIDIA Tegra K1 (cuBLAS GEMV, cuSPARSE CSRMV). Batch size = 1.

Fully connected layers only; average speedup of 3.2× on GPU, 3× on CPU, and 5× on mGPU.

[Han et al., NIPS 2015]

SLIDE 11 (36)

Key Metrics for Embedded DNN

  • Accuracy → measured on dataset
  • Speed → number of MACs
  • Storage footprint → number of weights
  • Energy → ?
SLIDE 12 (37)

Energy-Aware Pruning

  • # of weights alone is not a good metric for energy
    – Example (AlexNet):
      • # of weights (FC layers) > # of weights (CONV layers)
      • Energy (FC layers) < Energy (CONV layers)
  • Use an energy evaluation method to estimate DNN energy
    – Account for data movement

[Yang et al., CVPR 2017]

SLIDE 13 (38)

Energy-Evaluation Methodology

Inputs: the CNN shape configuration (# of channels, # of filters, etc.) and the CNN weights and input data (e.g., [0.3, 0, -0.4, 0.7, 0, 0, 0.1, …]).

A memory-access optimization step computes the # of accesses at each memory level (level 1 … level n), and a calculation step computes the # of MACs. Weighting these by the hardware energy cost of each MAC (Ecomp) and each memory access (Edata) gives the energy of each layer (L1, L2, L3, …) and the total CNN energy consumption.

Evaluation tool available at http://eyeriss.mit.edu/energy.html
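A minimal sketch of the underlying energy model; the per-access costs below are hypothetical placeholders, not the calibrated hardware numbers used by the actual tool:

```python
# Hypothetical per-operation energy costs, normalized to one MAC (Ecomp = 1).
E_MAC = 1.0
E_ACCESS = {"RF": 1.0, "NoC": 2.0, "buffer": 6.0, "DRAM": 200.0}  # Edata per level

def layer_energy(num_macs, accesses_per_level):
    """Energy = #MACs * Ecomp + sum over memory levels of #accesses * Edata."""
    e_comp = num_macs * E_MAC
    e_data = sum(E_ACCESS[lvl] * n for lvl, n in accesses_per_level.items())
    return e_comp + e_data

# Example: a layer with 1M MACs and a given memory-access profile.
print(layer_energy(1_000_000,
                   {"RF": 2_000_000, "buffer": 300_000, "DRAM": 50_000}))
```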

SLIDE 14 (39)

Key Observations

  • Number of weights alone is not a good metric for energy
  • All data types should be considered

Energy consumption breakdown of GoogLeNet:

  • Output feature map: 43%
  • Input feature map: 25%
  • Weights: 22%
  • Computation: 10%

[Yang et al., CVPR 2017]

SLIDE 15 (40)

[Yang et al., CVPR 2017]

Energy Consumption of Existing DNNs

Deeper CNNs with fewer weights do not necessarily consume less energy than shallower CNNs with more weights

[Figure: Top-5 accuracy (77-93%) vs. normalized energy consumption (5E+08 to 5E+10, log scale) for the original AlexNet, SqueezeNet, GoogLeNet, ResNet-50, and VGG-16.]

SLIDE 16 (41)

Magnitude-based Weight Pruning

Reduce the number of weights by removing small-magnitude weights

[Figure: the same accuracy-vs-energy plot with magnitude-based-pruned AlexNet and SqueezeNet [Han et al., NIPS 2015] added alongside the original DNNs.]

SLIDE 17 (42)

Energy-Aware Pruning

[Figure: Top-5 accuracy vs. normalized energy for the original DNNs, magnitude-based pruning [Han et al., NIPS 2015], and energy-aware pruning (this work); the energy-aware-pruned AlexNet sits 1.74× lower in energy than the magnitude-pruned one.]

Remove weights from layers in order of highest to lowest energy: 3.7× energy reduction for AlexNet, 1.6× for GoogLeNet.

DNN Models available at http://eyeriss.mit.edu/energy.html

SLIDE 18 (43)

Energy Estimation Tool

Website: https://energyestimation.mit.edu/

Input: DNN configuration file. Output: DNN energy breakdown across layers.

[Yang et al., CVPR 2017]

SLIDE 19 (44)

Compression of Weights & Activations

  • Compress weights and activations between DRAM and accelerator
  • Variable-length / Huffman coding
  • Tested on AlexNet → 2× overall BW reduction

[Moons et al., VLSI 2016; Han et al., ICLR 2016]

Example: Value 16'b0 → compressed code {1'b0}; Value 16'bx → compressed code {1'b1, 16'bx}
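A minimal sketch of this zero-skipping variable-length code, with the bit-level packing kept as a string for readability:

```python
def vlc_encode(values):
    """Encode 16-bit values: '0' for a zero, '1' + 16-bit binary for a non-zero."""
    bits = []
    for v in values:
        if v == 0:
            bits.append("0")                               # 1 bit for a zero value
        else:
            bits.append("1" + format(v & 0xFFFF, "016b"))  # 17 bits otherwise
    return "".join(bits)

encoded = vlc_encode([0, 0, 12, 0, 7])
print(len(encoded), "bits vs", 5 * 16, "bits uncompressed")  # 37 vs 80
```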

SLIDE 20 (45)

Sparse Matrix-Vector DSP

  • Use CSC rather than CSR for SpMxV

[Dorrance et al., FPGA 2014]

[Figure: Compressed Sparse Row (CSR) vs. Compressed Sparse Column (CSC) storage of an M×N weight matrix.]

CSC reduces memory bandwidth when M is not >> N. For a DNN layer, M = # of filters and N = # of weights per filter.
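The format trade-off is easy to try with scipy; a minimal sketch with illustrative shapes (not taken from the paper):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
M, N = 512, 4096                      # M filters, N weights per filter
W = sparse.random(M, N, density=0.1, random_state=rng, format="csr")
x = rng.standard_normal(N)

y_csr = W @ x                         # row-major traversal of the non-zeros
y_csc = W.tocsc() @ x                 # column-major: streams x sequentially
print(np.allclose(y_csr, y_csc))      # True: same result, different access order
```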

SLIDE 21 (46)

EIE: A Sparse Linear Algebra Engine

  • Processes fully connected layers (after Deep Compression)
  • Stores weights column-wise in run-length format
  • Reads a weight column (addressed by relative index) only when the corresponding input activation is non-zero

[Figure: EIE example. A sparse weight matrix, interleaved row-wise across PE0-PE3, is multiplied by a sparse input activation vector ã in which only a1 and a3 are non-zero; negative elements of the output vector b̃ are then zeroed by ReLU.]

[Han et al., ISCA 2016]

  • Dequantizes weights and keeps track of output locations
  • Output-stationary dataflow
  • Supports fully connected layers only
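A minimal sketch of the activation-skipping idea only: dense column storage stands in for EIE's run-length CSC format, and the weight sharing and relative indexing are omitted:

```python
import numpy as np

def sparse_act_matvec(W, a):
    """y = W @ a, touching column j of W only when a[j] != 0 (EIE-style skipping)."""
    y = np.zeros(W.shape[0])
    for j, aj in enumerate(a):
        if aj == 0:
            continue              # zero activation: skip the whole column
        y += W[:, j] * aj         # one column's worth of MACs
    return np.maximum(y, 0)       # ReLU on the output

W = np.array([[1.0, 2.0, 0.0, 3.0],
              [0.0, 0.0, 4.0, 0.0]])
a = np.array([0.0, 5.0, 0.0, 1.0])    # only a1 and a3 are non-zero
print(sparse_act_matvec(W, a))        # [13.  0.]
```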

SLIDE 22 (47)

Sparse CNN (SCNN)

[Parashar et al., ISCA 2017]. Input-stationary dataflow; supports convolutional layers.

[Figure: SCNN PE. The frontend performs an all-to-all multiplication of the non-zero weights (x, y, z) with the non-zero activations (a, b, c, d, e, f): a*x, a*y, a*z, b*x, b*y, b*z, …; a scatter network routes the products to the accumulators in the PE backend.]

  • Densely packed storage of weights and activations
  • All-to-all multiplication of weights and activations
  • A mechanism to add to scattered partial sums
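The frontend's all-to-all multiply is just a Cartesian product of the non-zero values; a minimal symbolic sketch (the scatter network and backend accumulation are omitted; in SCNN each product would also carry an output coordinate computed from the weight and activation coordinates):

```python
from itertools import product

acts    = ['a', 'b', 'c', 'd', 'e', 'f']   # non-zero input activations
weights = ['x', 'y', 'z']                  # non-zero filter weights

# PE frontend: all-to-all multiplication of every activation with every weight.
muls = [f"{a}*{w}" for a, w in product(acts, weights)]
print(len(muls))   # 6 activations x 3 weights = 18 products
print(muls[:6])    # ['a*x', 'a*y', 'a*z', 'b*x', 'b*y', 'b*z']
```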

SLIDE 23 (48)

Structured/Coarse-Grained Pruning

  • Scalpel [Yu et al., ISCA 2017]
    – Prune to match the underlying data-parallel hardware organization for speedup

[Figure: dense weights vs. sparse weights under 2-way SIMD; weights are pruned in SIMD-aligned groups so the remaining non-zeros still fill the SIMD lanes.]

SLIDE 24 (49)

Compact Network Architectures

  • Break large convolutional layers into a series of smaller convolutional layers
    – Fewer weights, but the same effective receptive field

  • Before Training: Network Architecture Design
  • After Training: Decompose Trained Filters
SLIDE 25 (50)

Network Architecture Design

Build the network from a series of small filters:

  • Decompose a 5×5 filter into two 3×3 filters and apply them sequentially (as in VGG-16)
  • Decompose a 5×5 filter into a 5×1 filter followed by a 1×5 filter, applied sequentially (separable filters, as in GoogLeNet/Inception v3)
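The savings follow directly from the filter sizes; a minimal worked computation per input/output channel pair (biases ignored):

```python
# Weights per input/output channel pair for one 5x5 receptive field.
full_5x5  = 5 * 5          # 25 weights, receptive field 5x5
two_3x3   = 2 * (3 * 3)    # 18 weights: two stacked 3x3 convs also cover 5x5
separable = 5 + 5          # 10 weights: 5x1 followed by 1x5

print(full_5x5, two_3x3, separable)   # 25 18 10
print(1 - two_3x3 / full_5x5)         # 28% fewer weights
print(1 - separable / full_5x5)       # 60% fewer weights
```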

SLIDE 26 (51)

Network Architecture Design

Figure Source: Stanford cs231n

Reduce size and computation with 1x1 Filter (bottleneck)

[Szegedy et al., ArXiV 2014 / CVPR 2015]

Used in Network In Network(NiN) and GoogLeNet

[Lin et al., ArXiV 2013 / ICLR 2014]
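A minimal worked example of the 1×1 bottleneck savings; the channel sizes (256 squeezed to 64 around a 3×3 conv) follow the classic cs231n illustration, with MACs counted per output pixel:

```python
# MACs per output pixel = (filter height * width * input channels) * output channels.
def macs(kh, kw, cin, cout):
    return kh * kw * cin * cout

direct = macs(3, 3, 256, 256)                   # one 3x3 conv at full width
bottleneck = (macs(1, 1, 256, 64)               # 1x1 "compress" to 64 channels
              + macs(3, 3, 64, 64)              # 3x3 conv at reduced width
              + macs(1, 1, 64, 256))            # 1x1 "expand" back to 256
print(direct, bottleneck, direct / bottleneck)  # 589824 69632 ~8.5x fewer MACs
```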


SLIDE 29 (54)

Bottleneck in Popular DNN models

[Figure: bottleneck modules in popular DNN models. In GoogLeNet, 1×1 filters compress the number of channels before the larger filters; in ResNet, a 1×1 compress / 3×3 / 1×1 expand sequence forms the bottleneck block.]

SLIDE 30 (55)

SqueezeNet

Fire Module [F.N. Iandola et al., ArXiv 2016]

Reduce weights by reducing the number of input channels: "squeeze" them with 1×1 filters. The result is 50× fewer weights than AlexNet with no loss of accuracy.
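A minimal parameter-count sketch of the squeeze idea; the channel sizes are hypothetical but in the style of a fire module (16 squeeze channels expanding to 64 + 64), compared against a plain 3×3 layer with the same input/output widths:

```python
def conv_params(kh, kw, cin, cout):
    return kh * kw * cin * cout            # biases ignored

cin = 128
direct = conv_params(3, 3, cin, 128)       # plain 3x3 conv: 147456 weights

squeeze = conv_params(1, 1, cin, 16)       # 1x1 "squeeze" down to 16 channels
expand  = conv_params(1, 1, 16, 64) + conv_params(3, 3, 16, 64)  # 1x1 + 3x3 expand
fire = squeeze + expand                    # both paths output 64 + 64 = 128 channels
print(direct, fire, direct / fire)         # 147456 12288 -> 12x fewer weights
```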

SLIDE 31 (56)

[Yang et al., CVPR 2017]

Energy Consumption of Existing DNNs

Deeper CNNs with fewer weights do not necessarily consume less energy than shallower CNNs with more weights

[Figure: Top-5 accuracy (77-93%) vs. normalized energy consumption (5E+08 to 5E+10, log scale) for the original AlexNet, SqueezeNet, GoogLeNet, ResNet-50, and VGG-16.]

SLIDE 32 (57)

Decompose Trained Filters

After training, perform a low-rank approximation by applying tensor decomposition to the weight kernel; then fine-tune the weights to restore accuracy.

[Lebedev et al., ICLR 2015] R = canonical rank

SLIDE 33 (58)

Decompose Trained Filters

[Denton et al., NIPS 2014]

  • Speedup of 1.6-2.7× on CPU/GPU for the CONV1 and CONV2 layers
  • Size reduced 5-13× for the FC layers
  • < 1% drop in accuracy

[Figure: visualization of the original filters vs. their low-rank approximations.]
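For the FC layers the idea can be sketched with a plain truncated SVD, a simpler stand-in for the CP and Tucker decompositions used in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # an FC weight matrix

R = 64                                 # retained rank
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :R] * s[:R]                   # 1024 x R factor
B = Vt[:R, :]                          # R x 1024 factor

# Two small GEMVs replace one big one. A random W is not actually low-rank, so
# this only illustrates the size reduction; a trained W plus fine-tuning is
# what recovers accuracy.
x = rng.standard_normal(1024).astype(np.float32)
y_full, y_low = W @ x, A @ (B @ x)
print(W.size, A.size + B.size)         # 1048576 vs 131072 weights (8x smaller)
```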

SLIDE 34 (59)

Decompose Trained Filters on Phone

[Kim et al., ICLR 2016]

Tucker Decomposition

SLIDE 35 (60)

Knowledge Distillation

[Buciluǎ et al., KDD 2006], [Hinton et al., arXiv 2015]

[Figure: knowledge distillation. Teacher networks (DNN A and DNN B), each ending in a softmax, produce class probabilities; a smaller student DNN with its own softmax is trained so that its class probabilities match the teachers'.]

Try to match the class probabilities produced by the (ensemble of) teacher networks with the student's softmax output.
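A minimal numpy sketch of the matching step: temperature-softened teacher probabilities serve as soft targets for a cross-entropy distillation loss (the temperature and logits below are hypothetical):

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max()                        # numerical stability
    e = np.exp(z)
    return e / e.sum()

T = 4.0                                    # temperature softens the distribution
teacher_logits = np.array([8.0, 2.0, -1.0, 0.5])
student_logits = np.array([5.0, 3.0,  0.0, 1.0])

p_teacher = softmax(teacher_logits, T)     # soft targets from the teacher
p_student = softmax(student_logits, T)

# Distillation loss: cross-entropy between teacher and student soft outputs.
loss = -np.sum(p_teacher * np.log(p_student))
print(p_teacher.round(3), loss)
```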