Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization

Meng Li*, YiLei Li*, Pierce Chuang, Liangzhen Lai, and Vikas Chandra
EMC2 Workshop @ NeurIPS 2019
Facebook Silicon AI Research



Motivation


Dataflow processing is widely exploited to amortize memory access energy
Datapath energy becomes important for dataflow accelerators

  • Consists of compute energy in the processing elements (PEs) and data propagation energy among PEs

[Figure: output-stationary and input-stationary PE-array dataflows; energy breakdowns (PE array / datapath, buffer, misc.) show that the PE array accounts for 57.7% of total energy in Thinker [Yin+, JSSC'18] and the datapath for 87.3% in ShiDianNao [Du+, ISCA'15]]


Motivation


In dataflow processing, operands are streamed into the compute array
Datapath energy is determined by the total bit flips induced by operand streaming
Target: propose post-training and training-aware techniques to reduce the bit flips of weight streaming

[Figure: rows of the weight matrix W[k, c] (K x C) and of the activation matrix A (C x H x W) are streamed into the PE array in sequence]

K, C, H, and W denote the output channels, input channels, output height, and output width, respectively

[Plot: normalized energy vs. total bit flips]
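As a concrete illustration of the metric, the total bit flips induced by streaming a quantized weight matrix row by row can be counted as below (the function name and the list-of-ints representation are our own, not from the slides):

```python
def row_bit_flips(weights):
    """Total bit flips induced by streaming the rows of a quantized
    weight matrix (a list of rows of small non-negative ints) into a
    PE array one after another.  The flips between two consecutive
    rows equal their Hamming distance: the popcount of their XOR."""
    flips = 0
    for prev, cur in zip(weights, weights[1:]):
        # XOR marks the toggled bit positions; bin(...).count("1") is popcount
        flips += sum(bin(p ^ c).count("1") for p, c in zip(prev, cur))
    return flips
```

For example, streaming 0b0000 followed by 0b1111 toggles all four datapath wires once, so `row_bit_flips([[0b0000], [0b1111]])` is 4.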


Post-Training Optimization: Output Channel Reordering


To reduce bit flips, the most straightforward technique is output channel reordering

  • Output channel reordering can be mapped to a traveling salesman problem, which can be approximately solved with efficient greedy algorithms

[Figure: example K x C weight matrix with 2-bit weights; reordering the K output channels reduces the Hamming distance between consecutively streamed rows]
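A minimal nearest-neighbour sketch of the greedy heuristic for the TSP formulation (our own variant; the exact greedy algorithm used in the paper may differ):

```python
def hamming(row_a, row_b):
    """Hamming distance between two rows of quantized weights."""
    return sum(bin(a ^ b).count("1") for a, b in zip(row_a, row_b))

def greedy_channel_order(weights):
    """Nearest-neighbour heuristic: start from output channel 0, then
    repeatedly stream next the unvisited channel with the smallest
    Hamming distance to the last streamed one."""
    order = [0]
    remaining = set(range(1, len(weights)))
    while remaining:
        last = weights[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda i: hamming(last, weights[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For the rows [0b0000, 0b1111, 0b0001], streaming in the original order costs 4 + 3 = 7 flips, while the greedy order [0, 2, 1] costs only 1 + 3 = 4.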


Post-Training Optimization: Input Channel Clustering


For most networks, the channel dimension can be larger than the compute array size, so weight matrices need to be segmented first and then fed into the compute array

  • Each weight sub-matrix can use different output channel orders
  • Before segmenting the weight matrix, different input channels can be clustered first

Propose an iterative assignment-and-update approach for input channel clustering

[Figure: the C input channels are clustered into two clusters; the K output channels of each cluster's sub-matrix are then reordered independently]
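The slides do not spell out the iterative assignment-and-update procedure; the k-means-style sketch below (our own simplification, not the paper's algorithm) conveys the idea: assign each input channel (a column of the weight matrix) to the cluster with the nearest representative under Hamming distance, then update each representative by a bitwise majority vote.

```python
def col_hamming(col_a, col_b):
    """Hamming distance between two input-channel columns."""
    return sum(bin(a ^ b).count("1") for a, b in zip(col_a, col_b))

def bitwise_majority(cols, bits=4):
    """Per-position, per-bit majority vote over a group of columns."""
    rep = []
    for values in zip(*cols):
        v = 0
        for b in range(bits):
            if sum((x >> b) & 1 for x in values) * 2 > len(values):
                v |= 1 << b
        rep.append(v)
    return rep

def cluster_input_channels(cols, n_clusters, iters=5, bits=4):
    """Alternate between assigning columns to the nearest cluster
    representative and updating each representative (assignment/update)."""
    reps = [list(c) for c in cols[:n_clusters]]  # naive initialization
    assignment = [0] * len(cols)
    for _ in range(iters):
        assignment = [
            min(range(n_clusters), key=lambda j: col_hamming(c, reps[j]))
            for c in cols
        ]
        for j in range(n_clusters):
            members = [cols[i] for i, a in enumerate(assignment) if a == j]
            if members:
                reps[j] = bitwise_majority(members, bits)
    return assignment
```

Once the columns are grouped, each cluster's sub-matrix can be given its own output channel order, as on the slide.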


Experimental Results


Post-training optimization technique comparison

  • Use 1x1 Conv in MobileNetV2 and 3x3 Conv in ResNet26 for evaluation

Combine post-training and training-aware optimization

  • Incorporate bit flip loss into the loss function
  • Use MobileNetV2 trained on CIFAR-100 for evaluation
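The slides only state that a bit-flip loss is added to the loss function; the sketch below (names and the weighting factor `lam` are our own) shows the shape of such a combined objective evaluated on already-quantized weights. In actual training the bit-flip term would need a differentiable relaxation, which is not shown here.

```python
def bit_flip_loss(q_weights):
    """Total Hamming distance between consecutively streamed rows of a
    quantized weight matrix (list of rows of small non-negative ints)."""
    return sum(
        bin(p ^ c).count("1")
        for prev, cur in zip(q_weights, q_weights[1:])
        for p, c in zip(prev, cur)
    )

def combined_loss(task_loss, q_weights, lam=1e-4):
    # Training-aware objective: task loss plus a weighted bit-flip term
    return task_loss + lam * bit_flip_loss(q_weights)
```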

[Chart: average Hamming distance reduction vs. channels per cluster (8, 16, 32, 64) for MobileNetV2 and ResNet26, comparing Baseline, Direct Reorder, and Cluster-then-Reorder]

[Chart: Hamming distance reduction and energy reduction for Baseline, Post-Training, Training-Aware, and Combined optimization]