Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization
Meng Li*, YiLei Li*, Pierce Chuang, Liangzhen Lai, and Vikas Chandra
Facebook Silicon AI Research
EMC2 Workshop @ NeurIPS 2019

Motivation
[Figure: PE array dataflows. Output Stationary: each PE holds a partial sum (Psum) while weights stream through the array. Input Stationary: each PE holds an activation (Act) while weights stream through the array.]
[Figure: accelerator energy breakdown charts. Thinker [Yin+, JSSC’18]: legend PE Array, Buffer, Misc; labeled share 57.7%. ShiDianNao [Du+, ISCA’15]: legend Datapath, Buffer, Misc; labeled share 87.3%.]
[Figure: output-stationary PE array computing the product of a K x C weight matrix (entries W[0, 0] to W[3, 3]) and a C x (H x W) activation matrix (entries A[0, 0] to A[3, 3]); the weight and activation entries are streamed into the array in sequence.]
K, C, H, and W denote the output channel, input channel, output height, and output width, respectively.
[Plot: normalized energy versus total bit flips.]
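The link between energy and bit flips can be made concrete by counting how many bits toggle between consecutive operands fed to a processing element. The following is a minimal sketch, not the authors' code: the function names and the sample 2-bit weight stream are hypothetical, and it assumes that the bit-flip total is the sum of Hamming distances between successive values.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_bit_flips(stream):
    """Sum of Hamming distances between consecutive operands in a stream."""
    return sum(hamming_distance(x, y) for x, y in zip(stream, stream[1:]))


# Hypothetical 2-bit weights fed to one PE, in streaming order.
weights = [0b00, 0b11, 0b01, 0b10]
flips = total_bit_flips(weights)
```

For this stream the count is 5 (2 + 1 + 2), so changing the streaming order changes the number of toggles even though the set of operands is the same.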
[Figure: a K x C weight matrix (entries W[0, 0] to W[3, 3]) and a C x (H x W) activation matrix (entries A[0, 0] to A[3, 3]), with an example K x C matrix of 2-bit weights shown before and after a "Reorder" step.]
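One way to picture the "Reorder" step is as a search for a streaming order that lowers the Hamming distance between neighboring weight vectors. The sketch below uses a greedy nearest-neighbor heuristic, which is an assumption made for illustration and not necessarily the algorithm used in the talk; the function names and the sample vectors are hypothetical.

```python
def hamming_distance(u, v):
    """Total differing bits between two equal-length vectors of ints."""
    return sum(bin(a ^ b).count("1") for a, b in zip(u, v))


def stream_cost(vectors, order):
    """Bit flips when the vectors are streamed in the given order."""
    return sum(
        hamming_distance(vectors[i], vectors[j])
        for i, j in zip(order, order[1:])
    )


def greedy_reorder(vectors):
    """Nearest-neighbor heuristic: start at index 0, then repeatedly
    append the unvisited vector closest to the last one chosen
    (ties go to the lowest index)."""
    if not vectors:
        return []
    order = [0]
    remaining = set(range(1, len(vectors)))
    while remaining:
        last = vectors[order[-1]]
        nxt = min(remaining, key=lambda i: (hamming_distance(last, vectors[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical vectors of 2-bit weights, one vector per channel.
vectors = [[0b00, 0b00], [0b11, 0b11], [0b00, 0b01], [0b11, 0b10]]
order = greedy_reorder(vectors)
original_cost = stream_cost(vectors, [0, 1, 2, 3])
reordered_cost = stream_cost(vectors, order)
```

On this toy input the greedy order is [0, 2, 1, 3], which lowers the bit-flip count from 11 to 5. A greedy pass does not guarantee the best possible order; it only illustrates the effect.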
[Figure: cluster-then-reorder example on a K x C matrix of 2-bit weights. "C Clustering" splits the matrix into C cluster 1 and C cluster 2; "K Reordering" is then applied to each cluster separately.]
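The benefit of clustering before reordering can be sketched on a toy matrix: when different column groups prefer different row orders, a separate order per cluster can reach fewer bit flips than a single global order. This sketch rests on several assumptions: the column clusters are supplied by hand instead of being derived, the per-cluster reordering is a greedy nearest-neighbor pass, and all names and values are hypothetical. The axis roles (clusters along C, reordering along K) follow the slide's labels.

```python
def hd(u, v):
    """Bits that differ between two equal-length vectors of ints."""
    return sum(bin(a ^ b).count("1") for a, b in zip(u, v))


def cost(rows, order):
    """Total bit flips when rows are streamed in the given order."""
    return sum(hd(rows[i], rows[j]) for i, j in zip(order, order[1:]))


def greedy_order(rows):
    """Nearest-neighbor heuristic starting from row 0 (ties: lowest index)."""
    if not rows:
        return []
    order, remaining = [0], set(range(1, len(rows)))
    while remaining:
        last = rows[order[-1]]
        nxt = min(remaining, key=lambda i: (hd(last, rows[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def cluster_then_reorder(matrix, clusters):
    """Reorder rows separately inside each column cluster.

    Returns one (row_order, cost) pair per cluster.
    """
    results = []
    for cols in clusters:
        sub = [[row[c] for c in cols] for row in matrix]
        order = greedy_order(sub)
        results.append((order, cost(sub, order)))
    return results


# Hypothetical 4 x 4 matrix of 2-bit weights (rows: K, columns: C).
matrix = [
    [0b00, 0b00, 0b00, 0b00],
    [0b11, 0b11, 0b00, 0b01],
    [0b00, 0b01, 0b11, 0b11],
    [0b11, 0b10, 0b11, 0b10],
]
clusters = [[0, 1], [2, 3]]  # assumed to be given, not derived here

baseline = cost(matrix, [0, 1, 2, 3])
direct = cost(matrix, greedy_order(matrix))
clustered = sum(c for _, c in cluster_then_reorder(matrix, clusters))
```

On this input the baseline order costs 16 bit flips, a single greedy reorder of the whole matrix costs 15, and reordering each cluster separately costs 10 in total.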
[Plot: average Hamming distance (HD) reduction versus channels per cluster (8, 16, 32, 64) for Baseline, Direct Reorder, and Cluster-then-Reorder; panels: MobileNetV2 and ResNet26.]
[Plot: HD reduction and energy reduction for Baseline, Post-Training, Training-Aware, and Combine.]