Don’t Use a Single Large Systolic Array, Use Many Small Ones Instead
- H. T. Kung
Harvard University
Presentation at the Workshop on ML for Systems at ISCA, Phoenix, AZ, USA, June 23, 2019
Outline
Background: CNN, matmul, systolic arrays
Miriam Cha (recently graduated; now a visiting scholar)
Marcus Comiter
Xin Dong
Youngjune Gwon (graduated; now a visiting scholar)
(recently graduated; now a postdoc)
Surat Teerapittayanon (recently graduated)
James Yang
Sai Zhang
Brad McDanel, Marcus Comiter, Youngjune Gwon, Miriam Cha, Philippe Tillet, Surat Teerapittayanon, Sai Zhang, Xin Dong, James Yang
Two new PhD graduate students: Vikas Natesh and Andrew Sabot
Red color: students who have contributed to work reported in this presentation
CNN with 4 Layers
[Figure: three convolution layers and one fully connected layer, with the labels "rose" and "prediction". Matrix Multiplication View: in each of the four layers, filter matrix × data matrix = result matrix.]
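Convolution
[Figure: data with M input channels is convolved with N filters, each of size k × k × M, to give a result with N output feature maps. In the matrix view, the filter matrix (f1, f2, …, fN) and the data matrix (d1, d2, …, dJ) produce the result matrix (r1, r2, …, rN).]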
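The matrix view of convolution on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the helper names `im2col` and `conv_as_matmul`, the stride of 1, the absence of padding, and the tiny example tensors are all assumptions made for the sketch. Each k × k × M patch of the data is flattened into one column d_j, and each filter is one flattened row f_n, so that entry j of r_n is the dot product of f_n and d_j.

```python
from itertools import product

def im2col(data, k):
    """Flatten every k x k x M patch of `data` (M x H x W nested lists) into a column.

    Assumes stride 1 and no padding (an illustrative choice, not from the slides).
    """
    M, H, W = len(data), len(data[0]), len(data[0][0])
    cols = []
    for y, x in product(range(H - k + 1), range(W - k + 1)):
        cols.append([data[m][y + i][x + j]
                     for m in range(M) for i in range(k) for j in range(k)])
    return cols  # J columns d1..dJ, each of length M*k*k

def conv_as_matmul(filters, data, k):
    """filters: N flattened filters (length M*k*k each). Returns N rows r1..rN."""
    cols = im2col(data, k)
    return [[sum(w * v for w, v in zip(f, d)) for d in cols] for f in filters]

# Hypothetical example: M = 2 channels, 3 x 3 input, k = 2, N = 2 filters.
data = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],   # channel 1
]
filters = [
    [1, 0, 0, 1, 0, 0, 0, 0],  # f1: diagonal of the channel-0 patch
    [0, 0, 0, 0, 1, 1, 1, 1],  # f2: sum of the channel-1 patch
]
result = conv_as_matmul(filters, data, k=2)
print(result)  # [[6, 8, 12, 14], [2, 2, 2, 2]]
```

Here J = 4 because a 3 × 3 input has four 2 × 2 patches, so the result matrix has N = 2 rows of 4 entries each.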
[Figure: the matrix multiplication mapped to a systolic array. The filter matrix (f1, f2, …, fN) occupies the array, the data (d1, d2, …, dJ) enters with a data skew, and the result (r1, r2, …, rN) leaves the array.]
[Kung and Leiserson 1979] VLSI Processor Arrays
[Kung 1982] Why Systolic Architectures?
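A cycle-by-cycle sketch can make the role of the data skew concrete. The following is an illustrative weight-stationary model, not the design from the presentation: the function name `systolic_matmul`, the register layout, and the choice that data moves right while partial sums move down are assumptions of this sketch. Cell (k, n) holds one filter weight; element k of each data vector enters row k delayed by k cycles, so that it meets the matching partial sum in every column.

```python
def systolic_matmul(F, D):
    """Simulate a K x N weight-stationary systolic array (illustrative model).

    F: N filters, each of length K (cell (k, n) stores F[n][k]).
    D: J data vectors, each of length K.
    Returns R with R[n][j] = dot(F[n], D[j]).
    """
    N, K, J = len(F), len(F[0]), len(D)
    x = [[0] * N for _ in range(K)]  # data register of each cell
    s = [[0] * N for _ in range(K)]  # partial-sum register of each cell
    R = [[0] * J for _ in range(N)]
    for t in range(J + K + N - 2):   # cycles until the last result drains
        new_x = [[0] * N for _ in range(K)]
        new_s = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    j = t - k  # data skew: row k starts k cycles late
                    x_in = D[j][k] if 0 <= j < J else 0
                else:
                    x_in = x[k][n - 1]               # data from the left neighbor
                s_in = s[k - 1][n] if k > 0 else 0   # partial sum from above
                new_x[k][n] = x_in
                new_s[k][n] = s_in + F[n][k] * x_in  # multiply-accumulate
        x, s = new_x, new_s
        for n in range(N):           # bottom row emits finished dot products
            j = t - (K - 1) - n
            if 0 <= j < J:
                R[n][j] = s[K - 1][n]
    return R

# Hypothetical example: N = 3 filters of length K = 2, J = 3 data vectors.
F = [[1, 2], [3, 4], [5, 6]]
D = [[1, 0], [0, 1], [1, 1]]
print(systolic_matmul(F, D))  # [[1, 2, 3], [3, 4, 7], [5, 6, 11]]
```

In this model the result for data vector j in column n leaves the bottom row at cycle j + (K - 1) + n, which is why the simulation runs for J + K + N - 2 cycles.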
1. For high packing density, in combining columns we allow …
2. We retrain the remaining weights to bring up inference accuracy
Column Combining
Combine multiple sparse columns, e.g., 8 columns, into a dense one
[Figure: a sparse filter matrix is packed into a packed filter matrix; the packed filter matrix and the data are mapped to a systolic array.]
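One way to picture column combining is a greedy grouping of sparse columns. The sketch below is a hypothetical heuristic, not the algorithm from the presentation: the names `combine_columns`, `pack`, and `packed_matvec`, the parameters `max_group` and `max_conflicts`, and the rule of keeping the largest-magnitude weight when two columns collide in a row are all assumptions, since the slide text on conflict handling is truncated. Each packed entry records the weight together with its source column, so the cell knows which data input to use.

```python
def combine_columns(F, max_group=8, max_conflicts=0):
    """Greedily group columns of sparse matrix F (list of rows).

    A column joins the first group with room where it collides with existing
    nonzeros in at most `max_conflicts` rows (hypothetical rule).
    """
    n_rows, n_cols = len(F), len(F[0])
    groups = []
    for c in range(n_cols):
        for g in groups:
            if len(g) >= max_group:
                continue
            conflicts = sum(1 for r in range(n_rows)
                            if F[r][c] != 0 and any(F[r][c2] != 0 for c2 in g))
            if conflicts <= max_conflicts:
                g.append(c)
                break
        else:
            groups.append([c])  # no existing group fits: start a new one
    return groups

def pack(F, groups):
    """Build one packed column per group; entries are (weight, source column)."""
    packed = []
    for row in F:
        out = []
        for g in groups:
            best = max(g, key=lambda c: abs(row[c]))  # colliding weights are pruned
            out.append((row[best], best) if row[best] != 0 else (0, None))
        packed.append(out)
    return packed

def packed_matvec(packed, d):
    """Multiply the packed matrix by data vector d using the stored column index."""
    return [sum(w * d[c] for w, c in row if c is not None) for row in packed]

# Hypothetical 4 x 4 sparse filter matrix.
F_sparse = [[1, 0, 0, 2],
            [0, 3, 0, 0],
            [0, 0, 4, 0],
            [5, 0, 0, 0]]
groups = combine_columns(F_sparse)
packed = pack(F_sparse, groups)
print(groups)                               # [[0, 1, 2], [3]]
print(packed_matvec(packed, [1, 2, 3, 4]))  # [9, 6, 12, 5]
```

With `max_conflicts=0` no weight is pruned, so the packed product equals the original sparse product. Allowing conflicts packs more densely (all four columns above fit in one group with `max_conflicts=1`) at the cost of pruned weights, which is where retraining of the remaining weights would come in.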
[Figure: systolic cell storing the weight 2^-2, with data inputs d3 and d4, updating z to z + 2^-2 × d3.]
Column combining (5x reduction in tiles)
Original sparse filter matrix: 150 columns
Packed filter matrix: 29 columns
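The tile arithmetic behind a figure like this can be checked with a short calculation. The sketch assumes a 32 × 32 systolic array and a matrix with 96 rows; neither number appears in the slide text, and both are chosen only because a width of 32 is one value consistent with the quoted 5x reduction. The helper `num_tiles` is likewise hypothetical.

```python
import math

def num_tiles(rows, cols, array_rows=32, array_cols=32):
    """Number of array-sized tiles needed to cover a rows x cols matrix.

    The 32 x 32 array size is an assumption of this sketch.
    """
    return math.ceil(rows / array_rows) * math.ceil(cols / array_cols)

rows = 96  # hypothetical row count; it cancels out in the ratio
before = num_tiles(rows, 150)  # original sparse filter matrix
after = num_tiles(rows, 29)    # packed filter matrix
print(before, after, before / after)  # 15 3 5.0
```

Under these assumptions, 150 columns need five column tiles and 29 columns need one, so the row count does not affect the ratio.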
Consecutive columns combined
Filter Matrix
Combining Reduced memory access
On-switch combining Many systolic arrays
Baseline Maestro
(1) Co-design to allow high-utilization systolic arrays for sparse CNNs
(2) Use of many small systolic arrays wins