Moving CNN Accelerator Computations Closer to Data
Sumanth Gudaparthi Surya Narayanan Rajeev Balasubramonian
Evolution of CNN Accelerators
Moore’s Law: transistor scaling is coming to an end
Digital Accelerators (DianNao, DaDianNao, etc.): limited by memory bandwidth
Analog In-Situ Accelerators (ISAAC, PRIME, etc.): complex analog circuits, lack of flexibility
Digital In-Situ Accelerators (DRISA): higher-cost DRAM, can’t be used as host memory
SRAM-based In-Situ Computation Accelerator (SISCA)
AIA vs SISCA: use SRAM cells to perform in-situ computations
DIA vs SISCA: modify the last-level cache (LLC); trivial overhead
DA vs SISCA: perform computations in-situ
DA: Digital Accelerators
AIA: Analog In-Situ Accelerators
DIA: Digital In-Situ Accelerators
SISCA: Proposed Accelerator
Logic-In-Memory
[Figure: SRAM cells with bit-lines (BL, BLB) and word-lines (WL, WLB). Source: Jeloka et al., 2016]
Logic-In-Memory
[Figure: Cell1 (WL1, WLB1) and Cell2 (WL2, WLB2) sharing the bit-lines BL and BLB. Source: Jeloka et al., 2016]
Pre-charge the bit-lines
Activate the word-lines
Discharge of bit-line voltage through Cell1
Discharge of bit-line voltage through both cells
Bit-line stays pre-charged
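The bit-line behaviour listed on this slide can be sketched as a small logic model. This is an illustration only, not the circuit from Jeloka et al.: the function name `bitline_read` is hypothetical, and the model assumes that a cell storing 0 discharges BL while a cell storing 1 discharges BLB. Under that assumption, sensing BL yields the AND of the two cells and sensing BLB yields their NOR.

```python
from typing import Tuple


def bitline_read(cell1: int, cell2: int) -> Tuple[int, int]:
    """Model two cells on a shared bit-line pair with both word-lines active.

    Returns the logic levels (BL, BLB) after sensing.
    Assumption (not from the slides): a stored 0 pulls BL low,
    a stored 1 pulls BLB low.
    """
    bl, blb = 1, 1  # step 1: pre-charge both bit-lines
    for stored in (cell1, cell2):  # step 2: both word-lines activated
        if stored == 0:
            bl = 0   # BL discharges through this cell
        else:
            blb = 0  # BLB discharges through this cell
    return bl, blb


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            bl, blb = bitline_read(a, b)
            print(f"cell1={a} cell2={b} -> BL={bl} BLB={blb}")
```

In this model BL stays pre-charged only when both cells store 1, which matches the "bit-line stays pre-charged" case on the slide.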
Enabling In-Situ Multiplication in Caches
[Figure: multiplication of a 3-bit weight W0 (bits W0-0, W0-1, W0-2) by a 3-bit input I0 (bits I0-0, I0-1, I0-2). The weight and input bits are replicated so that each weight bit is paired with each input bit, giving partial products such as W0-0·I0-1 and W0-2·I0-2.]
Ca-b: bit b of the a-th variable of C
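A minimal sketch of how a multiplication can be assembled from bitwise ANDs, in the spirit of the figure above. It assumes that each partial-product bit is the AND of one weight bit and one input bit, and that the rows are combined by shift-and-add; the slides do not spell out the exact reduction scheme, and `in_situ_multiply` is a hypothetical name.

```python
def in_situ_multiply(w: int, i: int, bits: int = 3) -> int:
    """Multiply two unsigned `bits`-wide values using only AND, shift, add.

    Assumption: partial products are W-bit AND I-bit pairs, combined by
    shift-and-add. This is a software model, not the hardware datapath.
    """
    w_bits = [(w >> b) & 1 for b in range(bits)]  # W0-0, W0-1, W0-2
    i_bits = [(i >> b) & 1 for b in range(bits)]  # I0-0, I0-1, I0-2

    product = 0
    for shift, i_bit in enumerate(i_bits):
        # One row of partial products: every weight bit ANDed with one input bit.
        row = [w_bit & i_bit for w_bit in w_bits]
        row_value = sum(bit << pos for pos, bit in enumerate(row))
        product += row_value << shift  # align the row before accumulating
    return product


if __name__ == "__main__":
    print(in_situ_multiply(5, 6))
```

The replication of operand bits in the figure corresponds to the nested pairing of `w_bits` and `i_bits` here: every weight bit meets every input bit exactly once.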
SISCA Organization
[Figure: SISCA organization. Labels: LC:1; sub-arrays SA1, SA2, SA3, SA4; H-Tree; RAT1, RAT2, RAT3, RAT4; shifter banks. Legend: kernel entries, feature map entries, unused sub-array entries.]
Ca-b: bit b of the a-th variable of C
SISCA Dataflow
[Figure: dataflow across Sub Array 1, Sub Array 2, and Sub Array 3]
Input Feature Map: 6x6
Kernel Maps: 2x(3x3)
Output Feature Maps: 2x(4x4)
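The shapes on this slide follow from a stride-1 convolution without padding: a 6x6 input and a 3x3 kernel give a (6 - 3 + 1) x (6 - 3 + 1) = 4x4 output, and two kernels give two output maps. The sketch below checks only that arithmetic. The stride and padding are assumptions, the data values are made up for illustration, and the mapping of work onto sub-arrays is not modeled.

```python
def conv2d_valid(fmap, kernel):
    """Stride-1, no-padding 2D convolution (CNN-style, kernel not flipped)."""
    n, k = len(fmap), len(kernel)
    out = n - k + 1  # output width and height
    return [
        [
            sum(
                fmap[r + dr][c + dc] * kernel[dr][dc]
                for dr in range(k)
                for dc in range(k)
            )
            for c in range(out)
        ]
        for r in range(out)
    ]


# Illustrative data only: a 6x6 input and two 3x3 kernels.
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
kernels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # sums each 3x3 window
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # picks the window centre
]
outputs = [conv2d_valid(fmap, kernel) for kernel in kernels]

if __name__ == "__main__":
    for index, out_map in enumerate(outputs):
        print(f"output map {index}: {len(out_map)}x{len(out_map[0])}")
```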
Energy Improvements
Energy Efficient
Performance Improvements
2.7x
Higher Throughput
Conclusions and Future Work
and CNN data across the Cache.