Moving CNN Accelerator Computations Closer to Data


SLIDE 1

Moving CNN Accelerator Computations Closer to Data

Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian


SLIDE 2

Evolution of CNN Accelerators

Moore's Law: transistor scaling is coming to an end.

  • Digital Accelerators (DianNao, DaDianNao, etc.): limited by memory bandwidth
  • Analog in-situ Accelerators (ISAAC, PRIME, etc.): complex analog circuits, lack of flexibility
  • Digital in-situ Accelerators (DRISA): higher-cost DRAM; can't be used as host memory

SLIDE 3

SRAM-based In-Situ Computation Accelerator

  • DA vs. SISCA: performs computations in-situ
  • AIA vs. SISCA: uses SRAM cells to perform the in-situ computations
  • DIA vs. SISCA: modifies the LLC, with trivial overhead on baseline cache operations

DA: Digital Accelerators; AIA: Analog In-situ Accelerators; DIA: Digital In-situ Accelerators; SISCA: Proposed Accelerator

SLIDE 4

Logic-In-Memory

[Figure: SRAM cell with bit-lines BL/BLB and word-lines WL/WLB (Jeloka et al., 2016)]


SLIDE 5

Logic-In-Memory

[Figure: two SRAM cells (Cell1, Cell2) sharing bit-lines BL/BLB, with word-lines WL1/WLB1 and WL2/WLB2 activated simultaneously]

  • Pre-charge the bit-lines
  • Activate the word-lines of both cells
  • Depending on the stored values, the bit-line voltage discharges through Cell1, discharges through both cells, or stays pre-charged

Jeloka et al., 2016

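A minimal behavioral sketch of this multi-row activation (an illustration, not code from the talk; it assumes, per Jeloka et al., that sensing BL yields the AND of the two stored bits and sensing BLB yields their NOR):

    # Behavioral model of simultaneous two-row activation on a shared
    # bit-line pair (illustrative sketch, not circuitry from the talk).
    def in_situ_and_nor(cell1, cell2):
        # BL discharges if either cell stores 0; a still-charged BL
        # therefore means both cells hold 1: BL senses cell1 AND cell2.
        bl = cell1 & cell2
        # BLB discharges if either cell stores 1; a still-charged BLB
        # means both cells hold 0: BLB senses cell1 NOR cell2.
        blb = int(not (cell1 | cell2))
        return bl, blb

    for c1 in (0, 1):
        for c2 in (0, 1):
            print(c1, c2, in_situ_and_nor(c1, c2))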


SLIDE 6

Enabling In-Situ Multiplication in Caches

[Figure: bit-level expansion of W0 * I0: each partial product W0-a AND I0-b is formed in place, then shifted and accumulated into the product]

Notation: Ca-b denotes bit b of the a-th variable of C.
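To make the expansion concrete, here is a small shift-and-add sketch of the multiplication the figure depicts (an illustration under the assumption that each partial product is one in-cache AND, with shifting and accumulation done by peripheral logic; not code from the talk):

    # Bit-level multiplication via AND-ed partial products (sketch).
    def in_situ_multiply(w_bits, i_bits):
        # w_bits and i_bits are lists of bits, least-significant first,
        # e.g. [W0-0, W0-1, W0-2] and [I0-0, I0-1, I0-2].
        product = 0
        for a, w in enumerate(w_bits):
            for b, i in enumerate(i_bits):
                partial = w & i                 # one bit-line AND per bit pair
                product += partial << (a + b)   # shift-and-add in the periphery
        return product

    # 3-bit example: W0 = 0b101 (5), I0 = 0b011 (3)
    assert in_situ_multiply([1, 0, 1], [1, 1, 0]) == 5 * 3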


SLIDE 7

SISCA Organization

[Figure: SISCA organization: sub-arrays SA1-SA4 holding kernel entries, feature-map entries, and unused entries; an H-Tree interconnect; and shifters with RAT1-RAT4 at the banks. Ca-b: bit b of the a-th variable of C.]


SLIDE 8

SISCA Dataflow

[Figure: dataflow across Sub-Arrays 1-3: a 6x6 input feature map convolved with two 3x3 kernel maps produces two 4x4 output feature maps]
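The shapes follow from standard convolution arithmetic: with stride 1 and no padding, a 6x6 input and a 3x3 kernel give a (6-3+1) x (6-3+1) = 4x4 output per kernel. A plain-Python sketch reproducing these shapes (illustrative only, not the SISCA dataflow itself):

    # Direct convolution matching the figure's shapes:
    # 6x6 input, two 3x3 kernels -> two 4x4 output feature maps.
    def conv2d(ifmap, kernel):
        H, W, K = len(ifmap), len(ifmap[0]), len(kernel)
        out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
        for r in range(H - K + 1):
            for c in range(W - K + 1):
                out[r][c] = sum(ifmap[r + i][c + j] * kernel[i][j]
                                for i in range(K) for j in range(K))
        return out

    ifmap = [[1] * 6 for _ in range(6)]        # 6x6 input feature map
    kernels = [[[1] * 3 for _ in range(3)],    # two 3x3 kernel maps
               [[2] * 3 for _ in range(3)]]
    ofmaps = [conv2d(ifmap, k) for k in kernels]
    assert len(ofmaps) == 2 and len(ofmaps[0]) == 4 and len(ofmaps[0][0]) == 4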


SLIDE 9

Energy Improvements

SISCA is 6.3x more energy-efficient than the DaDianNao baseline.


SLIDE 10

Performance Improvements

SISCA achieves 2.7x higher throughput than DaDianNao.


SLIDE 11

Conclusions and Future Work


  • SISCA is an SRAM in-situ computation engine for Convolutional Neural Networks
  • It uses the on-chip Last Level Cache (LLC) to perform computations
  • SISCA is 6.3x more energy-efficient and has 2.7x higher throughput than DaDianNao
  • Better dataflow and mapping mechanisms can further improve energy and throughput
  • Better scheduling mechanisms are needed to distribute the general-purpose workload and CNN data across the cache

SLIDE 12

Questions?
