
Moving CNN Accelerator Computations Closer to Data - PowerPoint PPT Presentation

  1. Moving CNN Accelerator Computations Closer to Data. Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian.

  2. Evolution of CNN Accelerators. Digital accelerators (DianNao, DaDianNao, etc.) are limited by memory bandwidth, and transistor scaling (Moore's Law) is coming to an end. Analog in-situ accelerators (ISAAC, PRIME, etc.) require complex analog circuits and lack flexibility. Digital in-situ accelerators (DRISA) rely on higher-cost DRAM that can't be used as host memory.

  3. SRAM based In-Situ Computation Accelerator. SISCA is the proposed accelerator. Versus digital accelerators (DA): SISCA performs computations in-situ. Versus analog in-situ accelerators (AIA): SISCA uses SRAM cells to perform in-situ computations. Versus digital in-situ accelerators (DIA): SISCA modifies the LLC, with trivial overhead on baseline cache operations.

  4. Logic-In-Memory. [Figure: SRAM cells storing 1 and 0, with word-lines WL/WLB and bit-lines BL/BLB.] Jeloka et al., 2016.

  5. Logic-In-Memory. Operation: pre-charge the bit-lines, then activate both word-lines (Cell1 and Cell2) simultaneously. Depending on the stored values, a bit-line discharges through Cell1, discharges through both cells, or stays pre-charged, so the sensed bit-line voltage encodes a logic function of the two stored bits. [Figure: two cells stacked on shared bit-lines BL/BLB.] Jeloka et al., 2016.
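
The slide's three per-column cases (discharge through Cell1, discharge through both cells, bit-line stays pre-charged) amount to computing a logic function across the two activated rows. A minimal Python sketch of that behavior, assuming the standard reading of Jeloka et al., 2016 (BL stays pre-charged only when every activated cell stores 1, giving AND; BLB only when every cell stores 0, giving NOR):

```python
# Hypothetical bit-level model of two-row activation (Jeloka et al., 2016).
# A pre-charged bit-line discharges through any activated cell holding the
# opposite value, so what survives on BL/BLB is a logic function of the rows.

def in_situ_logic(row1, row2):
    """Return the (BL, BLB) sense results for two simultaneously
    activated SRAM rows: BL senses AND, BLB senses NOR."""
    bl  = [a & b for a, b in zip(row1, row2)]             # stays 1 iff both store 1
    blb = [int(not (a | b)) for a, b in zip(row1, row2)]  # stays 1 iff both store 0
    return bl, blb

# The slide's three cases, column by column: both cells store 1 (BL stays
# pre-charged), Cell1 stores 0 (BL discharges through Cell1), both cells
# store 0 (BL discharges through both, BLB stays pre-charged).
cell1 = [1, 0, 0]
cell2 = [1, 1, 0]
print(in_situ_logic(cell1, cell2))  # ([1, 0, 0], [0, 0, 1])
```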

  6. Enabling In-Situ Multiplication in Caches. [Figure: weight bits W 0-0..W 0-2 and input bits I 0-0..I 0-2 replicated and aligned in a sub-array so that bitwise operations produce the partial products of W * I.] Notation: C a-b is bit number b of the a-th variable of C.
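
The exact bit layout in the figure is hard to recover from this transcript, but the underlying technique is standard: a multiplication decomposes into one bitwise AND per (weight bit, input bit) pair, followed by shifts and adds. A hypothetical Python sketch of that decomposition, using the slide's W 0-b / I 0-b bit notation (a functional model, not SISCA's actual circuit):

```python
def in_situ_multiply(w_bits, i_bits):
    """Multiply two unsigned values given as bit lists (LSB first),
    using only bitwise ANDs (the in-cache primitive) plus a
    shift-and-add reduction of the partial-product bits."""
    result = 0
    for j, w in enumerate(w_bits):        # one row of partial products per W bit
        for k, i in enumerate(i_bits):
            result += (w & i) << (j + k)  # each AND forms one partial-product bit
    return result

# 3-bit operands, as on the slide (bits W 0-0..W 0-2 and I 0-0..I 0-2):
w = [1, 0, 1]   # W = 5 (LSB first)
i = [0, 1, 1]   # I = 6
assert in_situ_multiply(w, i) == 30
```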

  7. SISCA Organization. [Figure: cache banks connected by an H-Tree; a sub-array holds kernel entries, feature-map entries, and unused entries, and is served by sense amplifiers SA1-SA4, RAT1-RAT4, and a shifter.] Notation: C a-b is bit number b of the a-th variable of C.

  8. SISCA Dataflow. [Figure: Sub-Array 1 holds the 6x6 input feature map, Sub-Array 2 holds the two 3x3 kernel maps, and Sub-Array 3 holds the two 4x4 output feature maps.]
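
The slide's shapes follow ordinary convolution arithmetic: a 6x6 input convolved with a 3x3 kernel at stride 1 with no padding yields a (6 - 3 + 1) x (6 - 3 + 1) = 4x4 output, one per kernel. A small Python sketch reproducing this dataflow (a functional model only, not SISCA's in-cache execution):

```python
def conv2d(ifmap, kernel):
    """Direct 2D convolution, stride 1, no padding."""
    n, k = len(ifmap), len(kernel)
    m = n - k + 1                              # output size: 6 - 3 + 1 = 4
    return [[sum(ifmap[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(m)]
            for r in range(m)]

ifmap   = [[1] * 6 for _ in range(6)]          # 6x6 input feature map
kernels = [[[1] * 3 for _ in range(3)],        # two 3x3 kernel maps
           [[2] * 3 for _ in range(3)]]
outputs = [conv2d(ifmap, kern) for kern in kernels]
assert all(len(o) == 4 and len(o[0]) == 4 for o in outputs)  # 2x(4x4) outputs
```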

  9. Energy Improvements. [Figure: energy comparison; SISCA is 6.3x more energy efficient than DaDianNao.]

  10. Performance Improvements. [Figure: throughput comparison; SISCA delivers 2.7x higher throughput than DaDianNao.]

  11. Conclusions and Future Work
  • SISCA is an SRAM in-situ computation engine for Convolutional Neural Networks.
  • It uses the on-chip Last Level Cache (LLC) to perform computations.
  • SISCA is 6.3x more energy efficient and has 2.7x higher throughput than DaDianNao.
  • Better dataflow and mapping mechanisms can further improve energy and throughput.
  • Better scheduling mechanisms are needed to distribute general-purpose workloads and CNN data across the cache.

  12. Questions?
