  1. PROMISE: An End-to-End Design of a PROgrammable MIxed-Signal AccElerator for Machine Learning Algorithms
     Prakalp Srivastava*, Mingu Kang*, Sujan K. Gonugondla, Sungmin Lim, Jungwook Choi, Vikram Adve, Nam S. Kim, Naresh Shanbhag (psrivas2@illinois.edu, mingu.kang@ibm.com)
     *Equal contribution. Supported by NSF, C-FAR, and SONIC.
     PROMISE | Srivastava, Kang et al. | University of Illinois at Urbana-Champaign

  2. Machine Learning under Resource Constraints
     • Embedded statistical inference: IoT, sensor-rich platforms
     • Decision making under resource constraints

  3. Energy Trend of Memory vs. Processing
     Component-level energy trend in a modern processor [Horowitz, ISSCC14]:
       Computation energy (45 nm)     8 bits     32 bits
         Integer ADD                  0.03 pJ    0.1 pJ
         Integer MULT                 0.2 pJ     3 pJ
       Memory access energy (45 nm, 64 bits)
         Cache 8 KB                   10 pJ
         Cache 32 KB                  20 pJ
         Cache 1 MB                   100 pJ
         DRAM                         1.2–2.6 nJ
     [Figure: top-1 accuracy [%] vs. operations [G-Ops] and number of parameters (legend: 35M, 155M) for AlexNet, BN-AlexNet, BN-NIN, ENet, VGG-16, VGG-19, ResNet-152, Inception-v3, Inception-v4 [Canziani, arXiv16]]
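To make the gap concrete, the short Python sketch below divides the 45 nm memory-access energies listed above by the compute energies. It only restates the slide's numbers; the dictionary keys and the function name are illustrative, not from the source.

```python
# Energy figures at 45 nm as listed on the slide [Horowitz, ISSCC14], in pJ.
# Key names are illustrative; DRAM is given as a range, so both ends are kept.
ENERGY_PJ = {
    "add_8b": 0.03,
    "add_32b": 0.1,
    "mult_8b": 0.2,
    "mult_32b": 3.0,
    "cache_8kb": 10.0,    # 64-bit access
    "cache_32kb": 20.0,   # 64-bit access
    "cache_1mb": 100.0,   # 64-bit access
    "dram_min": 1200.0,   # 1.2 nJ per 64-bit access
    "dram_max": 2600.0,   # 2.6 nJ per 64-bit access
}


def access_to_op_ratio(memory_key: str, op_key: str) -> float:
    """Number of compute operations that cost as much energy as one memory access."""
    return ENERGY_PJ[memory_key] / ENERGY_PJ[op_key]


if __name__ == "__main__":
    for mem in ("cache_8kb", "cache_32kb", "cache_1mb", "dram_min", "dram_max"):
        print(f"{mem:>10}: {access_to_op_ratio(mem, 'add_32b'):>8.0f}x a 32-bit ADD, "
              f"{access_to_op_ratio(mem, 'mult_32b'):>6.0f}x a 32-bit MULT")
```

By these figures, one 64-bit DRAM access costs as much as roughly 12,000 to 26,000 32-bit additions, and even an 8 KB cache access costs about 100.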

  4. Deep In-memory Architecture (DIMA)
     [Diagram: bitcell array with precharge/Y-decoder and X-decoders; bitline processors (BLP) at the column periphery feed a cross bitline processor, followed by residual digital logic (RDL) that produces the decision]
     • Deeply embeds analog computing at the periphery of the bitcell array
     • Low-swing / low-SNR operations for aggressive energy efficiency
     • [M. Kang, JSSC18; J. Zhang, VLSI16; S. Gonugondla, ISSCC18; A. Biswas, ISSCC18]

  5–6. DIMA Prototypes (65nm CMOS)
     • Multi-functional inference processor [Mingu Kang, JSSC18]: 53× lower energy-delay product (EDP)
     • Random forest processor [Mingu Kang, JSSC18; Mingu Kang, ESSCIRC17]: 7× lower EDP
     • On-chip training processor [Sujan Gonugondla, ISSCC18]: 100× lower EDP
     • Limitation: lack of programmability

  7–10. Goals & Challenges of PROMISE
     1. Analog programmable hardware and ISA design
        − Analog noise management
        − Intrinsic sequentiality of operations
        − High variations in delay across different analog operations
     2. End-to-end application mapping to PROMISE
        − Map a high-level language construct (e.g., a fully-connected layer, $Y = W \cdot X$) onto analog circuit (DIMA) parameters: voltage swing, ADC precision, analog noise, leakage
     3. Optimal energy with accuracy guarantee
        − Energy vs. accuracy trade-off in analog circuits
        − Maximize energy savings
        − Accuracy guarantees across a long chain of analog processing

  11–13. Our Contributions
     High-level program (DNN, matched filter, SVM, PCA, …) → PROMISE compiler → PROMISE ISA → energy optimization → optimized PROMISE ISA → PROMISE hardware (precharge/Y-decoder, X-decoders, BLPs, cross bit-line processor, RDL)
     • Programmability Challenge 1: programmable analog hardware and its ISA
     • Programmability Challenge 2: PROMISE compiler, mapping high-level programs to the PROMISE ISA
     • Programmability Challenge 3: energy optimization, producing an optimized PROMISE ISA program

  14. Prior Art
     • PRIME [P. Chi, ISCA16]: ReRAM in-memory processor
     • RedEye [R. LiKamWa, ISCA16]: processor in image sensor
     • PuDianNao [D. Liu, ASPLOS15]: instruction set architecture; various ML algorithms; digital implementation
     • Shortcomings: limited programmability; limited error management

  15. Processing Stages in DIMA
     1. Analog READ (aREAD)
     2. Bitline processing (BLP)
     3. Cross BLP (CBLP)
     4. ADC & residual digital logic (RDL)
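The four stages can be sketched behaviorally in Python for a dot product between a stored vector w and an input x. This is a software illustration under stated assumptions, not the PROMISE circuit model: the read-noise model (Gaussian, standard deviation proportional to 1/swing), the ideal summation in the CBLP, the uniform ADC, and all function names are hypothetical.

```python
import numpy as np


def a_read(stored: np.ndarray, swing_mv: float, rng: np.random.Generator) -> np.ndarray:
    """Stage 1, aREAD: read stored words as analog values.
    Hypothetical noise model: Gaussian with std = 1 / swing (larger swing, less noise)."""
    return stored + rng.normal(0.0, 1.0 / swing_mv, size=stored.shape)


def blp_multiply(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Stage 2, BLP: one scalar operation per bitline column (here a multiply)."""
    return w * x


def cblp_aggregate(partials: np.ndarray) -> float:
    """Stage 3, CBLP: combine all column outputs (modeled as an ideal sum)."""
    return float(np.sum(partials))


def adc_and_rdl(value: float, full_scale: float, bits: int, threshold: float) -> int:
    """Stage 4, ADC + RDL: quantize to `bits` bits over [-full_scale, full_scale],
    then make a digital threshold decision (+1 or -1)."""
    levels = 2 ** bits - 1
    clipped = min(max(value, -full_scale), full_scale)
    code = round((clipped + full_scale) / (2 * full_scale) * levels)
    digital = code / levels * 2 * full_scale - full_scale
    return 1 if digital >= threshold else -1


def dima_dot_decision(w, x, swing_mv=20.0, bits=6, threshold=0.0, seed=0) -> int:
    """Run the four stages in order for one stored vector w and input x."""
    rng = np.random.default_rng(seed)
    w_analog = a_read(np.asarray(w, dtype=float), swing_mv, rng)
    partials = blp_multiply(w_analog, np.asarray(x, dtype=float))
    total = cblp_aggregate(partials)
    # Full scale assumes elements in [-1, 1], so |dot product| <= len(w).
    return adc_and_rdl(total, full_scale=float(len(w)), bits=bits, threshold=threshold)


if __name__ == "__main__":
    print(dima_dot_decision([1, -1, 1, 1], [1, 1, 1, 1]))
```

Lowering `swing_mv` in this toy model raises the read noise, which is the energy vs. accuracy knob the following slides discuss.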

  16–17. Energy vs. Accuracy Trade-off
     [Figure: probability of detection [%] vs. bitline voltage swing [mV], 0–30 mV; a smaller swing improves energy efficiency, a larger swing improves accuracy. Silicon-measured results of template matching from [Kang JSSC18]]
     PROMISE instruction field: SWING = 000 (min) to SWING = 111 (max)

  18. Energy vs. Accuracy Trade-off
     [Diagram: a chain of layers, each with its own SWING = ??? setting, whose combined accuracy must meet the accuracy goal]
     With 8 SWING settings per layer (a 3-bit field), a 4-layer network already has 8^4 = 4096 SWING combinations, so the full search space exceeds 4096 possibilities.
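The size of this space can be checked directly. The Python sketch below enumerates every per-layer SWING assignment for a 4-layer network and keeps the lowest-energy one that still meets an accuracy goal. The per-layer models `layer_energy` and `layer_fidelity` are made-up placeholders, not measured PROMISE data; only the 8-settings-per-layer count comes from the 3-bit SWING field.

```python
from itertools import product

SWING_CODES = range(8)  # 3-bit SWING field: 0b000 (min) ... 0b111 (max)


def layer_energy(code: int) -> float:
    """Hypothetical model: energy grows linearly with the swing setting."""
    return 1.0 + code


def layer_fidelity(code: int) -> float:
    """Hypothetical model: fraction of accuracy a layer preserves; rises with swing."""
    return 1.0 - 0.05 / (1 + code)


def best_swing_assignment(num_layers: int, accuracy_goal: float, base_accuracy: float = 0.98):
    """Exhaustive search: return (energy, codes, accuracy) of the cheapest assignment
    whose modeled accuracy meets the goal, or None if no assignment does."""
    best = None
    for codes in product(SWING_CODES, repeat=num_layers):
        accuracy = base_accuracy
        for code in codes:
            accuracy *= layer_fidelity(code)
        if accuracy < accuracy_goal:
            continue
        energy = sum(layer_energy(code) for code in codes)
        if best is None or energy < best[0]:
            best = (energy, codes, accuracy)
    return best


if __name__ == "__main__":
    print(sum(1 for _ in product(SWING_CODES, repeat=4)))  # 8**4 = 4096 candidates
    print(best_swing_assignment(num_layers=4, accuracy_goal=0.93))
```

Because the candidate count grows as 8^L with the number of layers L (and further with any additional knobs), brute-force enumeration quickly stops being practical for deeper networks.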

  19. End-to-End Application to Architecture Mapping
     Julia program (DNN, matched filter, SVM, PCA, …) → PROMISE compiler → PROMISE ISA → energy optimization → optimized PROMISE ISA → PROMISE hardware
     Programmability Challenges 1 (hardware and ISA), 2 (compiler), and 3 (energy optimization)

  20. Machine Learning Algorithms
     Each algorithm computes a decision $f(\mathrm{dist}(W, X))$:
       Algorithm          $f(\cdot)$       Distance metric $\mathrm{dist}(W, X)$
       SVM                sign             $\sum_i w[i]\,x[i]$
       Template Match 1   min              $\sum_i |w[i] - x[i]|$
       Template Match 2   min              $\sum_i (w[i] - x[i])^2$
       DNN                tanh, ReLU       $\sum_i w[i]\,x[i]$
       PCA                none             $\sum_i w[i]\,x[i]$
       K-NN 1             majority vote    $\sum_i |w[i] - x[i]|$
       K-NN 2             majority vote    $\sum_i (w[i] - x[i])^2$
       Matched Filter     min              $\sum_i w[i]\,x[i]$
       …                  …                …
     Processing chain: scalar distance (SD) per element → aggregation into a vector distance (VD) → threshold (TH), $f(\cdot)$
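To illustrate the SD → VD → f( ) structure of the table, the Python sketch below implements several rows as a plain floating-point reference. It does not model the analog datapath, and the function and dictionary names are hypothetical.

```python
from collections import Counter

import numpy as np

# Scalar distance (SD): one operation per vector element i.
SCALAR_DISTANCE = {
    "product": lambda w, x: w * x,            # w[i] * x[i]
    "abs_diff": lambda w, x: np.abs(w - x),   # |w[i] - x[i]|
    "sq_diff": lambda w, x: (w - x) ** 2,     # (w[i] - x[i])^2
}


def vector_distance(w, x, kind: str) -> float:
    """Vector distance (VD): aggregate the scalar distances over all elements."""
    return float(np.sum(SCALAR_DISTANCE[kind](np.asarray(w, float), np.asarray(x, float))))


def svm(w, x) -> int:
    """f = sign of the weighted sum."""
    return int(np.sign(vector_distance(w, x, "product")))


def dnn_neuron(w, x, activation: str = "relu") -> float:
    """f = ReLU or tanh of the weighted sum."""
    s = vector_distance(w, x, "product")
    return max(s, 0.0) if activation == "relu" else float(np.tanh(s))


def template_match(templates, x, kind: str = "abs_diff") -> int:
    """f = min: index of the closest stored template (L1 or squared-difference distance)."""
    return int(np.argmin([vector_distance(t, x, kind) for t in templates]))


def knn(train, labels, x, k: int, kind: str = "abs_diff"):
    """f = majority vote over the labels of the k nearest stored vectors."""
    dists = [vector_distance(t, x, kind) for t in train]
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

Every row shares the same two inner steps and differs only in the SD operation and the final f( ), which is what lets one set of hardware classes cover all of them.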

  21. PROMISE ISA
     [Diagram: Class 1 (aREAD) at the bitcell array; Class 2 (aSD, aVD) in the BLPs and cross bitline processor (analog); Class 3 (ADC); Class 4 (TH) in the RDL (digital), producing the decision]
       Class 1 (aREAD):      aREAD, aADD, aSUBT
       Class 2 (aSD, aVD):   signed multiply, unsigned multiply, sum-abs, sum-abs², compare
       Class 3 (ADC):        ADC, bit precision
       Class 4 (TH):         max/min, mean, sum, sigmoid/ReLU/tanh, threshold

  22. PROMISE ISA: Task Example: SAD-based template matching
     A task is one PROMISE macro instruction (51 bits) with the fields:
       Rep Count (loop iterations) | Class 0 (set parameters) | Class 1 (aREAD) | Class 2 (aSD, aVD) | Class 3 (ADC) | Class 4 (TH)
     For SAD-based template matching:
       Rep Count = # of candidates | Class 0: SWING, address | Class 1: aSUBT | Class 2: absolute value, summed ($\sum_i |d_i|$) | Class 3: 6-bit ADC | Class 4: min
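As a rough behavioral reading of this macro instruction, the Python sketch below runs one SAD template-matching task: one loop iteration per candidate, an element-wise difference, absolute value and sum, a 6-bit ADC, and a final min. The uniform ADC mapping, the default `full_scale` value, and the function name are assumptions; Class 0 parameters and the 51-bit field layout are not modeled.

```python
import numpy as np


def sad_template_match(query, candidates, adc_bits: int = 6, full_scale: float = 255.0):
    """Behavioral sketch of one SAD template-matching macro instruction (hypothetical).

    Class 0 (SWING, addresses) is not modeled; Rep Count = len(candidates).
    Returns the index of the best-matching candidate and the per-candidate ADC codes.
    """
    levels = 2 ** adc_bits - 1
    q = np.asarray(query, dtype=float)
    codes = []
    for cand in candidates:                          # Rep Count: one pass per candidate
        d = np.asarray(cand, dtype=float) - q        # Class 1, aSUBT: element-wise difference
        sad = float(np.sum(np.abs(d)))               # Class 2: absolute value, then sum
        clipped = min(sad, full_scale)               # Class 3: ADC with an assumed full scale
        codes.append(round(clipped / full_scale * levels))
    best = min(range(len(codes)), key=codes.__getitem__)  # Class 4: min over candidates
    return best, codes


if __name__ == "__main__":
    print(sad_template_match([10, 20, 30], [[0, 0, 0], [11, 19, 30], [30, 30, 30]],
                             full_scale=128.0))
```

Because the min is taken over quantized codes, candidates whose SADs land in the same ADC bin tie (the first one wins here), which is one way a lower ADC precision can cost accuracy.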
