
A Systems Approach to Computing in Beyond CMOS Fabrics



  1. A Systems Approach to Computing in Beyond CMOS Fabrics A. D. Patil, N. R. Shanbhag, L. R. Varshney, E. Pop, H.-S. P. Wong, S. Mitra, J. Rabaey, J. Weldon, L. Pileggi, S. Manipatruni, D. Nikonov, and I. A. Young A Center www.sonic-center.org

  2. Machines are Beating Humans at Complex Inference Tasks [The Guardian, May 2017]
     • extracting information from data: recognition, prediction, classification, decision-making
     • … the energy cost is HUGE (e.g., 𝟐𝟏𝟏𝟏𝟏𝒀)
     [Figure: examples of systems turning DATA into INFORMATION]

  3. Existing Solutions are Reaching their Limits
     device
     • diminishing energy-delay benefits from CMOS scaling
     • variations dominate
     architecture
     • von Neumann architecture mismatched to inference workload requirements
     • data movement problem (memory wall)
     [Figure: memory array (row decoder, WL driver, precharge, L:1 column muxes, sense amplifiers, mux & buffer) connected over a K-b bus to a digital processor with input buffer and decision output]
     [Pop-NanoResearch-2010]

  4. Fundamental question: how do we design intelligent machines that operate at the limits of energy?

  5. Research Inspiration
     • Shannon-inspired (stochastic channel) and brain-inspired (stochastic neural fabric)
     • reliable systems operating at the limits of energy-efficiency employ stochastic components & statistical models of computation
     • Key: embrace stochasticity of nanoscale fabrics via Shannon/brain-inspired models of computation

  6. Statistical Information Processing: A principled approach to non von Neumann computing
     A Journey from Systems-to-Devices
     • Principles of Statistical Information Processing, to Shannon/brain-inspired architectures, to Prototypes in CMOS & Beyond CMOS
     • information-based metrics, fundamental limits, design principles
     • stochastic nanofunctions, models, prototypes
     [Figures: fundamental limits on energy efficiency; estimator/detector that fuses observations into a corrected output under an application-derived metric; deep in-memory architecture]

  7. Statistical Information Processing
     [Figure: design stack spanning algorithms, architectures, and device & circuit levels. Layers: Application requirements; Shannon & Brain-inspired Models of Computing; Nanofunctions; Beyond-CMOS devices. Outcome: prototypes in & beyond CMOS]

  8. Systems in Beyond CMOS using Shannon & Brain-inspired Models of Computing
     • hyperdimensional (HD) computing via 3D VRRAM (brain-inspired)
     • Shannon-inspired spintronics

  9. Hyperdimensional (HD) Computing [Rabaey, Olshausen, Mitra, Wong]
     [Figure: input, HD mapper, hypervectors, HD algebra, output; labels: 4-bit, HD ≈ 10000-bit]
     • $\vec{T} = \vec{B} + \vec{C} + \vec{D}$ ($\vec{T}$ is similar to $\vec{B}$, $\vec{C}$ and $\vec{D}$)
     • $\vec{T} = \vec{B} * \vec{C} * \vec{D}$ ($\vec{T}$ is dissimilar to all of $\vec{B}$, $\vec{C}$ and $\vec{D}$)
     • almost all vectors are nearly orthogonal
     • requires random mapping
     • related to Shannon’s random codes
     MAP
     • orthogonalization via local Multiply (XOR)
     • summarization via local Add (OR)
     • ordered summarization via Permutation
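The MAP algebra on this slide can be tried out in a few lines of plain Python. The following is a minimal sketch, not the authors' implementation: the function names, the seed, and the vector names `A`, `B`, `C` are placeholders, and Add is modelled as a per-bit majority vote (a common software choice for bundling) rather than the OR-style local add the slide mentions.

```python
import random

D = 10_000  # hypervector dimensionality (the slide quotes roughly 10000 bits)

def random_hv(rng, d=D):
    """Draw a dense random binary hypervector (each bit 0 or 1 with equal odds)."""
    return [rng.getrandbits(1) for _ in range(d)]

def multiply(a, b):
    """M: bind two hypervectors with component-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]

def add(*hvs):
    """A: bundle hypervectors with a per-bit majority vote (assumption; ties go to 0)."""
    half = len(hvs) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*hvs)]

def permute(a, shift=1):
    """P: rotate the components by `shift` positions."""
    shift %= len(a)
    return a[-shift:] + a[:-shift]

def hamming(a, b):
    """Normalized Hamming distance: 0.0 means identical, about 0.5 means unrelated."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
A, B, C = (random_hv(rng) for _ in range(3))
bundled = add(A, B, C)               # sum-like result: stays close to each input
bound = multiply(multiply(A, B), C)  # product-like result: looks unrelated to each input
```

In this sketch, `hamming(A, B)` lands near 0.5 for independent random vectors (near-orthogonality), `hamming(bundled, A)` lands well below 0.5, and `hamming(bound, A)` is again near 0.5. XOR binding is its own inverse, so `multiply(multiply(A, B), B)` recovers `A`.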

  10. RRAM Stochasticity for Random Mapping
      • binary bits statistically produced/stored in RRAMs
      • high resistance state: 0; low resistance state: 1
      • pulse with V_TE < V_SET: stochastic oxygen vacancy (V_O) formation, P_SET < 1
      • P_SET: SET probability (switching from ‘0’ to ‘1’)
      • resulting hyper-vector: 1001100111……0100111101
      [Figure: experimental data, P_SET (0% to 100%) versus pulse amplitude (0.7 V to 1.1 V) and pulse width (10^2 ns to 10^4 ns)]
      [H. Li,…, H.-S. P. Wong, IEDM, 2016]
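A software analogue of this random mapping treats each RRAM cell as an independent coin flip. The sketch below is only a behavioural model under that assumption: `stochastic_set`, the seed, and the choice of 10,000 cells are illustrative and not taken from the cited measurements.

```python
import random

def stochastic_set(n_cells, p_set, rng):
    """Apply one weak SET pulse to n_cells cells that all start in HRS ('0').

    Assumption: each cell switches to LRS ('1') independently with probability p_set.
    """
    return [1 if rng.random() < p_set else 0 for _ in range(n_cells)]

rng = random.Random(1)
hv = stochastic_set(10_000, 0.5, rng)  # P_SET of 50% gives a balanced random hypervector
ones_fraction = sum(hv) / len(hv)      # fraction of cells that ended up in LRS
```

Choosing a pulse condition with P_SET close to 50% matters because a balanced bit distribution is what makes independently generated hypervectors nearly orthogonal.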

  11. 3D VRRAM as In-Memory Computing Kernels
      • VRRAM: Vertical Resistive Random Access Memory
      • fabricated by NDL, Taiwan
      [Figure: 1T-4R 3D VRRAM structure with plane electrodes, pillar electrode, RRAM cells, word line (WL), select line (SL), bit line (BL), and FinFET; cross-section of layers L1 to L4 with TiN/Ti (50 nm) top electrode (TE) and TiN (20 nm) bottom electrode (BE); scale bar 50 nm]

  12. In-Memory MAP Kernels using 3D RRAM
      • kernels demonstrated: Multiplication, Addition, Permutation
      • measured LRS (~10 kΩ), measured HRS (400 kΩ-1 MΩ)
      [Figure: resistance (Ω) and current (µA) versus logic evaluation cycle (#) and addition cycle (#), with example bit patterns on layers L1 to L4 and 200 ns pulses]
      [H. Li,…, H.-S. P. Wong, IEDM, 2016]

  13. 3D VRRAM In-Memory HD Architecture
      • Device-architecture co-design for 3D VRRAM-centric HD computing
      • MAP operations of HD computing mapped onto 3D VRRAM arrays
      • > 400× area reduction (28-nm node) compared with digital design [1]
      Algorithm
      • input texts
      • projection: letters to random HD vectors
      • sampling: 3-letter sequences
      • trigram MAP: compute trigrams
      • binding (addition): generate (learn) language/text maps (one for each text)
      • store
      • training: finish; inference: measure HamD & identify the ‘nearest’
      • one-shot learning
      [Figure: architecture and device panels, a 4 kb × 36-layer 3D VRRAM with layer groups for PERM, XOR, ADD, Store, and HamD measurement, P_SET ≈ 50%; (a) component area (µm²) and (b) total area (µm²) versus HD vector size (1 kb, 2 kb, 10 kb) for 28nm LP and 3D VRRAM]
      [1] A. Rahimi et al., ISLPED, p.64, 2016
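The algorithm column of this slide (letters to random hypervectors, trigram encoding, one learned map per text, nearest map by Hamming distance) can be sketched in software. Everything below is an illustrative assumption rather than the published design: the vector size `D`, the helper names, the majority-vote bundling, the rotation-based position encoding, and the two toy training strings are all made up for the example.

```python
import random

D = 2_000  # hypervector size; illustrative only
rng = random.Random(42)
ITEM = {}  # letter -> random hypervector (the "projection" step)

def letter_hv(ch):
    """Look up (or create) the random hypervector assigned to a letter."""
    if ch not in ITEM:
        ITEM[ch] = [rng.getrandbits(1) for _ in range(D)]
    return ITEM[ch]

def rotate(hv, k):
    """Permutation: cyclic shift by k positions."""
    k %= D
    return hv[-k:] + hv[:-k]

def xor(a, b):
    """Multiplication: component-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]

def trigram_hv(a, b, c):
    """Encode a 3-letter sequence: rotation marks position, XOR binds the letters."""
    return xor(xor(rotate(letter_hv(a), 2), rotate(letter_hv(b), 1)), letter_hv(c))

def text_map(text):
    """Bundle all trigram vectors of `text` with a per-bit majority vote."""
    counts = [0] * D
    n = 0
    for i in range(len(text) - 2):
        for j, bit in enumerate(trigram_hv(*text[i:i + 3])):
            counts[j] += bit
        n += 1
    return [1 if 2 * c > n else 0 for c in counts]

def hamming(a, b):
    """Number of positions where two hypervectors differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(query, maps):
    """Inference: return the label whose stored map is nearest in Hamming distance."""
    q = text_map(query)
    return min(maps, key=lambda label: hamming(q, maps[label]))

# Toy training set: one pass over each text yields its map (one-shot learning).
training = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "pseudo": "zxq vwk zxq jjk vwk qqz zxq kvw jjk zxq vwk qzx",
}
maps = {label: text_map(text) for label, text in training.items()}
```

A call such as `classify("the dog and the fox", maps)` compares the query's trigram statistics against each stored map. Only XOR, counting, rotation, and Hamming distance are needed, which is why the flow maps naturally onto in-memory MAP kernels.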

  14. Shannon-inspired Spintronics [Patil, Shanbhag, Manipatruni, Nikonov, Young, MMM-Intermag’16, arXiv’17]
      • Can we make all-spin logic competitive with CMOS?
      • deterministic nanomagnet switching costs much energy & delay
      • need to design spintronic systems in the stochastic regime
      • But first… need system-friendly abstract models of spin devices
      [Figure: SPIN versus CMOS comparison with the deterministic regime and the stochastic regime marked, Nikonov-JXCDC-2015]

  15. Systems-friendly Nanomagnetic Model
      • 40× reduction in switching energy as 𝜗 scales from 10 @*P to 10 @*
      • 𝜗 > ≈ 𝑓 @ AB
      • 𝜗 L 𝜗 > = 𝑗 L 𝐽 HIJK 𝐹 𝑗, 𝑈 𝑆𝑈 >
      • 𝜗: switching error rate
      • 𝐹: switching energy
      • 𝑗: current overdrive factor
      • 𝐽 HIJK: critical supply current
      • energy numbers from [Manipatruni, et al., Physical Review Applied’16]
      [Figure: switching error rate 𝜗 versus switching energy 𝐹, starting from 𝜗 = 0.5, Butler-TMAG-2012]

  16. The 𝝑-Noisy Spin Gate Model
      • 𝜗-noisy non-volatile AND gate
      • 𝜄 ∈ {0,1}: Bernoulli random variable with Pr{𝜄 = 1} = 𝜗
      • error-free gate combined with a virtual gate operation emulating 𝜗-noise
      • includes input dependence
      • enables evaluation of complex networks
      [Figure: gate schematic and timing diagram: gate is OFF, gate is ON (switching occurs), gate is OFF]
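One way to see how such a gate-level error model supports network evaluation is a small Monte Carlo simulation. The sketch below is deliberately simplified and is an assumption throughout: it flips the ideal AND output with a fixed probability `eps` (standing in for the slide's switching error rate), it ignores the input dependence and the non-volatile state that the slide's model includes, and the tree topology and function names are hypothetical.

```python
import random

def noisy_and(a, b, eps, rng):
    """Ideal AND output XORed with a Bernoulli(eps) error bit."""
    theta = 1 if rng.random() < eps else 0  # error indicator, Pr{theta = 1} = eps
    return (a & b) ^ theta

def noisy_and_tree(bits, eps, rng):
    """Reduce a list of bits with a balanced tree of noisy 2-input AND gates."""
    level = list(bits)
    while len(level) > 1:
        nxt = [noisy_and(level[i], level[i + 1], eps, rng)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired bit passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

def error_rate(n_inputs, eps, trials, rng):
    """Monte Carlo estimate of the output error rate for all-ones inputs."""
    wrong = sum(noisy_and_tree([1] * n_inputs, eps, rng) != 1 for _ in range(trials))
    return wrong / trials
```

For example, `error_rate(8, 0.01, 20000, random.Random(0))` estimates how per-gate errors accumulate through a seven-gate tree. The result is several times larger than the per-gate rate, which illustrates why a system-level compensation scheme is needed once individual gates are allowed to be unreliable.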

  17. Shannon-inspired Statistical Error Compensation (SEC)
      • fusion block: 𝑧 _ = MAP estimate of 𝑧 `
      • desired property: disparity between error probability mass functions (one sparse, one dense)
      • error compensator is robust: 𝜗 < 10 @^
      • error compensator is efficient: 𝑚 > 𝑟 & 𝑙 > 𝑛 0
      [Zhang, Shanbhag, IEEE Trans. Signal Processing, 2016]
      [Gonugondla, Shim, Shanbhag, ICASSP, 2016]
      [Abdallah, Shanbhag, IEEE J. Solid-State Circuits, 2013]
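The role of the fusion block can be illustrated with a toy MAP estimator over two noisy observations of the same value. This is a sketch under stated assumptions, not the cited designs: the two error PMFs are invented for the example (a sparse one for the main block, a dense one for the estimator), the errors are taken to be independent, the prior over candidate outputs is uniform, and `map_fuse` is a hypothetical name.

```python
def map_fuse(y_main, y_est, pmf_main, pmf_est, candidates):
    """Return the candidate y that maximizes P_main(y_main - y) * P_est(y_est - y).

    Assumes independent errors and a uniform prior over `candidates`.
    Falls back to the estimator output if no candidate explains both observations.
    """
    def likelihood(y):
        return pmf_main.get(y_main - y, 0.0) * pmf_est.get(y_est - y, 0.0)

    best = max(candidates, key=likelihood)
    return best if likelihood(best) > 0 else y_est

# Hypothetical error PMFs (error value -> probability).
# Main block: usually exact, occasionally off by a large amount (sparse PMF).
pmf_main = {0: 0.90, 64: 0.05, -64: 0.05}
# Estimator: always roughly right, rarely exact (dense PMF over small errors).
pmf_est = {-2: 0.2, -1: 0.2, 0: 0.2, 1: 0.2, 2: 0.2}

candidates = range(-128, 128)

# True value 10: the main block suffers a +64 error, the estimator is off by +1.
corrected = map_fuse(74, 11, pmf_main, pmf_est, candidates)
```

Because the two PMFs look so different, only one candidate is consistent with both observations, and the fusion recovers it. If both error sources had similar PMFs the observations would not disambiguate each other, which is the intuition behind the slide's "disparity" property.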

  18. Enhancing Disparity in Error Probability Mass Functions
      • error distribution at the output of a 15-bit RCA
      • Path Delay Balancing (PDB): “maximally” slow network → “minimally” error-prone network, without energy increase
      • Path Delay Redistribution (PDR): generates a sparse error distribution
      [Figure: error probability versus error magnitude for three cases: all delays equal, after PDB, after PDB & PDR]

  19. Seizure Detection using a Support Vector Machine (SVM)
      • CHB-MIT EEG dataset
      • SVM classifier decision rule: $\mathbf{x}^{T}\mathbf{y} + c \lessgtr 0$, with outcomes $A = 0$ and $A = 1$
      • $\mathbf{y}$: feature vector extracted from EEG signals
      • $\mathbf{x}$: trained weight vector
      • $c$: trained scalar bias
      • $A$: decision; $A = 1$ ⇒ seizure
      [Verma-JSSC-2010]
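The decision rule on this slide is a thresholded dot product between the trained weight vector and the feature vector, plus the bias. A minimal sketch follows; the function name and the numeric values are made up, and mapping a positive score to the seizure decision (1) is an assumption about which side of the threshold the slide assigns to which class.

```python
def svm_decide(weights, features, bias):
    """Linear SVM decision: 1 (seizure) if the score is positive, else 0.

    Assumption: a positive score maps to decision 1.
    """
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Hypothetical two-feature example.
weights = [0.5, -1.0]
bias = 0.1
decision_a = svm_decide(weights, [2.0, 0.5], bias)  # score = 1.0 - 0.5 + 0.1 = 0.6
decision_b = svm_decide(weights, [0.0, 1.0], bias)  # score = 0.0 - 1.0 + 0.1 = -0.9
```

The dot product is a long chain of multiply-accumulate operations, so it is the "main block" whose occasional errors a low-cost estimator would compensate in an SEC setting.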

  20. Simulation Results
                     Main Block    Error Compensator
      Gate count     52.8k         5.608k
      RPE-EST: Reduced precision embedded estimator

  21. Fundamental Limits on Energy, Delay and Accuracy (what’s possible, what’s impossible, and the grey area in between) [Shannon, 1948]
      • use abstract models of nanoscale fabrics to
        • obtain fundamental limits on information processing capacity and
        • extract design principles to approach those limits
