A Systems Approach to Computing in Beyond CMOS Fabrics A. D. - PowerPoint PPT Presentation

A Systems Approach to Computing in Beyond CMOS Fabrics A. D. Patil, N. R. Shanbhag, L. R. Varshney, E. Pop, H.-S. P. Wong, S. Mitra, J. Rabaey, J. Weldon, L. Pileggi, S. Manipatruni, D. Nikonov, and I. A. Young A Center www.sonic-center.org

Machines are Beating Humans at Complex Inference Tasks [The Guardian, May 2017]
Extracting information from data: recognition, prediction, classification, decision-making (DATA → INFORMATION)
… the energy cost is HUGE (e.g., 10000X)

Existing Solutions are Reaching their Limits
Device:
• diminishing energy-delay benefits from CMOS scaling
• variations dominate
Architecture:
• von Neumann architecture mismatched to inference workload requirements
• data movement problem (memory wall)
[Figure: memory array (row decoder, WL driver, precharge, L:1 column muxes, sense amplifiers, mux & buffer) connected over a K-b bus to a digital processor with input buffer and decision output; device figure cited as Pop-NanoResearch-2010]

Fundamental question: how do we design intelligent machines that operate at the limits of energy?

Research Inspiration: Shannon-inspired and Brain-inspired
Reliable systems operating at the limits of energy-efficiency employ stochastic components & statistical models of computation.
[Figure: stochastic channel (Shannon-inspired); stochastic neural fabric (brain-inspired)]
Key: embrace stochasticity of nanoscale fabrics via Shannon/brain-inspired models of computation

Statistical Information Processing
A principled approach to non von Neumann computing
A Journey from Systems-to-Devices: Principles of Statistical Information Processing, to Shannon/brain-inspired architectures, to Prototypes in CMOS & Beyond CMOS
• Information-based metrics, fundamental limits, design principles
• Stochastic nanofunctions, models, prototypes
[Figure: three panels. (1) Fundamental limits on energy efficiency. (2) Estimator/detector that turns observations y_1, …, y_N into a corrected output ŷ by maximizing an application-derived metric P(y | y_1, …, y_N). (3) Deep in-memory architecture: input buffer, row decoders, WL driver, precharge, L:1 column muxes, sense amplifiers, bit-line processors (BLP), cross-BL processor (CBLP), ADC, decision output, K-b bus]

Statistical Information Processing
[Figure: design stack spanning algorithms, architectures, and device & circuit levels: Application requirements; Shannon & Brain-inspired Models of Computing; Nanofunctions; Beyond-CMOS devices; prototypes in & beyond CMOS]

Systems in Beyond CMOS using Shannon & Brain-inspired Models of Computing
• brain-inspired: hyperdimensional (HD) computing via 3D VRRAM
• Shannon-inspired: spintronics

Hyperdimensional (HD) Computing [Rabaey, Olshausen, Mitra, Wong]
[Figure: input → HD mapper → hypervectors (≈ 10000-bit) → HD algebra → output]
S = A + B + C (S is similar to A, B and C)
S = A * B * C (S is dissimilar to all of A, B and C)
Hypervectors:
• almost all vectors are nearly-orthogonal
• requires random mapping
• related to Shannon's random codes
MAP operations:
• orthogonalization via local Multiply (XOR)
• summarization via local Add (OR)
• ordered summarization via Permutation
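The similarity claims on this slide can be checked numerically. The following is a minimal sketch, not the authors' implementation: hypervectors are stored as Python integers, Multiply is XOR, Permutation is a cyclic rotation, and Add is modeled as a bitwise majority of three vectors. The majority rule is an assumption made here for illustration (the slide lists OR for the local Add), and all function and variable names are hypothetical.

```python
import random

D = 10_000            # hypervector width, matching the ~10000-bit figure on the slide
MASK = (1 << D) - 1   # keeps rotated values inside D bits


def random_hv(rng):
    """Draw a D-bit hypervector with independent fair bits, stored as a Python int."""
    return rng.getrandbits(D)


def multiply(a, b):
    """M: bind two hypervectors with bitwise XOR."""
    return a ^ b


def add3(a, b, c):
    """A: summarize three hypervectors by bitwise majority (assumption, see lead-in)."""
    return (a & b) | (b & c) | (a & c)


def permute(a, shift=1):
    """P: cyclic rotation by `shift` bit positions."""
    shift %= D
    return ((a << shift) | (a >> (D - shift))) & MASK


def hamming(a, b):
    """Normalized Hamming distance in [0, 1]; about 0.5 means nearly orthogonal."""
    return bin(a ^ b).count("1") / D


rng = random.Random(0)
hv1, hv2, hv3 = (random_hv(rng) for _ in range(3))
s_add = add3(hv1, hv2, hv3)                  # stays close to each input
s_mul = multiply(multiply(hv1, hv2), hv3)    # looks unrelated to each input

print(f"add vs input:      {hamming(s_add, hv1):.3f}")
print(f"multiply vs input: {hamming(s_mul, hv1):.3f}")
print(f"permuted vs input: {hamming(permute(hv1), hv1):.3f}")
```

Under these assumptions the summed vector sits at a distance of roughly 0.25 from each input, while the multiplied and permuted vectors sit near 0.5, which is the distance expected between two unrelated random hypervectors.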

RRAM Stochasticity for Random Mapping
Binary bits statistically produced/stored in RRAMs
High resistance state: 0; low resistance state: 1
With V_TE < V_SET, formation of oxygen vacancies (V_O) between the top and bottom electrodes is stochastic, so P_SET < 1
P_SET: SET probability (switching from '0' to '1')
Example hyper-vector: 1001100111……0100111101
[Figure: experimental data, P_SET (0% to 100%) versus pulse amplitude (0.7 to 1.1 V) and pulse width (ns, log scale)]
[H. Li, …, H.-S. P. Wong, IEDM, 2016]
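A behavioral sketch of this random mapping, assuming each cell starts in the high resistance state and switches independently with probability P_SET when pulsed. The function name and the independence assumption are illustrative and not taken from the cited measurements.

```python
import random


def stochastic_set(n_cells, p_set, rng):
    """Pulse n_cells RRAM cells that all start in the high resistance state (0).

    Each cell independently switches to the low resistance state (1)
    with probability p_set, so the stored bits form a random hypervector.
    """
    return [1 if rng.random() < p_set else 0 for _ in range(n_cells)]


rng = random.Random(42)
hypervector = stochastic_set(10_000, 0.5, rng)   # P_SET of about 50%
ones_fraction = sum(hypervector) / len(hypervector)

print("".join(map(str, hypervector[:10])), "...")
print(f"fraction of 1s = {ones_fraction:.3f}")
```

Biasing the cells so that P_SET is close to 50% gives roughly balanced hypervectors, which is what the near-orthogonality property of HD computing relies on.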

3D VRRAM as In-Memory Computing Kernels
VRRAM: Vertical Resistive Random Access Memory
[Figure: 3D VRRAM structure with plane electrodes, pillar electrode and RRAM cells, addressed by word line (WL), select line (SL) and bit line (BL); 1T-4R cell with four layers (L1 to L4) over a FinFET; cross-section labels: TiN/Ti (50 nm), TiN (TE), TiN (BE), TiN (20 nm); 50 nm scale bar]
Fabricated by NDL, Taiwan

In-Memory MAP Kernels using 3D RRAM
[Figure: measured Multiplication, Addition and Permutation kernels on four-layer (L1 to L4) pillars; measured LRS (~10 kΩ) and measured HRS (400 kΩ to 1 MΩ); axes: Resistance (Ω), Current (µA), Logic Evaluation Cycle (#), Addition Cycle (#); 200 ns pulses]
[H. Li, …, H.-S. P. Wong, IEDM, 2016]

3D VRRAM In-Memory HD Architecture
• Device-architecture co-design for 3D VRRAM-centric HD computing
• MAP operations of HD computing mapped onto 3D VRRAM arrays
• > 400× area reduction (28-nm node) compared with digital design [1]
Algorithm (one-shot learning): input texts → letter sampling into 3-letter sequences → projection to random HD vectors → compute trigrams (MAP) → binding (addition) → generate (learn) language/text maps, one for each text
Training: finish once the maps are stored. Inference: measure HamD and identify the 'nearest' map.
[Figure: (a) architecture with decoder, MUX, SA and cell array; XOR, PERM, ADD, Store and HamD measurement mapped onto VRRAM layers (4 kb × 36 layers), with P_SET ≈ 50%; (b) total area (µm²) and component area (µm²) versus HD vector size (1 kb, 2 kb, 10 kb) for 28nm LP and 3D VRRAM]
[1] A. Rahimi et al., ISLPED, p. 64, 2016
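The language-recognition flow on this slide can be sketched in software. This is an illustrative model only: the trigram encoding rot(rot(A)) XOR rot(B) XOR C, the bitwise-majority summarization, the toy training sentences and all names below are assumptions chosen for the example and are not taken from the slide or from the VRRAM implementation.

```python
import random

D = 10_000                # hypervector width (the slide's charts use 1 kb, 2 kb, 10 kb)
rng = random.Random(1)
item_memory = {}          # letter -> random hypervector (the "projection" step)


def letter_hv(ch):
    """Return the fixed random hypervector for a letter, creating it on first use."""
    if ch not in item_memory:
        item_memory[ch] = [rng.getrandbits(1) for _ in range(D)]
    return item_memory[ch]


def rotate(hv, k=1):
    """Permutation: cyclic rotation by k positions."""
    return hv[-k:] + hv[:-k]


def xor(a, b):
    """Multiplication: element-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def trigram_hv(a, b, c):
    """Encode an ordered 3-letter sequence (assumed encoding, see lead-in)."""
    return xor(xor(rotate(letter_hv(a), 2), rotate(letter_hv(b), 1)), letter_hv(c))


def text_map(text):
    """Addition: bitwise majority over all trigram hypervectors of a text."""
    grams = [trigram_hv(*text[i:i + 3]) for i in range(len(text) - 2)]
    return [int(2 * sum(column) > len(grams)) for column in zip(*grams)]


def hamd(a, b):
    """Normalized Hamming distance between two hypervectors."""
    return sum(x != y for x, y in zip(a, b)) / D


def identify(query, maps):
    """Inference: return the label whose stored map is nearest to the query map."""
    q = text_map(query)
    return min(maps, key=lambda label: hamd(q, maps[label]))


# Training is one pass per text (one-shot learning); the sentences are toy data.
maps = {
    "english": text_map("the quick brown fox jumps over the lazy dog"),
    "spanish": text_map("el rapido zorro marron salta sobre el perro perezoso"),
}
guess = identify("the lazy dog jumps over the quick fox", maps)
print(guess)
```

The query shares most of its trigrams with the first training sentence, so its map lands much closer to that stored map than to the other, which stays near the 0.5 distance of unrelated hypervectors.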

Shannon-inspired Spintronics [Patil, Shanbhag, Manipatruni, Nikonov, Young, MMM-Intermag'16, arXiv'17]
• Can we make all-spin logic competitive with CMOS?
• deterministic nanomagnet switching costs much energy & delay
• need to design spintronic systems in the stochastic regime
• But first: need system-friendly abstract models of spin devices
[Figure: SPIN versus CMOS comparison with the deterministic and stochastic regimes marked; Nikonov-JXCDC-2015]

Systems-friendly Nanomagnetic Model
40× reduction in switching energy as ε scales from 10^… to 10^…
ε: switching error rate
E: switching energy
i: current overdrive factor
I_crit: critical supply current
[Figure: switching error rate ε versus switching energy E(i, T), with ε = 0.5 marked; model from Butler-TMAG-2012; energy numbers from Manipatruni et al., Physical Review Applied'16]

The ε-Noisy Spin Gate Model
ε-noisy non-volatile AND gate
θ ∈ {0,1}: Bernoulli random variable, Pr{θ = 1} = ε
[Figure: an error-free gate followed by a virtual gate operation emulating ε-noise; when the gate is ON, switching occurs and the noise applies; otherwise the gate is OFF]
• includes input dependence
• enables evaluation of complex networks
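One way to read this model is sketched below. It assumes that the gate holds a non-volatile state, that an error can only occur when the ideal AND output differs from the stored state (so switching is required), and that a failed switch leaves the old state in place. This interpretation, the class name and the helper function are assumptions made for illustration; the slide itself only gives the Bernoulli variable θ with Pr{θ = 1} = ε.

```python
import random


class NoisyAndGate:
    """Behavioral model of an eps-noisy non-volatile AND gate (assumed semantics)."""

    def __init__(self, eps, state=0):
        self.eps = eps        # switching error rate
        self.state = state    # non-volatile output held between evaluations

    def step(self, a, b, rng):
        ideal = a & b                       # error-free gate output
        if ideal != self.state:             # gate is ON: switching is required
            theta = 1 if rng.random() < self.eps else 0
            if theta == 0:                  # switching succeeded
                self.state = ideal
        return self.state                   # gate is OFF: state is unchanged


def failure_rate(eps, trials, seed=0):
    """Estimate how often a required 0 -> 1 switch fails."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        gate = NoisyAndGate(eps, state=0)
        failures += gate.step(1, 1, rng) != 1
    return failures / trials


print(f"measured failure rate at eps = 0.1: {failure_rate(0.1, 20_000):.3f}")
```

Because the error only appears when the inputs force a state change, the output error rate of a network built from such gates depends on the input statistics, which is the input dependence the slide refers to.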

Shannon-inspired Statistical Error Compensation (SEC)
Fusion block: ŷ = MAP estimate of y_o
Desired property: disparity between error probability mass functions (one sparse, one dense)
• error compensator is robust: ε < 10^…
• error compensator is efficient: 𝑚 > 𝑟 & 𝑙 > 𝑛 0
[Zhang, Shanbhag, IEEE Trans. Signal Processing, 2016]
[Gonugondla, Shim, Shanbhag, ICASSP, 2016]
[Abdallah, Shanbhag, IEEE J. Solid-State Circuits, 2013]
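The value of having one sparse and one dense error distribution can be illustrated with a simple threshold rule. The sketch below is not the MAP fusion block named on the slide. It swaps in a simpler detector that keeps the main block's output unless it disagrees strongly with a low-precision estimate. The error magnitudes, probabilities, threshold and names are all assumed values for illustration.

```python
import random


def simulate(n=10_000, p_err=0.05, big=1 << 13, step=1 << 5, threshold=64, seed=0):
    """Return (mse_main, mse_est, mse_fused) for a toy error compensation setup."""
    rng = random.Random(seed)
    se_main = se_est = se_fused = 0
    for _ in range(n):
        y_o = rng.randrange(-(1 << 14), 1 << 14)          # error-free output
        # Main block: errors are rare but large (sparse error PMF).
        eta = rng.choice((-big, big)) if rng.random() < p_err else 0
        y_a = y_o + eta
        # Estimator: reduced precision, so errors are frequent but small (dense PMF).
        y_e = ((y_o + step // 2) // step) * step
        # Threshold fusion: trust the main block unless it disagrees strongly.
        y_hat = y_a if abs(y_a - y_e) <= threshold else y_e
        se_main += (y_a - y_o) ** 2
        se_est += (y_e - y_o) ** 2
        se_fused += (y_hat - y_o) ** 2
    return se_main / n, se_est / n, se_fused / n


mse_main, mse_est, mse_fused = simulate()
print(f"main block MSE: {mse_main:.1f}")
print(f"estimator MSE:  {mse_est:.1f}")
print(f"fused MSE:      {mse_fused:.1f}")
```

In this toy setup the large errors of the main block are easy to tell apart from the small estimator errors, so the fused output is more accurate than either source alone. That separation is what the disparity between the two error distributions buys.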

Enhancing Disparity in Error Probability Mass Functions
Error distribution at the output of a 15-bit RCA (ripple-carry adder)
• Path Delay Balancing (PDB): "maximally" slow network → "minimally" error-prone network, without energy increase
• Path Delay Redistribution (PDR): generates a sparse error distribution
[Figure: error probability versus error magnitude for three cases: all delays equal, after PDB, and after PDB & PDR]

Seizure Detection using a Support Vector Machine (SVM)
CHB-MIT EEG dataset [Verma-JSSC-2010]
SVM classifier: z = 1 if w^T x + b > 0, and z = 0 if w^T x + b < 0
x: feature vector extracted from EEG signals
w: trained weight vector
b: trained scalar bias
z: decision; z = 1 ⇒ seizure
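The linear SVM decision rule on this slide amounts to a dot product, a bias and a sign test. A minimal sketch follows; the weights, bias and feature values are made-up numbers for illustration and are not trained on the CHB-MIT data.

```python
def svm_decide(weights, features, bias):
    """Return 1 (seizure) if weights . features + bias > 0, else 0."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    score = sum(w_i * x_i for w_i, x_i in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# Hypothetical trained parameters and two hypothetical EEG feature vectors.
weights = [0.8, -0.5, 0.3]
bias = -0.2

decision_a = svm_decide(weights, [1.0, 0.2, 0.5], bias)   # score =  0.65
decision_b = svm_decide(weights, [0.1, 0.9, 0.0], bias)   # score = -0.57
print(decision_a, decision_b)
```

The whole classifier is one multiply-accumulate chain plus a comparison, which is why it maps naturally onto the main block and error compensator structure described for statistical error compensation.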

Simulation Results
              Main Block    Error Compensator
Gate count    52.8k         5.608k
RPE-EST: Reduced precision embedded estimator

Fundamental Limits on Energy, Delay and Accuracy
(what's possible, what's impossible, and the grey area in between) [Shannon, 1948]
Use abstract models of nanoscale fabrics to:
• obtain fundamental limits on information processing capacity, and
• extract design principles to approach those limits
