Normally off computing for smart systems Cache and main memory - PowerPoint PPT Presentation

« Normally off » computing for smart systems Cache and main memory architecture based on MRAM: application to High Performance Computing and Embedded Systems Lionel Torres – Univ . Montpellier, France Merci à : G . Sassatelli, A . Gamatie, P . Benoit, P . Nouet, D . Novo, G . Dinatale, A . Todri, A . Virazel, L . Latorre, M . Robert, G . Patrigeon, P . Y . Peneau, J . Modad, F . Ouattara, J . Lopes, O . Coi, K . Sevin RTNS 2018

General context Current IC Integration Challenges ● Energy is critical ● We need more and more Performances for applications ● Actual technology limitations ( CMOS ) - Integration is more and more complex 10 9 transistors / cm2 ● Actual Reliability is a problem – X% of the systems encounter an uncorrectable error per year (X ranging from 1 to 5%) Source: Bernie Meyerson, IBM 140 Bi Bipolar CMOS CM IBM ES9000 120 100 aging Watts / cm 2 performance 80 Fujitsu VP2000 ? IBM GP IBM 3090S mature NTT IBM RY5 60 Pentium 4 Fujitsu M-780 IBM RY7 Techno 2 growing Pulsar 40 IBM 3090 IBM RY6 CDC Cyber 205 IBM RY4 embryonic IBM 4381 20 IBM 3081 Apache Fujitsu M380 Merced IBM 370 IBM 3033 Techno 1 IBM 360 Vacuum Pentium II(DSIP) 0 2 1950 1960 1970 1980 1990 2000 2010 time

General context Technology target : CMOS < 20 nm To Transport 1 bit à 1pJ / mm To transport 10 9 data – 1s ( 1Ghz ) à 1pJ / mm x 10 9 = 1mw / mm 64 Bits Bus à 64mw / mm On real IC à several W / cm2 Calcul, Bit transition à 1 aJ Calcul, 10 9 data transition - 1s à 1aJ x 10 9 = 1nw à It is better to ” calculate ” than to “ transport ”’ the information à In computing memory is certainly interesting à Reminder : minimal energy to change 1 bit d ’ information - K . T Ln2 à 2,85 zJ 3

One challenge : the memory ● Today, 50% of the silicon area of IC is memory ● Take care to energy ( static ) ! L2 L1 L1 L2 L2 L1 L1 L2 L3 L2 L1 L1 L2 L1 L2 L1 L2 4

Technology evolution Actual memories: ● SRAM for fast access ● DRAM for applications ● Flash (mass storage) ● … Emerging memories Universal memory: ● Magnetic tunneling junctions “Non-volatile memory” ● Phase change memory • SRAM performance ● Programmable metallization cells • Size of DRAM/Flash • Non-volatility ● OxRRAM • Scalibility ● … Resistance Switching Memory Emerging memories offer non-volatility, speed and endurance => disruption of the memory hierarchy? 5

Spin Technology Conductance of magnetic metal plates is larger in the presence of a magnetic field perpendicular to the current flow William Thomson 1824-1907 Currently known as Anisotropic Magnetoresistance (AMR) Resistance variation attained: 2%-5% in RT 6

Spin Technology Peter Grünberg and Albert Fert 2007 Nobel Prize in Physics ¢ Thin stacks of FM/NM metals have seen a conductance increase of up to 100% when subjected to a magnetic field B. Guinasch et al., 1989 M. N. Baibich et al., 1988 7

Spin Technology . Magnetoresistance Tunnel : (( − *+, " ) ! " = !$ + ∆ ! . " = 3 → ! = !$ " = (53 → ! = !$ + ∆ ! = !2$ ∆ ! ! 2$ − !$ TMR Classique between 150% et 250% ( or /0! % = = ! $ ! $ more ) M. Bowen et al. Nearly total spin polarization… 8

Spin Technology Compatible with CMOS Non - volatile memory Switching time < 1ns writing current < 10uA-100uA density x4 vs SRAM Immune to radiations Samsung demonstrator ( 8 Mbit STT - MRAM ) - 2016 9

Motivations • A way MRAM PCRAM • Go towards non-volatile systems using emerging NVMs FeRAM ReRAM • Current NVMs issues : Speed, Dynamic energy, Reliability … NV Cache Cache Embedded On-chip GPU GPU Non-Volatile MRAM SRAM CPU CPU High performance bus High performance bus Non-volatile Flash Memory DDR DDR eFPGA FPGA Controller Controller Controller Controller External Flash External DRAM External MRAM External MRAM Where and how to place MRAM to: reduce total power consumption ? keep same or get better performance ? 10

Contributions 1. Evaluation of MRAM-based cache memory hierarchy: • Exploration flow and extraction of memory activity • L1 and L2 caches based on STT-MRAM and TAS- MRAM 2. Non-volatile computing • Instant-on/off capability for embedded processor • Analysis and validation of Rollback mechanism 11

MRAM applied to cache NV Cache On-Chip Possible studies SRAM CPU High performance bus Performance comparison New architectures DDR 3D-Stacking Controller capability of ENERG SPEED Y MRAM External DRAM NV Memory Logic layer Non-volatile Cache AREA POTENTIAL APPLICATIONS ? Cache SRAM vs MRAM ¢ fast Hybrid (SRAM) Benefits of MRAM ¢ SRAM/MRAM CPU Low leakage Cache cache slow High density (MRAM) Non-volatility Take advantages of MRAM Mitigate drawbacks of MRAM Low leakage High write latency High density High write energy Non-volatility 12

MRAM applied to cache NVM exploration flow Benchmarks NVM memory array 1. Define the architecture gem5* Modeling (NVSim**, SPICE…) Memory Prototype Latency Full-system simulator § Single/Multi-core 2. Explore MRAM-based cache configurations Architecture level Circuit level § L1, L2, L3, Hybrid… Execution time # Reads / Writes 3. Extract many useful information # Hits / Misses § Runtime, cache energy, cache transactions… Access energy Total L1/L2 energy consumption Static power * N. Binkert et al., “ The gem5 simulator ,” ACM SIGARCH Computer Architecture News, Aug. 2011. ** X. Dong et al., “ NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory ,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Jul. 2012 . 13

MRAM applied to cache Experimental setup From single to multi-core architecture ARMv7 ISA Private L1 instruction/Data Shared L2 (Additional levels of caches possible) Main Memory Core Core Core … 1 2 N L1 I/D L1 I/D L1 I/D Shared L2 45nm 130nm 120nm 45nm DDR3 SRAM SRAM STT-MRAM TAS-MRAM (Baseline) (Baseline) 14 14

MRAM applied to cache Circuit-level analysis: Models (NVSim) & Prototype Area SRAM STT-MRAM 512kB L2 32kB L1 100 Node Technology (mm²) (mm²) SRAM 1.36 0.091 10 45nm Area (mm²) STT-MRAM 0.82 0.117 1 SRAM 9.7 120nm - 0,1 TAS-MRAM 11.7 0,01 8kB 16kB 32kB 64kB 128kB 256kB 512kB 1MB 2MB 4MB ¢ MRAM is denser for large cache capacity ¢ MRAM cell size smaller than that of SRAM ¢ MRAM needs large transistors for write ¢ TAS-MRAM cache larger due to field lines 15

MRAM applied to cache Circuit-level analysis: Models (NVSim) & Prototype Read Write Standby Latency Energy Latency Energy Leakage Technolog Node y (ns) (nJ) (ns) (nJ) (mW) SRAM 4.28 0.27 2.87 0.02 320 /2.2 ≈ /14 2.1 2.5 512kB 45nm STT- L2 cache 2.61 0.28 6.25 0.05 23 MRAM /8 SRAM 5.95 1.05 4.14 0.08 82 120nm TAS- 35 4.62 35 1.96 10 MRAM STT-MRAM ≈ SRAM MRAM > SRAM MRAM << SRAM TAS-MRAM > SRAM Latency Energy Latency Energy Leakage Technolog Node (ns) (nJ) (ns) (nJ) (mW) y 32kB L1 cache SRAM 1.25 0.024 1.05 0.006 22 /7 45nm STT- 1.94 0.095 5.94 0.04 3.3 MRAM MRAM > SRAM MRAM > SRAM MRAM << SRAM 16

MRAM applied to cache Case study Quad-core architecture: ● Frequency 1GHz Core Core Core Core ● ARMv7 ISA 1 2 3 4 ● Private L1 I/D L1 I/D L1 I/D L1 I/D L1 I/D 32 kB 32 kB 32 kB 32 kB ● Shared L2 Shared L2 512 kB ● DDR3 Main memory DDR3 512 MB Benchmarks ● SPLASH-2 ╶ Mostly high performance computing ● PARSEC ╶ Animation, data mining, computer vision, media processing 17

MRAM applied to cache Architecture-level analysis: gem5 Read/Write ratio L2/L1 access ratio Number of accesses Benchmark L1 cache L2 cache ~2 billions SPLASH-2 ~26 millions (0.5 billions/CPU) ~12 billions PARSEC ~16 millions (3 billions /CPU) Static/Dynamic energy ratio L2 à 90% Static energy L1 à 80% 18

MRAM-based L2 Execution time STT-MRAM L2 (45 nm) TAS-MRAM L2 (130 nm) 1,4 Normalized execution time SRAM 1,2 Baseline 1 0,8 0,6 0,4 0,2 0 e e s m t 1 2 1 2 x s t e r k 4 f e e e n n i g e c 6 g f u u t m d r t a a a n l l a a l r s a 2 a o e m e e r u r r r f x r e h t e a f c c i l y c c b v n v o o d m a s a a o k d a b c i e a u l r l t b f s Observations: barnes ocean2 ● STT shows good performance 100 Cache miss rate (%) ╶ L2 has small impact in overall performance 80 ● For TAS, 14% of penalty in average (SPLASH-2) 60 ╶ 40 Depends on applications (Cache miss rate, L1/L2 access ratio) 20 0 Execution time 19 19

MRAM-based L2 Total L2 cache energy consumption STT-MRAM L2 (45 nm) TAS-MRAM L2 (130 nm) SRAM Baseline 1,0 0,8 Normalized L2 energy 0,6 0,4 0,2 0,0 s m t 1 2 e s e k e 1 2 x t r 4 f e e e e c f u u n n i g t 6 g m d r a t n l l a l a a a a o r s 2 a m e r r f e e r u r h t x a r e e f i y c c c l b n c v d v o o s a m a a o k d b c a i a u e l l r b f t s fluidanimate (read) fluidanimate (write) Observations: radix (read) radix (write) (GigaBytes/s) Bandwidth 1,6 ● Up to 90% of gain for STT End of fluidanimate 1,2 End of radix ● From 40% to 90% for TAS 0,8 0,4 ╶ Due to the very low leakage of MRAM-based cache 0 Execution time 20 20

Normally off computing for smart systems Cache and main memory - PowerPoint PPT Presentation

Normally off computing for smart systems Cache and main memory architecture based on MRAM: application to High Performance Computing and Embedded Systems Lionel Torres Univ . Montpellier, France Merci : G . Sassatelli, A . Gamatie,

SMART ENERGY SMART ASSET SMART SMART SMART & CUSTOMER ASSET PURPOSE PEOPLE

Smart and Adaptive Cyber-Physical Systems Chapters 1,2 Cyber-Physical Systems Smart mobility

Normally-Off Computing for Smart City Applications Hiroshi Nakamura Graduate School of

http://fastmap.com/consent-guide Free Download Code: CON103 (normally 50)

Quality of Life - Smart Mobility - Smart Infrastructure - Smart People, Smart Living ARC 590

CONTENTS Smart Schools Bond Act Committees and the Smart Schools Investment Plan Smart Schools

Packet-Level Signatures for Smart Home Devices Rahmadi Trimananda, Janus Varmarken, Athina

between Cyber-Physical Systems (CPS) and Smart Systems and Smart System Integration paradigms,

Sustainability and Smart Grid Implementing a Non residential Smart Metering System PaperCon

Smart Metering Smart Metering The Power of Smart Metering The Power of Smart Metering MOST

Smart Solutions for a Smart Smart Solutions for a Smart Grid Grid Eric Sortomme Eric Sortomme

Government 2.0: Government 2.0: Smart Government Smart People Smart Government Smart People Did

SMART STAFFORDSHIRE Enabling citizens and businesses to flourish in the digital age SMART SMART

The Role of Smart Lighting in the Smart Grid St Stewart Findlater t Fi dl t VP, Engineering

Smart Cards Smart Cards a(s) a(s) Safety Critical Systems Safety Critical Systems Gemplus

Moving Up & Moving Off Off Campus Living 2019 Opening Video Part 1: Deciding to go off

Strategies to Provide Primary Care in an Enhanced Medical Home Model to Underserved Children

How I treat high risk myeloproliferative neoplasms Francesco Passamonti Universit

Burning Issues The Next Generation Burning Issues TNG Webcasts will identify critical

and Analytics for Extreme Weather Study using Dust Storm Event as an Example Manzhu Yu NSF

(Today) Warmup: A Taste for Discrete Math and Computing Foundations of Computer Science Lecture 1

Referent Power Refers to the nature and strength of a relationship between the power holder and

Real-World applications of Boosting Yoav Freund UCSD Practical Advantages of AdaBoost

Office Hours: COVID-19 Planning and Response August 28, 2020 Housekeeping A recording of

Normally off computing for smart systems Cache and main memory - PowerPoint PPT Presentation

Normally off computing for smart systems Cache and main memory architecture based on MRAM: application to High Performance Computing and Embedded Systems Lionel Torres Univ . Montpellier, France Merci : G . Sassatelli, A . Gamatie,

SMART ENERGY SMART ASSET SMART SMART SMART &amp; CUSTOMER ASSET PURPOSE PEOPLE

Smart and Adaptive Cyber-Physical Systems Chapters 1,2 Cyber-Physical Systems Smart mobility

Normally-Off Computing for Smart City Applications Hiroshi Nakamura Graduate School of

http://fastmap.com/consent-guide Free Download Code: CON103 (normally 50)

Quality of Life - Smart Mobility - Smart Infrastructure - Smart People, Smart Living ARC 590

CONTENTS Smart Schools Bond Act Committees and the Smart Schools Investment Plan Smart Schools

Packet-Level Signatures for Smart Home Devices Rahmadi Trimananda, Janus Varmarken, Athina

between Cyber-Physical Systems (CPS) and Smart Systems and Smart System Integration paradigms,

Sustainability and Smart Grid Implementing a Non residential Smart Metering System PaperCon

Smart Metering Smart Metering The Power of Smart Metering The Power of Smart Metering MOST

Smart Solutions for a Smart Smart Solutions for a Smart Grid Grid Eric Sortomme Eric Sortomme

Government 2.0: Government 2.0: Smart Government Smart People Smart Government Smart People Did

SMART STAFFORDSHIRE Enabling citizens and businesses to flourish in the digital age SMART SMART

The Role of Smart Lighting in the Smart Grid St Stewart Findlater t Fi dl t VP, Engineering

Smart Cards Smart Cards a(s) a(s) Safety Critical Systems Safety Critical Systems Gemplus

Moving Up &amp; Moving Off Off Campus Living 2019 Opening Video Part 1: Deciding to go off

Strategies to Provide Primary Care in an Enhanced Medical Home Model to Underserved Children

How I treat high risk myeloproliferative neoplasms Francesco Passamonti Universit

Burning Issues The Next Generation Burning Issues TNG Webcasts will identify critical

and Analytics for Extreme Weather Study using Dust Storm Event as an Example Manzhu Yu NSF

(Today) Warmup: A Taste for Discrete Math and Computing Foundations of Computer Science Lecture 1

Referent Power Refers to the nature and strength of a relationship between the power holder and

Real-World applications of Boosting Yoav Freund UCSD Practical Advantages of AdaBoost

Office Hours: COVID-19 Planning and Response August 28, 2020 Housekeeping A recording of

SMART ENERGY SMART ASSET SMART SMART SMART & CUSTOMER ASSET PURPOSE PEOPLE

Moving Up & Moving Off Off Campus Living 2019 Opening Video Part 1: Deciding to go off