System-on-Chip Seung Kang Qualcomm Technologies, Inc. IEEE - - PowerPoint PPT Presentation




SLIDE 1

Emerging Memories and Pathfinding for the Era of sub-10nm System-on-Chip

Seung Kang, Qualcomm Technologies, Inc.

IEEE Solid-State Circuits Society Seminar, San Diego, CA, August 8, 2019

SLIDE 2

Memory Is Big Business

Total memory market: > $100 billion*

* Not including embedded memories for AP, SOC, and MCU

Sources: http://www.icinsights.com/news/bulletins/Total-Memory-Market-Forecast-To-Increase-10-In-2017/ ; https://www.dw.com/

SLIDE 3

Memory Subsystem

Hierarchical memory layers, from computing-centric (top) to data-centric (bottom), with bit cost falling toward the bottom:

RF → On-chip Cache → Off-chip Cache → Main Memory → Local Storage → Remote Storage

SLIDE 4

Memory Subsystem

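There is no such thing as a universal memory

Speed & endurance (top) ↔ density & retention (bottom):

  • RF (register file)
  • On-chip cache: SRAM
  • Off-chip cache: "embedded" DRAM
  • Main memory: DRAM
  • Local storage: Flash (SSD), HDD
  • Remote storage: Flash (SSD), HDD, tape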

SLIDE 5

Problem Statement 1

"Memory Wall"

Overall system performance & power are governed more by the memory subsystem than by the CPU subsystem

[Figure: memory landscape spanning embedded vs. standalone and computing-centric vs. data-centric memories, ordered by cost]

  • SOC, AP (embedded): CPU, GPU, memory, RF, L1/L2/L3 cache, custom SRAM, ROM, OTP/MTP
  • MCU (embedded): CPU, SRAM, ROM, OTP/MTP, eFlash, external Flash
  • Standalone: DRAM, Flash, storage/SSD, HDD

SLIDE 6

Problem Statement 2

Many-Core Processors

  • Datacenter applications are projected to need 120 Mbytes (960 Mb) of L3 cache at 10nm and beyond
  • More expensive at advanced nodes (6T-SRAM: 550F² at 7nm vs. 150F² at 40nm)
  • High standby/leakage power (worse at high temperature)

Increasing SRAM area & leakage power overhead

Example: Intel Broadwell-E (14nm node), shared L3 cache of 25 Mbytes (60 Mbytes for 24 cores)
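The F² figures above can be turned into absolute area with area = k·F². The sketch below assumes F equals the node name in nm and counts bitcell area only (no periphery, redundancy, or array efficiency), so the results are lower bounds; reading "960 Mb" with binary prefixes is also an assumption.

```python
# Rough 6T-SRAM bitcell-array area from the cell sizes quoted on the slide.
# Assumptions: F = node name in nm; periphery and array efficiency ignored.

def cell_area_um2(k_f2: float, node_nm: float) -> float:
    """Bitcell area in um^2 for a cell of k_f2 * F^2 at feature size F (nm)."""
    return k_f2 * node_nm ** 2 / 1e6  # nm^2 -> um^2

def array_area_mm2(bits: float, k_f2: float, node_nm: float) -> float:
    """Total bitcell-array area in mm^2 (no periphery)."""
    return bits * cell_area_um2(k_f2, node_nm) / 1e6  # um^2 -> mm^2

L3_BITS = 960 * 2**20  # 120 Mbytes = 960 Mb (binary prefixes assumed)

for node, k in [(40, 150), (7, 550)]:
    print(f"{node} nm: cell {cell_area_um2(k, node):.3f} um^2, "
          f"960 Mb array {array_area_mm2(L3_BITS, k, node):.1f} mm^2")
```

Under these assumptions, 960 Mb of 7nm bitcells alone occupies roughly 27 mm². The cell shrinks only about 9× from 40nm to 7nm while F² shrinks about 33×, which is the cost problem the slide points to.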

SLIDE 7

Problem Statement 3

IOT & Embedded Systems: inherent drawbacks caused by memory limitations

  • Energy-hungry
  • Poor form factor
  • High cost
  • Security vulnerability

“The IOT is an NVM problem.”

Greg Yeric, ARM (2015 IEDM Plenary Talk)

SLIDE 8

A New Perspective on Energy Efficiency

New demand and criteria for wearable and bioelectronic devices

Critical challenge: battery life (energy efficiency)

SLIDE 9

A New Perspective on Security & Privacy

[Figure: endpoint, gateway, and cloud, each marked "?"]

Demand for secure memory and HW primitives (e.g., physically unclonable functions, PUFs)

SLIDE 10

Problems, new requirements, and opportunities demand advanced memories…

SLIDE 11

Memory Classification

By device type:

  • Volatile memory: SRAM, DRAM
  • Nonvolatile memory, charge modulation: Flash (2D/3D NAND, NOR), FRAM
  • Nonvolatile memory, resistance modulation: PCM; MRAM (STT-MRAM, SOT/SHE, field MRAM); RRAM (Ox-RAM, CB-RAM, VMCO); CNT; Mott transition

Legend: mature (mainstream or commoditized) vs. emerging (currently in small markets)

SLIDE 12

Phase-Change Memory (PCM, PC-RAM, PRAM)

SLIDE 13

PCM: Early History

  • Density: 256 bits
  • Die Size: 122 × 131 mil (10.3 mm²)
  • Read: 2.5 mA, < 5 V
  • Set: 5 mA, 25 V, 10 ms
  • Reset: < 200 mA, 25 V, 5 µs

Neale, Nelson, & Moore, Electronics, 1970

“Nonvolatile and reprogrammable, the read-mostly memory is here”
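The 1970 chip parameters support a quick sanity check. The sketch below recomputes the die area from the mil dimensions and estimates programming energy as I × V × t at the quoted maxima; treating the pulses as rectangular at those maxima is an assumption, so the energies are upper-bound estimates.

```python
# Back-of-envelope numbers for the 1970 256-bit PCM chip (values from the slide).
# Assumption: energy = I * V * t at the quoted maxima (rectangular pulses).

MIL_MM = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm

def die_area_mm2(x_mil: float, y_mil: float) -> float:
    return (x_mil * MIL_MM) * (y_mil * MIL_MM)

def pulse_energy_j(current_a: float, voltage_v: float, width_s: float) -> float:
    return current_a * voltage_v * width_s

area = die_area_mm2(122, 131)
set_e = pulse_energy_j(5e-3, 25, 10e-3)     # Set: 5 mA, 25 V, 10 ms
reset_e = pulse_energy_j(200e-3, 25, 5e-6)  # Reset: <200 mA, 25 V, 5 us (upper bound)

print(f"die area     ~ {area:.1f} mm^2")
print(f"set energy   ~ {set_e * 1e3:.2f} mJ per pulse")
print(f"reset energy ~ {reset_e * 1e6:.0f} uJ per pulse")
```

The die area comes out at about 10.3 mm², matching the slide. The set pulse costs roughly 50× the reset energy because it is 2000× longer, even though its current is 40× lower.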

SLIDE 14

PCM: Basic Concept

Phase-change element: amorphous (high R) or crystalline (low R)

  • Chalcogenide alloy (e.g., Ge-Sb-Te/GST)
  • Programming: Joule heating followed by natural cooling (reset: T > melting point → amorphous; set: T > crystallization temperature → crystalline)
  • Relatively simple physics!

Source: Samsung (2006)

SLIDE 15

PCM: Cell and Array Architecture

Cell = Access Device + Phase-change Element

Cell options: 1FET-1R, 1Diode-1R, 1BJT-1R; cross-bar array

The required characteristics of the access FET, diode, or BJT are largely governed by the upper limit of the reset current (needed to drive localized melting) at a target cell size.

SLIDE 16

PCM: Evolution of Cell Configuration

Goal: improve thermal isolation

  • >90% of heat is wasted during reset
  • Better isolation → lower reset current/power, improved endurance & retention

Source: H.-L. Lung (ITRS ERD, 2014)

SLIDE 17

PCM: Reliability (Cycling Endurance)

Undoped GST vs. doped GST, Chen et al. (Macronix-IBM, IMW, 2009)

[Figure: cell images of undoped and doped GST after cycling, from 0 cycles up to 1B cycles]

SLIDE 18

PCM: Reliability (Retention)

Shih et al. (Macronix-IBM, IEDM, 2008)

SLIDE 19

PCM: Prototype

Samsung 8Gb PCM (ISSCC, 2012): 4.2F² cell

SLIDE 20

PCM: Evolution to 3D

Kau et al. (Intel & Numonyx, IEDM, 2009)

  • PCMS: phase-change memory (PCM) coupled with a selector (OTS)
  • OTS: Ovonic Threshold Switch
  • 64 Mb
  • Endurance: 10⁶ cycles

3D XPoint (Intel & Micron, 2016)

  • 20nm node
  • 128 Gb
  • SLC

[Figure: cross-point cell with a selector stacked on the memory element]

Intel Optane Memory Series (2017), 16 GB / 32 GB (Source: Intel.com)

  • Capacity: 16 GB (128 Gb) / 32 GB
  • Read latency: 7 µs / 9 µs
  • Write latency: 18 µs / 30 µs
  • Random read: 190K / 240K IOPS
  • Random write: 35K / 65K IOPS
  • Sequential read: 900 / 1350 MB/s
  • Sequential write: 145 / 290 MB/s
  • Power (active/idle): 3.5 W / 1 W
  • Endurance (lifetime writes): 182.5 TB
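Two rows of the Optane table can be put in more familiar units. This sketch assumes (neither is stated on the slide) that the 182.5 TB lifetime-write rating is spread evenly over a 5-year service life and that the random IOPS figures use 4 KiB transfers.

```python
# Interpreting two rows of the Intel Optane Memory (2017) spec table.
# Assumptions (not on the slide): 5-year service life with uniform writes,
# and 4 KiB transfers for the random IOPS figures.

def writes_per_day_gb(lifetime_tb: float, years: float) -> float:
    """Average GB/day that would consume the lifetime-write rating in `years`."""
    return lifetime_tb * 1000 / (years * 365)

def iops_to_mb_s(iops: float, transfer_bytes: int = 4096) -> float:
    """Bandwidth (MB/s, 10^6 bytes) implied by an IOPS figure."""
    return iops * transfer_bytes / 1e6

print(f"{writes_per_day_gb(182.5, 5):.0f} GB/day over 5 years")
for cap, iops in [("16 GB", 190_000), ("32 GB", 240_000)]:
    print(f"{cap}: random read ~ {iops_to_mb_s(iops):.0f} MB/s at 4 KiB")
```

Under these assumptions the endurance rating works out to 100 GB/day, and the 4 KiB random-read bandwidth (roughly 0.8 to 1.0 GB/s) is of the same order as the sequential-read figures in the table.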

SLIDE 21

3D XPoint as Storage Class Memory

It does not replace DRAM or NAND storage; it adds a new layer that improves the memory subsystem

Source: Intel-Micron, 2015

SLIDE 22

Magnetoresistive RAM (MRAM)

Spin-transfer-torque MRAM (STT-MRAM, ST-MRAM, STT-RAM)

SLIDE 23

A Building Block: Magnetic Tunnel Junction

Multiple flavors, but perpendicular MTJ

Electrical resistance varies with relative electron spin alignment: magnetoresistance (MR)

Stack: free layer / tunnel barrier / pinned layer

  • Parallel: low resistance (RP); antiparallel: high resistance (RAP)
  • Relatively small read window
  • Electrical switching, not magnetic switching
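To see why the read window is called small, a sketch helps: it computes the tunnel magnetoresistance ratio TMR = (RAP − RP)/RP and the cell currents at the 0.1 V read bias quoted for MRAM in this talk. The resistances are hypothetical example values, and the access transistor is ignored.

```python
# MTJ read window: TMR ratio and read-current difference at a fixed read bias.
# R_P and R_AP below are hypothetical example values, not data from the talk.

def tmr(r_p: float, r_ap: float) -> float:
    """TMR ratio = (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p

def read_currents_ua(r_p: float, r_ap: float, v_read: float = 0.1):
    """Cell currents (uA) in the P and AP states, ignoring the access transistor."""
    return v_read / r_p * 1e6, v_read / r_ap * 1e6

R_P, R_AP = 5e3, 10e3  # hypothetical: 5 kOhm (P), 10 kOhm (AP) -> 100% TMR
i_p, i_ap = read_currents_ua(R_P, R_AP)
print(f"TMR = {tmr(R_P, R_AP):.0%}")
print(f"I_P = {i_p:.1f} uA, I_AP = {i_ap:.1f} uA, window = {i_p - i_ap:.1f} uA")
```

Even with a 100% TMR, the two states differ by only a factor of two in current (about 10 µA here), and every source of cell-to-cell variation eats into that margin.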

SLIDE 24

MRAM Snapshot

Operating voltage on MTJ: read 0.1 V; write 0.3–0.5 V

  • Fast NVM
  • High endurance
  • 3 additional masks over baseline logic
  • Low voltage (no charge pump)
  • Scalable

A new class of memory: Nonvolatile RAM

Lu et al. (Qualcomm & TDK) IEDM, 2015 Park et al. (Qualcomm & Applied Mat.) IEDM, 2015

SLIDE 25

MRAM Array Architecture

[Figure: MRAM array architecture. Data MTJ arrays with 512 word lines (wl<0> to wl<511>) and bit-line/source-line pairs BL0/SL0 to BL31/SL31, a column MUX, write driver, and local data path (SLDP); a reference MTJ array (Ref BL1/SL1, Rref BL0/SL0) feeding the reference generator and read sense amplifier (SA); 2 IOs + Ref]

Use the same bitcell for both data and reference array

  • Small read window → Design for robust read (sensing) is critical
  • Balancing switching asymmetry and source generation
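One way a reference generator can work is sketched below: the reference current is the average of one P-state and one AP-state reference cell, built from the same bitcell as the data array (presumably so the reference tracks the data cells). The slide does not specify the scheme, so the midpoint reference, the 0 = P / 1 = AP data mapping, and the resistances are all hypothetical.

```python
# Midpoint-reference sensing sketch for an MRAM read path.
# Assumptions (hypothetical): I_ref = average of P and AP reference-cell
# currents; '0' = parallel (low R), '1' = antiparallel; access FET ignored.

V_READ = 0.1  # V, read bias quoted for MRAM in this talk

def cell_current(r_ohm: float) -> float:
    return V_READ / r_ohm

def reference_current(r_p_ref: float, r_ap_ref: float) -> float:
    """Average of a P-state and an AP-state reference cell current."""
    return 0.5 * (cell_current(r_p_ref) + cell_current(r_ap_ref))

def sense(r_cell: float, i_ref: float) -> int:
    """Sense amplifier decision: current above reference -> low R (P) -> 0."""
    return 0 if cell_current(r_cell) > i_ref else 1

R_P, R_AP = 5e3, 10e3  # hypothetical nominal resistances
i_ref = reference_current(R_P, R_AP)
print(f"I_ref = {i_ref * 1e6:.1f} uA")
for r in (R_P, R_AP, 1.2 * R_P, 0.8 * R_AP):  # nominal and shifted cells
    margin_ua = abs(cell_current(r) - i_ref) * 1e6
    print(f"R = {r / 1e3:.1f} kOhm -> read {sense(r, i_ref)}, margin {margin_ua:.2f} uA")
```

A P cell whose resistance drifts up by 20% still reads correctly, but its margin shrinks from 5 µA to under 2 µA, which illustrates why the slide calls robust sensing critical.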
SLIDE 26

Challenges for MRAM Design and Reliability

Narrow design window for deeply scaled nodes

  • Prevent write error: low VWrite; fast fall-off of the write error rate (WER) slope
  • Prevent read error: low VRead (0.1 V); high TMR; fast fall-off of the read disturb rate (RDR) slope
  • Improve barrier reliability: high breakdown voltage (VBD); contain time-dependent dielectric breakdown (TDDB)

SLIDE 27

MRAM Device Scalability: Ic

[Figure: critical switching current (µA) vs. MTJ diameter (nm); Kang, VLSI Symp., 2014; Saida et al., VLSI Symp., 2016]

At small dimensions, dynamic current consumption becomes comparable to the SRAM cell current

Critical switching current (Ic): the most important bitcell and design parameter

SLIDE 28

MRAM Device Scalability: Endurance

[Figure, three panels: (1) cycles and time to breakdown vs. MTJ voltage (V) for (−) AP→P and (+) P→AP polarity at 1 ppm failure, 10 years at 50% duty cycle; (2) resistance (Ω) vs. MTJ voltage (V) for 45 nm and 25 nm MTJs, (−) polarity, 50 ns pulse; (3) endurance requirement vs. millions of accesses per core per second for L2 SRAM (256 KB), L2 MRAM (1024 KB), L3 SRAM (1.5 MB), and L3 MRAM (6 MB)]

  • Intrinsically solid
  • Better with MTJ scaling
  • In real life, subject to design robustness & defect control

Practically unlimited endurance for cache applications

Kan et al., IEDM, 2016
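The endurance-requirement panel can be approximated with simple arithmetic. This is a hypothetical model, not the method behind the chart: every access writes a 64-byte line, accesses spread uniformly over all lines, over 10 years at 50% duty cycle (the condition on the slide). Real workloads concentrate writes on hot lines, so the result is an optimistic lower bound.

```python
# Rough per-line write-cycle requirement for a cache over its lifetime.
# Hypothetical model: each access writes one 64-byte line, uniform spread
# over all lines, 10-year life at 50% duty cycle, single core.

SECONDS_PER_YEAR = 365 * 24 * 3600

def cycles_per_line(maccess_per_s: float, cache_bytes: int,
                    line_bytes: int = 64, years: float = 10,
                    duty: float = 0.5) -> float:
    lines = cache_bytes // line_bytes
    total_accesses = maccess_per_s * 1e6 * years * SECONDS_PER_YEAR * duty
    return total_accesses / lines

KB, MB = 1024, 1024 * 1024
for name, size in [("L2 SRAM 256 KB", 256 * KB), ("L2 MRAM 1024 KB", 1024 * KB),
                   ("L3 SRAM 1.5 MB", 3 * MB // 2), ("L3 MRAM 6 MB", 6 * MB)]:
    print(f"{name}: {cycles_per_line(100, size):.1e} cycles/line at 100 M accesses/s")
```

Because the MRAM caches in the chart are 4× larger than their SRAM counterparts, the same traffic is spread over 4× more lines, cutting the per-line requirement by 4× in this model.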

SLIDE 29

MRAM: Prototypes

  • SK Hynix-Toshiba (IEDM, 2016 / ISSCC, 2017): 4Gb, 9F² (30nm)
  • Samsung (IEDM, 2016 / 7th MRAM Global Innovation Forum)

SLIDE 30

MRAM: Qualcomm Demo System

Integrated into a demo tablet: 350X faster than Flash, 3X faster than PSRAM

MRAM integrated along with PSRAM and NOR Flash for performance and power benchmarking

Kang, IMW, 2016

MRAM can unify PSRAM (volatile RAM) and NOR (nonvolatile storage) with PPAC advantages

SLIDE 31

MRAM In Production

SLIDE 32

MRAM for Processing-in-Memory CNN Accelerator

From Gyrfalcon Technologies (2018)

  • 22nm eMRAM (40 MB)
  • 9.9 TOPS/W

A single-chip solution for Mobile and IOT applications
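An efficiency in TOPS/W converts directly to energy per operation, since 1 TOPS/W = 10¹² operations per joule. A minimal sketch follows; how an "operation" is counted (for example, whether a multiply-accumulate counts as one or two) is not stated on the slide.

```python
# Energy per operation implied by an efficiency figure in TOPS/W.
# 1 TOPS/W = 10^12 operations per joule -> energy/op = 1 / (TOPS/W * 1e12) J.

def energy_per_op_fj(tops_per_watt: float) -> float:
    return 1.0 / (tops_per_watt * 1e12) * 1e15  # J -> fJ

print(f"{energy_per_op_fj(9.9):.0f} fJ per operation")  # 9.9 TOPS/W from the slide
```

At 9.9 TOPS/W this is about 100 fJ per operation.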

SLIDE 33

Resistive RAM (RRAM, ReRAM)

Conductive Bridge RAM (CB-RAM)

SLIDE 34

RRAM: Materials

Two-terminal resistive switching elements (excluding PCM and MRAM), found in numerous combinations of materials.

Source: P. Wong (Stanford, 2011)

SLIDE 35

RRAM: Common Classification

Different materials & switching characteristics

  • Oxide RRAM (Ox-RAM), or transition metal oxide RRAM: top electrode / metal oxide / bottom electrode
  • Conductive Bridge RRAM (CB-RAM), or Programmable Metallization Cell (PMC): top electrode (metal ion reservoir) / solid electrolyte / bottom electrode
  • Conductive Metal Oxide RRAM, or Vacancy Modulated Conductive Oxide RRAM (VMCO RRAM): top electrode / tunnel barrier / conductive metal oxide / bottom electrode

Switching modes: filamentary switching (1D); interfacial switching (2D); uniform switching (no forming)

SLIDE 36

RRAM: Switching

[Figure: switching sequence of a top electrode / metal oxide / bottom electrode cell: initial state (very high R) → forming (low R) → reset (high R) → set (low R)]

[Figure: current vs. voltage for bipolar switching and unipolar switching]

Kwon et al., Nature Nanotechnology (2010): observation of a filament
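The forming/reset/set sequence for a bipolar cell can be captured as a small state machine. This is a toy model: the threshold voltages are hypothetical placeholders, current compliance during forming and set is not modeled, and unipolar switching (same polarity, different amplitudes) is left out.

```python
# Toy state model of bipolar filamentary RRAM switching.
# All thresholds are hypothetical; current compliance is not modeled.
from enum import Enum

class State(Enum):
    INITIAL = "very high R (pristine)"
    LRS = "low R (filament formed)"
    HRS = "high R (filament ruptured)"

V_FORM, V_SET, V_RESET = 3.0, 1.0, 0.8  # hypothetical magnitudes in volts

def apply_bipolar(state: State, v: float) -> State:
    """Forming and set need positive bias; reset needs negative bias."""
    if state is State.INITIAL:
        return State.LRS if v >= V_FORM else state  # one-time forming step
    if state is State.HRS and v >= V_SET:
        return State.LRS                            # set: re-form filament
    if state is State.LRS and v <= -V_RESET:
        return State.HRS                            # reset: rupture filament
    return state                                    # sub-threshold: no change

state = State.INITIAL
for v in (1.5, 3.0, -1.0, 1.2, -0.5):
    state = apply_bipolar(state, v)
    print(f"{v:+.1f} V -> {state.name}")
```

The first pulse is below the forming threshold and leaves the pristine cell unchanged; after forming, the cell toggles between high and low R only when the pulse has the right polarity and enough amplitude.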

SLIDE 37

RRAM: Cell and Array Architecture

  • 1T-1R
  • 1D-1R (diode selector for unipolar RRAM)
  • 1D-1R/1S-1R (stacked cross-point array)
  • 3D vertical cross-point RRAM

Sheu et al. (VLSI Symp., 2008); P. Wong (Stanford); Lee et al. (IEDM, 2007); Yoon et al. (VLSI Symp., 2009)

SLIDE 38

RRAM: Variability

Temporal and spatial variability

[Figure panels: resistance variation vs. switching current; write speed vs. read margin]

Jurczak (ITRS ERD, 2014); Sills et al. (VLSI Symp., 2014)

SLIDE 39

RRAM: Reliability

Endurance & Retention

Sills et al. (VLSI Symp., 2014) Wei et al. (IEDM, 2011)

256 Kbit array baked at 150°C for 1000 hours

SLIDE 40

RRAM: Prototypes

SanDisk-Toshiba RRAM (ISSCC, 2013)

  • By far the highest-density RRAM test chip
  • Relatively slow performance (NAND Flash alternative)

T.-Y. Liu et al. (JSSC, 2014)

SLIDE 41

RRAM: Prototype

Micron-Sony CB-RAM (ISSCC, 2014)

  • Target application: storage class memory
  • Endurance target: >10⁶ cycles
  • Raw BER (endurance): <3×10⁻⁵ at 10⁶ cycles
  • Raw BER (retention): <2×10⁻⁴ at 10 years, 70°C, 10⁴ cycles
  • Raw BER (read disturb): <2×10⁻⁵ at 10⁶ reads

Acceptable for SCM?
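One way to frame the question is to turn each raw BER into expected bit errors per block. The sketch assumes independent bit errors and uses a 4 KiB page and a 512-bit chunk purely as illustrative sizes; the answer then depends on how much error correction the system budgets, which the slide does not specify.

```python
# Expected raw bit errors per 4 KiB page at the quoted worst-case raw BERs,
# and the chance that a 512-bit chunk is error-free.
# Assumptions: independent bit errors; page/chunk sizes are illustrative.

PAGE_BITS = 4096 * 8
CHUNK_BITS = 512

raw_ber = {
    "endurance (10^6 cycles)": 3e-5,
    "retention (10 y, 70 C, 10^4 cycles)": 2e-4,
    "read disturb (10^6 reads)": 2e-5,
}

for name, p in raw_ber.items():
    expected = p * PAGE_BITS          # mean of a binomial(PAGE_BITS, p)
    p_clean = (1 - p) ** CHUNK_BITS   # P(no errors in a 512-bit chunk)
    print(f"{name}: ~{expected:.2f} errors/page, "
          f"P(512-bit chunk clean) = {p_clean:.4f}")
```

The retention case dominates: about 6.6 raw errors per 4 KiB page, and roughly one 512-bit chunk in ten contains at least one error.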

SLIDE 42

Memristor

RRAM (and also MRAM and PCM) may show memristive behavior (analog memory characteristics)

Nature v.453, p.80 (2008)

L.O. Chua, IEEE Trans. Circuit Theory 18, p.507 (1971)
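For reference, Chua's 1971 paper (cited above) defines the memristor as the circuit element that relates flux linkage φ to charge q. Because the memristance M depends on q, the integral of all past current, the device's resistance encodes its history, which is the analog-memory behavior the slide refers to:

```latex
\mathrm{d}\varphi = M(q)\,\mathrm{d}q
\quad\Longrightarrow\quad
v(t) = \frac{\mathrm{d}\varphi}{\mathrm{d}t} = M\big(q(t)\big)\,i(t),
\qquad
q(t) = \int_{-\infty}^{t} i(\tau)\,\mathrm{d}\tau
```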

SLIDE 43

Ferroelectric Memory (FRAM, FeRAM)

Ferroelectric FET (FeFET)

SLIDE 44

Conventional FRAM

  • Perovskite crystals (PZT, SBT): internal electric dipole reversibly switchable by an electric field
  • PZT: lead zirconate titanate, Pb(ZrₓTi₁₋ₓ)O₃; SBT: strontium bismuth tantalate
  • Cell: 1T-1C (C = FeCAP)

Kim et al. (IEDM, 2005); Ramtron (2012)

SLIDE 45

Conventional FRAM

  • In production, but not scaling beyond 130nm
  • Fundamental scaling limit requires a 3D FeCAP (very challenging)

Koo et al. (IEDM, 2006)

U. Bottger and S.R. Summerfelt, "Ferroelectric RAM," in Nanoelectronics and Information Technology (R. Waser, ed.)

SLIDE 46

FeFET

Polarization of the ferroelectric layer over the Si channel modulates the threshold voltage (Vth) → 1T FRAM

[Figure: on ("1") and off ("0") polarization states]

Challenges

  • Required perovskite configuration is difficult to integrate
  • Data retention (→ FeDRAM?)
  • Depolarization field
  • Leakage

Ma (IMW, 2014)

SLIDE 47

FeFET: Renewed Hope for Scaling

Orthorhombic phase of HfO₂

  • High endurance at limited retention (<10³ s)
  • Good retention at limited endurance (<10⁵ cycles)

SBT: SrBi₂Ta₂O₉ (perovskite)

Muller et al. (VLSI Symp. 2012); Cheng & Chin (EDL, 2014)

SLIDE 48

PCM… MRAM… RRAM… FRAM… Hype? Promise? Reality? Opportunities?

SLIDE 49

Emerging Memory Reality Check

Ideally (universal memory)

  • Performance: 1 ns
  • Density (cell size): 4F²
  • Retention (energy barrier): >10⁵ h
  • Endurance: 10¹⁵ cycles

[Figure: chart with axes for performance (1 ns to 1000 ns), density (4F² to 100F²), retention (1 h to 10⁵ h), and endurance vs. switching energy]

In Reality

There is no universal memory. Opportunities lie in tunability (system differentiation, user experiences).

SLIDE 50

Positioning Emerging Memory

Need to understand the application space

Lee, Kan, and Kang, ISLPED, 2014

  • High performance & good endurance
  • Low cost, high density, and intermediate performance
  • Low cost & long battery life (fast cycle & low leakage)
  • Low cost & good reliability
  • Lowest cost per bit & high density
  • Anti-tampering & atomic operation

SLIDE 51

5th CIES Forum

Emerging Memory Pathfinding for Sub-10nm CMOS

MRAM is used as the example because of its NV-RAM attributes and recent advances at major IC manufacturers

SLIDE 52

CMOS Logic Scaling

Source: A. Steegen, 2018 ITF Belgium

  • Intrinsic FinFET scaling is limited
  • Logic scaling is about standard cell architecture innovation

SLIDE 53

Parasitic R & C Impact

  • MOL and BEOL parasitic R & C cause more delay than the intrinsic transistor delay
  • This negatively impacts essentially all types of resistance-based memory designs (MRAM, RRAM, PCM)

SLIDE 54

SRAM Scaling

High-density 6T SRAM: 550F² at 7nm; expect 1000F² at 5nm

F: node number

Relatively more expensive at advanced nodes; FinFET SRAM is near the end of scaling

Kang & Park, IEDM 2017
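Taking F as the node number, as defined above, the F² figures translate into absolute bitcell areas. The short sketch below compares the 7nm and 5nm cells.

```python
# Absolute 6T-SRAM bitcell area implied by the F^2 figures on this slide,
# with F taken as the node number in nm.

def cell_area_nm2(k_f2: float, f_nm: float) -> float:
    return k_f2 * f_nm ** 2

a7 = cell_area_nm2(550, 7)   # 550 F^2 at 7 nm
a5 = cell_area_nm2(1000, 5)  # 1000 F^2 expected at 5 nm
print(f"7 nm: {a7:.0f} nm^2, 5 nm: {a5:.0f} nm^2, shrink {1 - a5 / a7:.1%}")
```

F² drops by about 49% from 7nm to 5nm, yet the bitcell shrinks by only about 7% (26,950 nm² to 25,000 nm²), which is what "near the end of scaling" means in area terms.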

SLIDE 55

MRAM Pathfinding as an SRAM Alternative

Reduce last-level-cache area and energy consumption

A 22nm case study by Toshiba

SLIDE 56

Cell Design Challenge: Supply Current

[Figure: CMOS supply current at 28nm vs. 7nm]

CMOS supply current is much smaller at advanced nodes → requires low switching current and low MTJ resistance

Jc = 4.41 MA/cm² (Park et al., VLSI Symp. 2018)
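The switching current follows from the current density as Ic = Jc × (MTJ area). The sketch below assumes a circular MTJ and applies the quoted Jc = 4.41 MA/cm² to the 30–35 nm MTJ CDs considered for 7nm pathfinding in this talk. Using this Jc at those diameters, and ignoring pulse-width dependence, are both assumptions.

```python
# Critical switching current from critical current density for a circular MTJ:
# Ic = Jc * pi * (d/2)^2. Jc = 4.41 MA/cm^2 is from the slide; applying it at
# 30-35 nm diameters is an assumption.
import math

def ic_ua(jc_ma_per_cm2: float, diameter_nm: float) -> float:
    area_cm2 = math.pi * (diameter_nm * 1e-7 / 2) ** 2  # 1 nm = 1e-7 cm
    return jc_ma_per_cm2 * 1e6 * area_cm2 * 1e6         # MA/cm^2 -> A/cm^2; A -> uA

for d in (30, 35):
    print(f"d = {d} nm: Ic ~ {ic_ua(4.41, d):.0f} uA")
```

This gives roughly 31 µA at 30 nm and 42 µA at 35 nm. Because Ic scales with d², each nanometer of MTJ CD matters when the access transistor's supply current is small.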

SLIDE 57

Reliability Challenge: Endurance

  • Intrinsic endurance is practically unlimited
  • However, endurance is sensitive to switching voltage & MgO TDDB

Need TDDB test

Common memory applications: < 10¹² cycles

Smaller MTJ → higher Vbd

Kan et al., IEDM 2016 & TED 2017

SLIDE 58

Cell Architecture Pathfinding for 7nm

  • 1T-1MTJ (for area): bitcell (X, Y) = (2 PM1, 2 Pfin); area (2-fin cell) 140F²; MRAM:SRAM → 0.25X
  • 2T-1MTJ (for performance): bitcell (X, Y) = (2 CPP, 3 Pfin); area (6-fin cell) 210F²; MRAM:SRAM → 0.35X
  • MTJ pitch → 85-90 nm; MTJ CD → 30-35 nm

PM1: metal-1 pitch; Pfin: fin pitch; CPP: contacted poly pitch

[Figure: bitcell layouts showing poly (PO), source line (SL), and bit line (BL)]

SLIDE 59

Prospect

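[Figure: MRAM prospects across the 28nm, 22nm, 7nm, and 5nm nodes, spanning eNVM (code & data), last-level cache, high-density SRAM, and high-BW RAM; target markets include IOT, wearables, security, automotive, mobile AP, ML/AI, and datacenter; status ranges from in production to research]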

Kang, 2014 VLSI Symp. & 2019 CIES Tech Forum

SLIDE 60

From Research to Commercialization

  • Groundbreaking paper: discovery in physics, materials science
  • Laboratory demonstration: functional device
  • Circuit prototyping: functional array
  • IP design: technology and IP qualification (fully functional and reliable over P-V-T)
  • Product design / system integration: product pathfinding & qualification
  • Pilot production / early adopter: early market
  • Volume production

Facing "the Chasm"

Any fundamental showstopper?

Semiconductor devices typically require >10 years of R&D (e.g., FinFET)

Can you stay in the game?

SLIDE 61

Thank you. For questions and feedback, contact kang@qti.qualcomm.com