Expanding the World of Heterogenous Memory Hierarchies


Expanding the World of Heterogenous Memory Hierarchies: The Evolving Non-Volatile Memory Story
Bill Gervasi, Principal Systems Architect
16 May 2019


slide-1
SLIDE 1

Expanding the World of Heterogenous Memory Hierarchies

The Evolving Non-Volatile Memory Story

16 May 2019 Bill Gervasi Principal Systems Architect
slide-2
SLIDE 2 2

Agenda

  • Data Processing Challenges
  • Checkpointing
  • Memory Tiers
  • Persistence
  • Current Solutions
  • Seeking the Ideal
  • A New Standard
  • Mixed Mode Solutions
  • Distributed Processing
  • Security
  • Sharing Time

slide-3
SLIDE 3 3

Data processing is great

slide-4
SLIDE 4 4

Data processing is great
Until something goes wrong

slide-5
SLIDE 5 5

The Cost of Power Failure

slide-6
SLIDE 6 6
slide-7
SLIDE 7 7

[Timeline diagram: DRAM alternates Run phases with Checkpoint writes to Storage]

Checkpointing degrades performance
Checkpointing burns power
Checkpointing sucks

slide-8
SLIDE 8 8

[Timeline diagram: Run, Checkpoint from DRAM to Storage, Run, FAIL!, RESTART from the checkpoint, Run, Checkpoint]

But checkpointing avoids data loss from failure
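The trade-off on slides 7 and 8 can be sketched with a toy model. This is a minimal sketch, not a measurement: the function name, the step-based cost model, and the single-failure assumption are all illustrative and do not come from the deck.

```python
def run_with_checkpoints(total_steps, checkpoint_every, fail_at=None):
    """Toy model: count the work done when state is checkpointed every N steps.

    Assumptions (hypothetical): progress is a step counter held in volatile
    DRAM, a checkpoint copies it to storage, and at most one power failure
    occurs, at step `fail_at`.
    """
    saved_step = 0      # last step captured in simulated storage
    step = 0            # progress held in simulated DRAM
    executed = 0        # total steps actually executed, including redone work
    checkpoints = 0     # number of checkpoint writes
    while step < total_steps:
        step += 1
        executed += 1
        if step == fail_at:
            step = saved_step   # DRAM contents lost: resume from the checkpoint
            fail_at = None      # the failure happens only once
            continue
        if step % checkpoint_every == 0:
            saved_step = step
            checkpoints += 1
    return executed, checkpoints


# With checkpoints every 10 steps, a failure at step 57 costs 7 redone steps.
with_ckpt = run_with_checkpoints(100, 10, fail_at=57)
# With no checkpoint inside the run, the same failure redoes all 57 steps.
without_ckpt = run_with_checkpoints(100, 1000, fail_at=57)
```

In this model the checkpoints are the price paid on every run, while the redone work is the price paid only when something fails.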

slide-9
SLIDE 9 9

Data persistence is essential
System failure is a key factor in server software design
Storage access time impacts transaction granularity

slide-10
SLIDE 10 10

The game we play to trade off performance, capacity, and cost

slide-11
SLIDE 11 11

To reduce the penalties from checkpointing…
…move non-volatile storage closer to the CPU

slide-12
SLIDE 12 12

Traditional Server Architecture Review

[Block diagram: CPU with cache ($), Memory Control, and I/O; Memory modules attach to the memory controller, and the Network and further Memory are reached through I/O. Arrow label: Faster, lower latency]

slide-13
SLIDE 13 13

The Search for the Holy Grail

slide-14
SLIDE 14 14

DATA PERSISTENCE

When we no longer fear power failure…

slide-15
SLIDE 15 15

What if you could replace DRAM with a non-volatile memory?
You’d call it Memory Class Storage

slide-16
SLIDE 16 16

The non-volatile memory revolution is under way

3DXP, ReRAM, PCM, MRAM, NRAM™

When was the last time you read about a new volatile memory?

slide-17
SLIDE 17 17

From vacuum tubes
To core memory
To DRAM
To NVRAM

slide-18
SLIDE 18 18

THIS is why the term “Persistent Memory” is insufficient
The industry must distinguish between deterministic and non-deterministic persistent memory
Only “Memory Class Storage” is fully deterministic AND persistent

slide-19
SLIDE 19 19

Not all “persistence” is created equal

SRAM, DRAM, Flash, 3DXpoint, NRAM, FeRAM, MRAM, ReRAM

slide-20
SLIDE 20 20

“Write endurance” determines HOW persistent
Wear leveling needed if writes are limited

slide-21
SLIDE 21 21

Temperature sensitivity impacts long term retention

Weeks of Data Retention
slide-22
SLIDE 22 22

[Timing diagram: READ, WRITE, WRITE, READ, WRITE commands, each followed by DATA]

DRAM interface is deterministic
Data latency is FIXED

[Timing diagram: the same command sequence, with HOUSEKEEPING displacing DATA transfers (marked X)]

Any endurance limit breaks determinism
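The contrast on this slide can be illustrated with a toy latency model. It is a sketch under stated assumptions: the 15 ns fixed latency, the 500 ns housekeeping stall, and the "every second write" trigger are hypothetical numbers chosen only to make the effect visible.

```python
def data_latencies(commands, fixed_latency_ns, housekeeping_every=None,
                   housekeeping_ns=0):
    """Return the data latency of each command under a toy model.

    A deterministic device answers every command after `fixed_latency_ns`.
    An endurance-limited device (hypothetical model) stalls for
    `housekeeping_ns` on every `housekeeping_every`-th write, standing in
    for wear leveling or similar background work.
    """
    latencies = []
    writes = 0
    for cmd in commands:
        latency = fixed_latency_ns
        if cmd == "WRITE":
            writes += 1
            if housekeeping_every and writes % housekeeping_every == 0:
                latency += housekeeping_ns
        latencies.append(latency)
    return latencies


commands = ["READ", "WRITE", "WRITE", "READ", "WRITE"]
deterministic = data_latencies(commands, fixed_latency_ns=15)
endurance_limited = data_latencies(commands, 15, housekeeping_every=2,
                                   housekeeping_ns=500)
```

The first list contains a single repeated value, which is what lets a memory controller schedule the bus in advance; the second does not.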

slide-23
SLIDE 23 23

Memory Class Storage

Full DRAM Speed
No endurance limits
Fully deterministic

slide-24
SLIDE 24 24

NVRAM is a Memory Class Storage

slide-25
SLIDE 25 25

Memory Class Storage = NVRAM

For now…

NVRAM

Memory Class Storage

In the future?

slide-26
SLIDE 26 26

Storage Class Memory is NOT a Memory Class Storage

slide-27
SLIDE 27 27

Flash Storage

Storage Class Memory: Magnetic RAM, Resistive RAM, 3DXpoint, Phase Change, 3D NOR

Memory Class Storage: DDR NVRAM
≥ DRAM performance
= DRAM endurance
≥ DRAM capacity

[Diagram labels: Hard Disk, SSD, NVMe, DDR, DRAM, Wasteland]

slide-28
SLIDE 28 28

[Diagram: three panels, each labeled Deterministic and Non-Deterministic]

slide-29
SLIDE 29 29

DRAM
NVDIMM-N
Optane
NVDIMM-P
NVRAM Memory Class Storage

slide-30
SLIDE 30 30

DRAM

[Command sequence: ACT RD WR PRE, ACT RD WR PRE, ACT RD WR PRE, REFRESH, ACT RD WR PRE]

Refresh time consumes up to 15% of bandwidth
Itty bitty leaky capacitors lose charge

On power fail, you lose

slide-31
SLIDE 31 31

Run

FAIL!

DRAM

slide-32
SLIDE 32 32

NVDIMM-N

[Block diagram: Host System CPU connected through Isolation Buffers to the DRAM Array; NVM Control and Flash Backup; two Voltage Regulators; an Energy Source used on Power Fail]

NVDIMM-N
Use DRAM normally
On Power Fail, copy to Flash
Power restored, copy to DRAM

slide-33
SLIDE 33 33

NVDIMM-N

[Timeline diagram: Run, FAIL!, Switch to Battery Power, Copy DRAM to Flash, RESTORE, Copy Flash to DRAM, Run]

slide-34
SLIDE 34 34

NVDIMM-N

Copy DRAM to Flash: 1-2 MINUTES
Copy Flash to DRAM: 1-2 MINUTES

One power fail cycle pays for a LOT of protection
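A rough feel for where a figure like "1-2 minutes" can come from is capacity divided by copy bandwidth. The sketch below is back-of-envelope arithmetic only: the 32 GiB module size, the 400 MiB/s flash bandwidth, and the 10 W module power are assumed values, not figures from the deck.

```python
def copy_time_seconds(capacity_gib, bandwidth_mib_per_s):
    """Time to copy the whole DRAM array at a sustained bandwidth."""
    return capacity_gib * 1024 / bandwidth_mib_per_s


def hold_up_energy_joules(module_power_w, seconds):
    """Energy the backup source must deliver while the copy runs."""
    return module_power_w * seconds


# Hypothetical module: 32 GiB of DRAM, 400 MiB/s to flash, 10 W while saving.
backup_seconds = copy_time_seconds(32, 400)               # about 82 seconds
backup_joules = hold_up_energy_joules(10, backup_seconds)  # about 819 joules
```

Under these assumptions the save alone takes well over a minute, and the energy source has to be sized for the whole copy.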

slide-35
SLIDE 35 35

Optane

[Block diagram: Host System CPU connected to NVM Control and a 3DXpoint Array]

RD Data: Reads are slow
WR Data: Writes are deathly slow

Could be used as a very slow DRAM but more common as expansion

Faster than Flash!!!
But vs DRAM? Meh
Decent capacity, though

slide-36
SLIDE 36 36

Optane

App Direct
[Block diagram: Host System CPU connected to NVM Control and a 3DXpoint Array]
512GB = 512GB

Memory Mode
[Block diagram: Host System CPU connected to NVM Control and a 3DXpoint Array, with DRAM as Cache]
512GB + 64GB = 512GB

slide-37
SLIDE 37 37

NVDIMM-P

[Block diagram: Host System CPU connected to NVM Control, a DRAM Cache, and a Non-Volatile Memory Array (any kind), with a Voltage Regulator and a Small Energy Source]

NVDIMM-P Protocol
[Transaction diagram: Read A, Read B, Read C issued by the host; RSP and Send handshakes; data returned as Data A, Data C, Data B]

New non-deterministic protocol
Not backward compatible with DDR
Requires NVDIMM-P aware CPU
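The out-of-order data return shown in the transaction diagram can be sketched as a small simulation. This is an illustrative model, not the protocol definition: the function name, the cache hit and miss latencies, and the request tuple layout are assumptions made for the example.

```python
import heapq


def simulate_reads(requests, hit_ns=50, miss_ns=400):
    """Return (read_id, ready_time_ns) in the order data becomes available.

    `requests` is a list of (read_id, issue_time_ns, hits_cache) tuples.
    Hypothetical model: a read served from the DRAM cache completes after
    `hit_ns`; one that goes to the non-volatile array takes `miss_ns`.
    """
    ready = []
    for read_id, issue_ns, hits_cache in requests:
        done_ns = issue_ns + (hit_ns if hits_cache else miss_ns)
        heapq.heappush(ready, (done_ns, read_id))

    order = []
    while ready:
        # At done_ns the module would signal RSP; the host then issues Send
        # and receives data tagged with the read's identifier.
        done_ns, read_id = heapq.heappop(ready)
        order.append((read_id, done_ns))
    return order


# Read B misses the cache, so Read C, issued later, returns before it.
completion_order = simulate_reads([("A", 0, True), ("B", 10, False), ("C", 20, True)])
```

Because the host cannot know in advance which reads will hit, it needs the handshake and an identifier per read, which is why such a protocol cannot reuse the fixed-latency DDR command flow.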

slide-38
SLIDE 38 38

NVDIMM-P Persistence Options

  • Volatile Mode: No Persistence
  • Explicit FLUSH Command
  • Battery Backup à la NVDIMM-N
  • Reduced Energy, Cacheless

slide-39
SLIDE 39 39

NVRAM

  • DRAM speed
  • Non-volatility
  • Scalable beyond DRAM
  • Low power
  • Low cost
  • Unlimited write endurance
  • Wide temperature range
  • Flexible fabrication & application

slide-40
SLIDE 40 40

NVRAM Memory Class Storage

Drop in replacement for DRAM
Permanently persistent
Always available
Fully Deterministic

[Diagram: Host System with DRAM replaced by NVRAM]

slide-41
SLIDE 41 41

DDR5 NVRAM

NRAM™ ReRAM * MRAM * PCM *

* Future generation devices
slide-42
SLIDE 42 42
slide-43
SLIDE 43 43

Comparing DRAM & NVRAM

  • No refresh is required
  • “Self refresh” can be power OFF
  • Some timing differences (but deterministic!)
  • Data persistence definitions
  • Greater per-die capacity

slide-44
SLIDE 44 44

NRAM™, ReRAM, MRAM, and PCM differ in:

  • Timings
  • Precharge requirement
  • Persistence definition

DDR5 NVRAM Specification brings coherence

slide-45
SLIDE 45 45

DRAM
[State diagram: IDLE to REFRESH and back, “350 ns”]

NVRAM
[State diagram: IDLE, REFRESH = NOP, “0 ns”]

Refresh command is not needed
Decoded as NOP for compatibility
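The bandwidth cost of refresh follows from how long each refresh blocks the device and how often one is issued. The sketch below takes the “350 ns” label from this slide as the refresh duration; the refresh intervals of 7.8 µs and 3.9 µs are assumed, commonly quoted values and are not stated in the deck.

```python
def refresh_overhead(refresh_time_ns, refresh_interval_ns):
    """Fraction of time the device is unavailable because of refresh."""
    return refresh_time_ns / refresh_interval_ns


# DRAM: 350 ns per refresh (from the slide), interval values are assumptions.
dram_normal = refresh_overhead(350, 7800)   # roughly 4.5 %
dram_hot = refresh_overhead(350, 3900)      # roughly 9 %, interval halved
# NVRAM: refresh decodes as a NOP, so it costs "0 ns".
nvram = refresh_overhead(0, 7800)
```

These inputs do not reach the "up to 15%" quoted on slide 30; longer refresh times on denser dies or shorter intervals push the ratio higher, which is consistent with "up to" but is not something this sketch demonstrates.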
slide-46
SLIDE 46 46

DRAM
[State diagram: IDLE, SELF REFRESH, REFRESH, FREQUENCY CHANGE]
Power burned

NVRAM
[State diagram: IDLE, SELF REFRESH, FREQUENCY CHANGE]
“No” power burned

slide-47
SLIDE 47 47

DRAM
[State diagram: IDLE, ACTIVATE, READ, WRITE, PRECHARGE]

NVRAM
[State diagram: IDLE, READ, WRITE]

Precharge command is not needed
Decoded as NOP for compatibility

slide-48
SLIDE 48 48

Persistence Definitions*

Intrinsic: Immediately After WRITE
Extrinsic: After FLUSH Command
Power Fail: On NVRAM RESET

* Discussions on-going
slide-49
SLIDE 49 49

Intrinsic Persistence: WR WR WR (data is persistent after each WR)
Extrinsic Persistence: WR WR FLUSH WR WR FLUSH (data is persistent after each FLUSH)
Power Fail Persistence: WR WR WR WR WR RESET (data is persistent after RESET)

* Discussions on-going
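The three definitions differ only in which command makes earlier writes durable. A minimal sketch of that difference follows; the function and the mode names are hypothetical labels for the slide's terms, and the model ignores everything except the command order.

```python
def persistent_writes(commands, mode):
    """Return, after each command, how many writes are known to be persistent.

    mode is one of "intrinsic", "extrinsic", or "power_fail" (names chosen
    here to mirror the slide, not taken from any specification).
    """
    pending = 0       # writes issued but not yet guaranteed persistent
    persistent = 0    # writes guaranteed persistent
    history = []
    for cmd in commands:
        if cmd == "WR":
            if mode == "intrinsic":
                persistent += 1          # persistent immediately after WRITE
            else:
                pending += 1
        elif (cmd == "FLUSH" and mode == "extrinsic") or \
             (cmd == "RESET" and mode == "power_fail"):
            persistent += pending        # this command commits earlier writes
            pending = 0
        history.append(persistent)
    return history


intrinsic = persistent_writes(["WR", "WR", "WR"], "intrinsic")
extrinsic = persistent_writes(["WR", "WR", "FLUSH", "WR", "WR", "FLUSH"], "extrinsic")
power_fail = persistent_writes(["WR"] * 5 + ["RESET"], "power_fail")
```

The window during which `pending` is non-zero is the window in which software cannot yet treat a task as safe.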

slide-50
SLIDE 50 50

DDR5 DRAM is limited to 32Gb per die

DDR5 NVRAM enables up to 128Tb per die

slide-51
SLIDE 51 51

DDR5 SDRAM: ACT RD WR, ACT RD WR, ACT RD WR
DDR5 NVRAM: REXT, ACT RD WR, ACT RD WR, REXT, ACT RD WR

Row Extension adds up to 12 more bits of addressing
Backward compatible with DDR5 – Acts like REXT = 0 until needed
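The per-die limits quoted on slide 50 are consistent with the 12 extra address bits: each added bit doubles the addressable rows. A short check of that arithmetic, using binary prefixes as an assumption about how the deck counts bits:

```python
GBIT = 2**30   # assuming binary gigabits
TBIT = 2**40   # assuming binary terabits


def max_die_bits(base_bits, extension_bits):
    """Addressable bits once `extension_bits` more row-address bits are added."""
    return base_bits * 2**extension_bits


ddr5_dram_limit = 32 * GBIT
ddr5_nvram_limit = max_die_bits(ddr5_dram_limit, 12)   # 32 Gb x 4096
```

With no extension bits in use the limit stays at 32 Gb, which matches the "acts like REXT = 0" compatibility statement.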

slide-52
SLIDE 52 52

DDR5 SDRAM
[Diagram: bank buffers 0 through 31, each selected by ROW and read out by COLUMNS]

DDR5 NVRAM
[Diagram: bank buffers 0 through 31, each selected by REXT plus ROW and read out by COLUMNS]

“ROW” includes bank group & bank…
slide-53
SLIDE 53 53

Row Extension Example

Command                  Row accessed
REXT A
ACT BANK W, ROW K
ACT BANK X, ROW L
ACT BANK Y, ROW M
REXT B
ACT BANK Z, ROW N
READ BANK W              Row A + K
READ BANK X              Row A + L
READ BANK Y              Row A + M
READ BANK Z              Row B + N
REXT C
READ BANK Y              Row A + M
ACT BANK W, ROW P
READ BANK W              Row C + P
WRITE BANK X             Row A + L
ACT BANK X, ROW L
WRITE BANK X             Row C + L
READ BANK Z              Row B + N
REXT A
ACT BANK X, ROW R
READ BANK X              Row A + R
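The example reads as follows: REXT sets a value that is captured by each later ACT, and reads or writes use whatever the bank captured at its last ACT. A minimal model of that reading is sketched below; the class and method names are hypothetical, and the behavior is inferred from the slide rather than from the specification.

```python
class RowExtensionModel:
    """Toy model of the REXT / ACT / READ sequence on the slide."""

    def __init__(self):
        self.rext = 0          # acts like REXT = 0 until a REXT is issued
        self.open_rows = {}    # bank -> (rext value, row) captured at ACT

    def rext_cmd(self, value):
        self.rext = value      # affects only ACT commands issued afterwards

    def act(self, bank, row):
        self.open_rows[bank] = (self.rext, row)

    def access(self, bank):
        """READ or WRITE: uses the extension and row latched at the last ACT."""
        return self.open_rows[bank]


m = RowExtensionModel()
m.rext_cmd("A")
m.act("W", "K"); m.act("X", "L"); m.act("Y", "M")
m.rext_cmd("B")
m.act("Z", "N")
first_reads = [m.access(b) for b in "WXYZ"]
m.rext_cmd("C")
read_y_after_rext_c = m.access("Y")    # bank Y was not re-activated
m.act("W", "P")
read_w_after_act = m.access("W")
write_x_before_act = m.access("X")
m.act("X", "L")
write_x_after_act = m.access("X")
```

Changing REXT alone does not move an already open bank; only a new ACT picks up the new extension, which is what the "Row A + M" read after REXT C shows.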

slide-54
SLIDE 54 54

Row Extension Replacement Example

[Same command sequence as the Row Extension Example on slide 53]

slide-55
SLIDE 55 55

NVRAM Memory Class Storage

slide-56
SLIDE 56 56

[Timeline diagram: DRAM alternates Run with Checkpoint to Storage; NVRAM runs continuously]

Checkpointing can be made to persistent memory
OR
Checkpointing can be turned off completely
slide-57
SLIDE 57 57

[Timeline diagram]
DRAM: Run, Checkpoint to Storage, repeated
Phase 1: Run, Checkpoint to NVRAM, repeated
Phase 2: Run on NVRAM, no checkpoint

slide-58
SLIDE 58 58

Keep in mind…
Power failure is not the only thing to fear
Checkpoints may include system failure
Knowing when a task may resume is complicated

slide-59
SLIDE 59 59

Remember Those Persistence Definitions

Immediately After WRITE

Tasks may be safe in nanoseconds

After FLUSH Command

Tasks may be safe in microseconds

On NVRAM RESET

Tasks may not be safe until system stability confirmed
slide-60
SLIDE 60 60

Performance Capacity Persistence

System designers have a lot of options to balance

slide-61
SLIDE 61 61

Homogenous Main Memory

DRAM, MCS, Optane, NVDIMM-N, NVDIMM-P

slide-62
SLIDE 62 62

Heterogenous Main Memory

DRAM + Optane
MCS + NVDIMM-P
MCS + Optane

slide-63
SLIDE 63 63

DRAM NVDIMM-N Optane NVRAM Memory Class Storage NVDIMM-P

32GB 64GB 512GB

When capacity meets persistence

slide-64
SLIDE 64 64

Homogenous Main Memory Combinations

              DRAM    NVDIMM-N   Optane   NVDIMM-P   MCS
Data Safe     No      Yes        Yes      Yes        Yes
Performance   Best    Best       Worst    Mid        Best+
Capacity      1.0 X   0.5 X      10 X     10 X       1 X+

slide-65
SLIDE 65 65

Heterogeneous Main Memory Combinations

              DRAM + Optane   DRAM + NVDIMM-P   MCS + Optane   MCS + NVDIMM-P
Data Safe     No              No                Yes            Yes
Performance   High            High              High           High
Capacity      6 X             6 X               6 X            6 X

slide-66
SLIDE 66 66

Homogenous Main Memory Combinations
Software need not care
All functions take the same time

Heterogeneous Main Memory Combinations
Software encouraged to put critical functions in faster memory
Often mount slower memory as RAM drive

slide-67
SLIDE 67 67

Software support via DAX assists in moving…
…from mounted drives…
…to RAM drive…
…to direct access mode
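Direct access mode means the application loads and stores to mapped memory instead of issuing read and write calls. The sketch below shows that programming style with an ordinary memory-mapped file. It is only an illustration: it runs against a temporary file, and the assumption is that on a filesystem mounted with DAX the same calls would reach persistent memory without passing through the page cache. The helper names are hypothetical.

```python
import mmap
import os
import tempfile


def store_record(path, offset, payload, size=4096):
    """Write bytes through a memory mapping instead of a write() call."""
    with open(path, "r+b") as f:
        f.truncate(size)                      # make sure the file backs the mapping
        with mmap.mmap(f.fileno(), size) as m:
            m[offset:offset + len(payload)] = payload
            m.flush()                         # request that the range be made durable


def load_record(path, offset, length):
    """Read bytes back through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[offset:offset + length])


# Demonstration on a temporary file standing in for a DAX-backed file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
store_record(demo_path, 128, b"checkpoint-free")
restored = load_record(demo_path, 128, len(b"checkpoint-free"))
os.remove(demo_path)
```

The explicit `flush()` corresponds to the point where software asks for persistence; how soon data is actually safe depends on the persistence definition the memory provides.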

slide-68
SLIDE 68 68

The Power of Zero Power

slide-69
SLIDE 69 69

Putting a Node to Sleep

[Diagram: Operating Mode, Self Refresh Mode]

Instant On means power must stay alive
Refresh operations burn significant power

slide-70
SLIDE 70 70

Memory Class Storage can be turned off entirely

[Diagram: Operating Mode, Power Off]

slide-71
SLIDE 71 71

DDR5 memory modules have on-DIMM voltage regulation (PMIC)
DIMM power may be shut off independently of system power

[Block diagram: System Power on the System Motherboard feeds the PMIC on the Memory Module (DIMM); the PMIC supplies Module Power to the Data Buffers and Memory Media]

slide-72
SLIDE 72 72

[Block diagram: System Power feeds a PMIC on each of DIMM1 and DIMM2; each PMIC supplies Module Power to that DIMM's Data Buffers and Memory Media]

Multiple power management options

  • System power off; both DIMMs off
  • System power on & both DIMMs off
  • System power on & DIMM1 on, DIMM2 off

slide-73
SLIDE 73 73

Nantero NRAM™
My favorite NVRAM

Full presentation on Wednesday…
slide-74
SLIDE 74 74

Van der Waals energy barrier keeps CNTs apart or together
Data retention >300 years @ 300 °C, >12,000 years @ 105 °C
Stochastic array of hundreds of nanotubes per cell

ELECTRODE ELECTRODE
slide-75
SLIDE 75 75

5 ns balanced read/write performance
No temperature sensitivity

slide-76
SLIDE 76 76

[Timeline: 2,500 years ago, 4,500 years ago, 10,000 years ago]

NRAM Data Retention = 12,000 Years

slide-77
SLIDE 77 77

Array size tuned to the size of drivers & receivers

[Diagram: NRAM LAYER tile on X, Y, Z axes with Drivers and Receivers, plus I/O PHY]

64 Kb tile × 256 K tiles = 16 Gb

Chip-level timing is a function of bit line flight times
Replicate this “tile” as needed for device capacity
Add I/O drivers to emulate any PHY needed
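The tile arithmetic on this slide can be checked directly. The sketch assumes binary prefixes (Kb = 1024 bits, Gb = 2**30 bits); the helper name is hypothetical.

```python
KBIT = 2**10   # assuming binary kilobits
GBIT = 2**30   # assuming binary gigabits


def tiles_needed(device_gbit, tile_kbit=64):
    """Number of identical tiles replicated to reach a device capacity."""
    return device_gbit * GBIT // (tile_kbit * KBIT)


tile_bits = 64 * KBIT
device_bits = tile_bits * 256 * 1024      # 64 Kb tile x 256 K tiles
tiles_for_16gbit = tiles_needed(16)
```

Because capacity scales by replicating a fixed tile, the bit-line length, and with it the timing the slide attributes to bit-line flight time, stays the same as the device grows.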

slide-78
SLIDE 78 78

DDR4, DDR5 NRAM

[Block diagram: x4/x8 Data and Strobe pins with FIFOs; SECDED ECC Engine between 64-bit and 72-bit data paths; Address feeding Bank Decode, Row Decode, and Column Decode; Chip ID feeding the Die Selector; Carbon Nanotube Arrays]
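The 64-bit and 72-bit widths next to the SECDED ECC engine match the usual check-bit count for a single-error-correct, double-error-detect Hamming code. The sketch below derives that count; it assumes a standard extended Hamming construction, which the slide does not specify.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming SECDED code over `data_bits`.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 0
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


check_bits_64 = secded_check_bits(64)
codeword_bits_64 = 64 + check_bits_64
```

Eight check bits per 64 data bits is the same 12.5% overhead that ECC memory modules carry in their extra devices.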

slide-79
SLIDE 79 79

[Bar chart: bandwidth, larger is better; DDR4/DDR5 DRAM base throughput versus DDR4/DDR5 NRAM, 15-20% higher]

  • Elimination of refresh
  • Elimination of tFAW restrictions
  • Elimination of bank group restrictions
  • Elimination of power states
  • Elimination of inter-die delays

Architectural improvements improve data throughput 15% or greater at the same clock frequency
slide-80
SLIDE 80 80

NVRAM Memory Class Storage

[Module diagram: 18 NRAM devices]

Plugs into an RDIMM slot
Appears to the CPU as DRAM
Memory controller may optionally be tuned for NVRAM

slide-81
SLIDE 81 81

One less layer of marshmallows to deal with

[Diagram labels: Fully deterministic Persistence; Non-deterministic Persistence]

slide-82
SLIDE 82 82
slide-83
SLIDE 83 83

A LEGO?

Know Your Enemy

Would you rather… Step on broken glass? Or some jacks?

slide-84
SLIDE 84 84

…about those energy stores…

  • Batteries
  • Supercapacitors
  • Tantalums (etc.)

slide-85
SLIDE 85 85

Batteries          Supercapacitors       Tantalums (etc.)
High capacity      Medium capacity       Low capacity
High energy density  Low energy density  Low energy density
Low reliability    Degrade over time     …but stable

slide-86
SLIDE 86 86

[Block diagram: I/O to a Storage Controller with DRAM cache, Flash or Storage Class Memory, and an Energy store]

Energy needed for backup of DRAM cache

slide-87
SLIDE 87 87

[Block diagram: I/O to a Storage Controller with NVRAM cache and Flash or Storage Class Memory; the Energy store is crossed out (X)]

Eliminate need for backup energy
More room for storage

slide-88
SLIDE 88 88

NVRAM Changes the Math

DRAM cache limited by energy available
No DRAM? Cache size dictated by cost/performance
1GB/TB

slide-89
SLIDE 89 89

Switching gears again…
…to Systems Evolution

slide-90
SLIDE 90 90

Pop quiz
How many CPUs in a 1980s PC?

slide-91
SLIDE 91 91

One?

  • Graphics Adapter
  • Modem
  • Network Adapter
  • Sound Blaster

slide-92
SLIDE 92 92

They were called “DSPs”: Digital Signal Processors
They put processing next to the data
They were killed by “Native Signal Processing”

Drivers

Analog front end devices

slide-93
SLIDE 93 93

With NSP…

[Diagram: $ and W symbols]

So why do it?

slide-94
SLIDE 94 94

Now We Are Trending Back

[Diagram: CPUs, Memory, Storage, AI, and FPGA connected by a Fabric]

slide-95
SLIDE 95 95
slide-96
SLIDE 96 96

[Diagram: Processor Elements, I/O Elements, and Storage Elements attached through Bridges to a Low Latency Fabric]
slide-97
SLIDE 97 97

  • Distributed resources
  • In-memory computing
  • Application-specific computing
  • Artificial intelligence and deep learning
  • Security

slide-98
SLIDE 98 98

[Diagram: nodes attached to a Low Latency Fabric]

  • Network Adapter
  • Artificial Intelligence Accelerator
  • Search Engine
  • Graphics Accelerator
  • Human interface
  • Standard CPU
  • HTML processing
  • Human interface management
  • Memory Array
  • Filesystem Aware Storage
slide-99
SLIDE 99 99

Example AI accelerator

[Block diagram: NNP Control and I/O with Tbps links; Exec Units each paired with SRAM; four HBM stacks]

SIMD architectures
Matrix interconnections
Fast pipes still limit load/save time

Challenges:

  • Model checkpointing
  • Data loss on power fail
  • Temperature sensitivity

slide-100
SLIDE 100 100

I/O

Back propagation algorithms complicate things
Data loss problems are amplified
Checkpointing highly time and bandwidth consuming

slide-101
SLIDE 101 101

The more distributed memory gets, the harder to load and unload

MEM MEM MEM MEM MEM MEM MEM MEM MEM MEM MEM MEM
slide-102
SLIDE 102 102

NVRAM TO THE RESCUE!
Replacing dynamic memory with persistent memory resolves the data loss issues

slide-103
SLIDE 103 103

[Block diagram: the example AI accelerator with NNP Control, I/O, and Exec Units with SRAM; HBM stacks become MCS HBM and memory blocks become MCS]

Just leave the data in place as long as you want

Replace DRAM with NVRAM
Replace eRAM with NVRAM

slide-104
SLIDE 104 104

SRAM & Registers

The final frontier…

slide-105
SLIDE 105 105

Continuing to look for ways to bring Memory Class Storage down under 1ns

It will happen

  • Faster edge rates
  • Voltage adjustment
  • Better error check
  • Shadow registers
  • Getting smarter

slide-106
SLIDE 106 106

DATA PERSISTENCE

When we no longer fear power failure…

Full END TO END persistence

slide-107
SLIDE 107 107

Are we getting near the day when we look back at volatile memory…

…and LAUGH?

slide-108
SLIDE 108 108

…but…

slide-109
SLIDE 109 109

Persistent data introduces challenges, too

slide-110
SLIDE 110 110

Data is ALWAYS there!
Data security is a growing concern

slide-111
SLIDE 111 111

So many potential breaches

  • Application opens data from previous application
  • Memory moved from one system to another
  • Spy devices on memory buses

slide-112
SLIDE 112 112

Infection via hack
Infection via spy devices

slide-113
SLIDE 113 113

Password: X2.Hd44**3#jj0%

General trend is to encrypt data before transmission or storage

slide-114
SLIDE 114 114

Keep the bad guys out

X2.Hd44**3#jj0% X2.Hd44**3#jj0%
slide-115
SLIDE 115 115

[Block diagram: Host System CPU connected to Smart NVM Control, a DRAM Cache, and a Non-Volatile Memory Array (any kind), with a Voltage Regulator and a Small Energy Source]

Some are adding in-memory compute functions including encryption
Works as long as the bus is secure
Encryption quality may be limited by block transfer size
Management of many keys can get complicated quickly

Password: X2.Hd44**3#jj0%
slide-116
SLIDE 116 116

ISO/IEC 11889

slide-117
SLIDE 117 117

Summary

  • Power Fail Sucks
  • Saving Data is a Pain
  • Need tiers of memory & storage
  • Persistence is Essential
  • Today’s Solutions Help
  • But We Can Do Better
  • DDR5 NVRAM Spec in Progress
  • Mix & Match Memories
  • Data Distribution Challenges
  • Persistence Complications
  • Sharing Time

slide-118
SLIDE 118 118

Thank you for your time
Bill Gervasi
bilge@Nantero.com

slide-119
SLIDE 119 119

I’m here to learn too
What do you deal with?