ADVANCED MEMORY SYSTEMS Mahdi Nazm Bojnordi Assistant Professor - - PowerPoint PPT Presentation



slide-1
SLIDE 1

ADVANCED MEMORY SYSTEMS

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor School of Computing University of Utah

slide-2
SLIDE 2

Overview

¨ Announcement

¤ Homework 5 will be released tonight, the last one :)

¨ This lecture

¤ Memory addressing/scheduling
¤ DRAM refresh
¤ Emerging technologies

slide-3
SLIDE 3

Recall: DRAM Control Tasks

¨ Refresh management
¤ Periodically replenish the DRAM cells (burst vs. distributed)
¨ Address mapping
¤ Distribute the requests to destination banks (load balancing)
¨ Request scheduling
¤ Generate a sequence of commands for memory requests
  • Reduce overheads by eliminating unnecessary commands
¨ Power management
¤ Keep the power consumption under a cap
¨ Error detection/correction
¤ Detect and recover corrupted data

slide-5
SLIDE 5

Address Mapping

¨ A memory request carries an address, a type, and data
¨ Address is used to find the location in memory
¤ Channel, rank, bank, row, and column IDs
¨ Example physical address format
¤ A 4GB channel, 2 ranks, 4 banks/rank, 8KB page

Field          Row ID   Channel ID   Rank ID   Bank ID   Column ID
Width (bits)   16       0            1         2         13
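
The field widths in this example follow from the configuration. A minimal sketch of that arithmetic, assuming byte addressing and power-of-two sizes; the names `log2_exact` and `field_widths` are illustrative and not part of the lecture:

```python
def log2_exact(n):
    """Return log2(n) for a power of two, e.g. 8192 -> 13."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def field_widths(channel_bytes, ranks, banks_per_rank, page_bytes, channels=1):
    """Derive the width in bits of each address field (assumed byte addressing)."""
    widths = {
        "channel": log2_exact(channels),
        "rank": log2_exact(ranks),
        "bank": log2_exact(banks_per_rank),
        "column": log2_exact(page_bytes),
    }
    # Whatever is left of the per-channel address selects the row.
    widths["row"] = (log2_exact(channel_bytes)
                     - widths["rank"] - widths["bank"] - widths["column"])
    return widths


# The slide's configuration: 4GB channel, 2 ranks, 4 banks/rank, 8KB page.
widths = field_widths(4 * 2**30, ranks=2, banks_per_rank=4, page_bytes=8 * 2**10)
print(widths)
```

With a single channel the channel field needs no bits, which is why its width is 0.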

slide-10
SLIDE 10

Example Problem

¨ Start with empty row buffers, find the total number of commands if all the requests are served in order
  • Address = row(12):channel(0):rank(1):bank(3):column(16)

addr (hex)   row   rank   bank   column
00000010     000   0      0      0010
20000001     200   0      0      0001
40000100     400   0      0      0100
60000010     600   0      0      0010
40000101     400   0      0      0101

commands: ACT RD PRE ACT RD PRE ACT RD PRE ACT RD PRE ACT RD (14 in total)

slide-14
SLIDE 14

Example Problem

¨ Find the total number of commands using the following address mapping scheme
  • Address = bank(3):rank(1):channel(0):row(12):column(16)

addr (hex)   bank   rank   row   column
00000010     0      0      000   0010
20000001     1      0      000   0001
40000100     2      0      000   0100
60000010     3      0      000   0010
40000101     2      0      000   0101

commands: ACT RD ACT RD ACT RD ACT RD RD (9 in total)
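
Both example problems can be checked mechanically. The sketch below decodes each address under a given mapping and replays the requests in order, leaving a row open until another row in the same bank is needed. The helper names (`decode`, `commands_in_order`) and the two mapping constants are illustrative assumptions, and timing constraints are ignored:

```python
def decode(addr, fields):
    """Split an address into fields given MSB-first as (name, width) pairs."""
    out = {}
    shift = sum(width for _, width in fields)
    for name, width in fields:
        shift -= width
        out[name] = (addr >> shift) & ((1 << width) - 1)
    return out


def commands_in_order(addrs, fields):
    """Serve reads in arrival order, starting with empty row buffers."""
    open_row = {}  # (channel, rank, bank) -> currently open row
    cmds = []
    for addr in addrs:
        f = decode(addr, fields)
        bank = (f["channel"], f["rank"], f["bank"])
        if open_row.get(bank) != f["row"]:
            if bank in open_row:
                cmds.append("PRE")  # close the conflicting row first
            cmds.append("ACT")
            open_row[bank] = f["row"]
        cmds.append("RD")
    return cmds


ROW_FIRST = [("row", 12), ("channel", 0), ("rank", 1), ("bank", 3), ("column", 16)]
BANK_FIRST = [("bank", 3), ("rank", 1), ("channel", 0), ("row", 12), ("column", 16)]
ADDRS = [0x00000010, 0x20000001, 0x40000100, 0x60000010, 0x40000101]

row_first_cmds = commands_in_order(ADDRS, ROW_FIRST)
bank_first_cmds = commands_in_order(ADDRS, BANK_FIRST)
print(len(row_first_cmds), " ".join(row_first_cmds))
print(len(bank_first_cmds), " ".join(bank_first_cmds))
```

Under the row-first mapping every request lands in bank 0 with a different row, so each one after the first pays a PRE and an ACT. The bank-first mapping spreads the requests over four banks, and the last request hits a row that is already open.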

slide-15
SLIDE 15

Command Scheduling

¨ Write buffering

¤ Writes can wait until reads are done

¨ Controller queues DRAM commands

¤ Usually into per-bank queues
¤ Allows easy reordering of ops meant for the same bank

¨ Common policies

¤ First-Come-First-Served (FCFS)
¤ First-Ready First-Come-First-Served (FR-FCFS)

slide-16
SLIDE 16

Command Scheduling

¨ First-Come-First-Served

¤ Oldest request first

¨ First-Ready First-Come-First-Served

¤ Prioritize column changes over row changes
¤ Skip over older conflicting requests
¤ Find row hits (on queued requests)
  • Find the oldest request
  • If it has no conflicts with the in-progress request → good
  • Otherwise (if it conflicts), try the next oldest

slide-20
SLIDE 20

FCFS vs. FR-FCFS

¨ READ(B0,R0,C0) READ(B0,R1,C0) READ(B0,R0,C1)

¤ FCFS (8 commands)
  Cmd    ACT   READ   PRE   ACT   READ   PRE   ACT   READ
  Addr   R0    C0     B0    R1    C0     B0    R0    C1

¤ FR-FCFS (6 commands)
  Cmd    ACT   READ   READ   PRE   ACT   READ
  Addr   R0    C0     C1     B0    R1    C0

Savings: one PRE and one ACT
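
The two command sequences can be reproduced with a small simulation. This is a simplified sketch: it treats "ready" as "hits the currently open row" and ignores all DRAM timing constraints, which a real FR-FCFS scheduler would also check. The function name `schedule` and the tuple encoding of requests are assumptions made for illustration:

```python
def schedule(requests, policy):
    """Turn (bank, row, col) reads into (command, address) pairs.

    policy is "FCFS" (always oldest) or "FR-FCFS" (oldest row hit first,
    falling back to the oldest request when nothing hits).
    """
    pending = list(requests)
    open_row = {}  # bank -> open row
    cmds = []
    while pending:
        pick = 0  # default: the oldest request
        if policy == "FR-FCFS":
            for i, (bank, row, _) in enumerate(pending):
                if open_row.get(bank) == row:
                    pick = i  # oldest request that hits an open row
                    break
        bank, row, col = pending.pop(pick)
        if open_row.get(bank) != row:
            if bank in open_row:
                cmds.append(("PRE", f"B{bank}"))
            cmds.append(("ACT", f"R{row}"))
            open_row[bank] = row
        cmds.append(("READ", f"C{col}"))
    return cmds


REQUESTS = [(0, 0, 0), (0, 1, 0), (0, 0, 1)]  # the three reads from the slide
fcfs = schedule(REQUESTS, "FCFS")
frfcfs = schedule(REQUESTS, "FR-FCFS")
print(len(fcfs), fcfs)
print(len(frfcfs), frfcfs)
```

FR-FCFS serves the third read before the second because it hits the row that is already open, which removes one precharge/activate pair.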

slide-21
SLIDE 21

Row Buffer Management Policies

¨ Open-page policy

¤ After access, keep page in DRAM row buffer
¤ If access to different page, must close old one first
  • Good if lots of locality
¨ Close-page policy
¤ After access, immediately close page in DRAM row buffer
¤ If access to different page, old one already closed
  • Good if no locality (random access)
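
The trade-off can be illustrated with a toy latency model for a single bank. The three timing constants are hypothetical round numbers, not values from the lecture or any datasheet, and the close-page case assumes the precharge always finishes before the next request arrives:

```python
# Hypothetical latencies in nanoseconds (assumptions for illustration only).
T_PRE, T_ACT, T_CAS = 15, 15, 15


def total_latency(rows, policy):
    """Sum the latency seen by a sequence of reads to one bank."""
    open_row = None
    total = 0
    for row in rows:
        if policy == "open":
            if open_row == row:
                total += T_CAS                    # row hit
            elif open_row is None:
                total += T_ACT + T_CAS            # bank idle
            else:
                total += T_PRE + T_ACT + T_CAS    # row conflict
            open_row = row
        else:
            # Close-page: the row was already closed after the last access,
            # so every read pays an activate but never a precharge.
            total += T_ACT + T_CAS
    return total


local = [5, 5, 5, 5]      # high locality: same row every time
scattered = [1, 2, 3, 4]  # no locality: a new row every time
for name, rows in (("local", local), ("scattered", scattered)):
    print(name, total_latency(rows, "open"), total_latency(rows, "close"))
```

In this model open-page wins on the local stream and close-page wins on the scattered one, matching the two "good if" bullets.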

slide-25
SLIDE 25

DRAM Refresh Management

¨ DRAM requires the cells' contents to be read and written periodically
¤ Burst refresh: refresh all of the cells each time
  • Simple control mechanism
¤ Distributed refresh: refresh one group of cells at a time
  • Avoid blocking memory for a long time
¨ Recently accessed rows need not be refreshed
¤ Smart refresh

[Figure: refresh activity over time, bursts vs. distributed]
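
A rough sketch of how the two schemes differ in blocking time. All numbers below (8192 rows, a 64 ms retention window, 100 ns to refresh one row) are assumed for illustration and are not taken from the lecture; `refresh_plan` is a hypothetical helper:

```python
def refresh_plan(rows, window_ns, t_row_ns, mode):
    """Describe when refreshes happen and how long memory is blocked each time."""
    if mode == "burst":
        # Refresh every row back to back, once per retention window.
        interval_ns = window_ns
        blocked_ns = rows * t_row_ns
    else:
        # Distributed: refresh one row at a time, evenly spaced over the window.
        interval_ns = window_ns / rows
        blocked_ns = t_row_ns
    return {
        "interval_ns": interval_ns,
        "blocked_ns": blocked_ns,
        "busy_fraction": blocked_ns / interval_ns,
    }


ROWS, WINDOW_NS, T_ROW_NS = 8192, 64_000_000, 100  # assumed values
burst = refresh_plan(ROWS, WINDOW_NS, T_ROW_NS, "burst")
distributed = refresh_plan(ROWS, WINDOW_NS, T_ROW_NS, "distributed")
print(burst)
print(distributed)
```

Both schemes spend the same fraction of time refreshing. Distributed refresh only shortens the longest stretch during which memory is unavailable.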

slide-26
SLIDE 26

Error Detection/Correction

¨ Data in memory may be corrupted

¤ Many reasons: leakage, alpha particles, hard errors

¨ Can errors be detected?

¤ Error detection codes: additional parity bits

¨ Can errors be corrected?

¤ Error correction codes: ECC bits are added to data

¨ Single-Error Correction, Double-Error Detection

¤ Commonly used in memory systems

slide-27
SLIDE 27

ECC DIMM

¨ An additional DRAM chip is used for storing SECDED ECC bits for error correction
¤ Nine x8 chips together supply a 72-bit word
¤ Hamming Code (72,64)
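
To make the (72,64) code concrete, here is a minimal sketch of a textbook extended Hamming code: 64 data bits, 7 Hamming parity bits at power-of-two positions, and 1 overall parity bit at position 0. The bit layout and the function names are assumptions for illustration; real ECC DIMM controllers may arrange the check bits differently.

```python
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]
DATA_POS = [p for p in range(1, 72) if p not in PARITY_POS]  # 64 positions


def encode(data):
    """Encode a 64-bit integer as a list of 72 bits."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 72) if i & p) % 2
    code[0] = sum(code) % 2  # overall parity makes the whole word even
    return code


def decode(code):
    """Return (data, status); data is None when the word is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 72):
        if code[pos]:
            syndrome ^= pos
    odd = sum(code) % 2
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < 72:
        code[syndrome] ^= 1  # single error: the syndrome names the bad bit
        status = "corrected"
    else:
        return None, "double error"
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


word = 0x0123456789ABCDEF
clean = encode(word)

one_flip = list(clean)
one_flip[37] ^= 1

two_flips = list(clean)
two_flips[5] ^= 1
two_flips[40] ^= 1

print(decode(clean)[1], decode(one_flip)[1], decode(two_flips)[1])
```

A single flipped bit gives odd overall parity and a syndrome equal to its position, so it can be repaired. Two flipped bits leave overall parity even with a nonzero syndrome, so they are detected but not corrected.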

slide-28
SLIDE 28

Emerging Technologies

slide-30
SLIDE 30

DRAM Cell Structure

¨ One-transistor, one-capacitor

¤ Realizing the capacitor is challenging

  • 1T-1C DRAM
  • Charge based sensing
  • Volatile
slide-32
SLIDE 32

Memory Scaling in Jeopardy

Scaling of semiconductor memories greatly challenged beyond 20nm

Example: DRAM

A/R < 10

slide-33
SLIDE 33

Why Is DRAM Slow?

¨ Logic VLSI Process: optimized for better transistor performance
¨ DRAM VLSI Process: optimized for low cost and low leakage

[Figure: Logic and DRAM on a PCB]

How to reduce distance?

slide-34
SLIDE 34

3D Die-Stacking

¨ Different devices are stacked on top of each other
¨ Layers are connected by through-silicon vias (TSVs)
¨ Why?
¤ Communication between devices bottlenecked by limited I/O pins
¤ Integrating heterogeneous elements on a single wafer is expensive and suboptimal

[Figure: Logic and DRAM on a PCB; Logic and DRAM layers stacked]

slide-35
SLIDE 35

3D Stacked Memory

¨ Hybrid Memory Cube (HMC)

¤ A logic layer at the bottom

¨ High Bandwidth Memory (HBM)

¤ Silicon interposer at the bottom

[Figure labels: Package Substrate, Silicon Interposer, DRAM Dice, Processor Die, Interface, Controller, Bank, In-Package Cache Controller]

slide-36
SLIDE 36

Emerging Non-Volatile Memory

slide-37
SLIDE 37

Resistive Memory Technologies

¨ Key concept: replace DRAM cell capacitor with a programmable resistor

¤ 1T-1C DRAM
  • Charge based sensing
  • Volatile
¤ 1T-1R STT-MRAM, PCM, RRAM
  • Resistance based sensing
  • Non-volatile
slide-38
SLIDE 38

Leading Contenders

STT-MRAM
  - Limited to single-level cell
  - 3D un-stackable
  + High endurance (~10^15)
  + ~4ns switching time
  + ~50uW switching power

PCM-RAM
  + Multi-level cell capable
  + 4F^2 3D-stackable cell
  - Endurance: ~10^9 writes
  - ~100ns switching time
  - ~300uW switching power

R-RAM
  + Multi-level cell capable
  + 4F^2 3D-stackable cell
  - Endurance: 10^6~10^12 writes
  + ~5ns switching time
  + ~50uW switching power

[ITRS’13] [Halupka, et al. ISSCC’10] [Pronin. EETime’13] [Henderson. InfoTracks’11]

slide-39
SLIDE 39

Positioning of Resistive Memories

[Figure: RRAM, PCM, STT, SRAM, DRAM, FLASH, and HDD positioned relative to one another; axis labels: Lower Cost, Capacity, Higher Speed, Higher Endurance]