DRAM REFRESH MANAGEMENT, Mahdi Nazm Bojnordi, Assistant Professor



SLIDE 1

DRAM REFRESH MANAGEMENT

CS/ECE 7810: Advanced Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

- Upcoming deadline
  - Tonight: the homework assignment will be posted
- This lecture
  - DRAM address mapping
  - DRAM refresh basics
  - Smart refresh
  - Elastic refresh
  - Avoiding or pausing refreshes

SLIDE 3

DRAM Address Mapping

- Where to store cache lines in main memory?
- Typical mapping: the address is split into [Row | Bank | Column] fields, and each block's Bank field selects the DRAM bank it maps to.
- Application A: good distribution of memory requests among the DRAM banks.

SLIDE 4

DRAM Address Mapping

- Where to store cache lines in main memory?
- Typical mapping: [Row | Bank | Column]
- Application B: unbalanced distribution of memory requests among the DRAM banks; many blocks fall into the same bank.

SLIDE 5

DRAM Address Mapping

- How should the bank ID be computed?
- Custom mapping: derive the bank index from different address bits than the typical [Row | Bank | Column] split.
- Application B: with the custom mapping, good distribution of memory requests among the DRAM banks.

SLIDE 6

Cache Line Interleaving

- Consecutive cache lines are spread across banks: Bank 0 holds cache lines 0, 4, …; Bank 1 holds 1, 5, …; Bank 2 holds 2, 6, …; Bank 3 holds 3, 7, …
- Address format: page index (r bits) | page offset, with the bank index (k bits) taken from the low-order page-offset bits just above the block offset (b bits).
- Drawback: spatial locality within a DRAM row is not well preserved!

SLIDE 7

Page Interleaving

- Consecutive pages are spread across banks: Bank 0 holds pages 0, 4, …; Bank 1 holds 1, 5, …; Bank 2 holds 2, 6, …; Bank 3 holds 3, 7, …
- Address format: page index (r bits) | bank (k bits) | page offset (p bits).
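The two interleaving schemes above amount to extracting the bank index from different address bit positions. A minimal Python sketch, using illustrative parameters (64 B cache lines, 4 KB DRAM pages, 4 banks) that are assumptions rather than any specific device's configuration:

```python
# Illustrative assumptions: 64 B cache lines, 4 KB DRAM pages, 4 banks.
LINE_BITS = 6    # b: log2(64 B cache line)
PAGE_BITS = 12   # p: log2(4 KB DRAM page)
BANK_BITS = 2    # k: log2(4 banks)

def bank_cacheline_interleaving(addr):
    """Bank bits sit just above the block offset, so consecutive
    cache lines go to consecutive banks."""
    return (addr >> LINE_BITS) & ((1 << BANK_BITS) - 1)

def bank_page_interleaving(addr):
    """Bank bits sit just above the page offset, so consecutive
    pages go to consecutive banks."""
    return (addr >> PAGE_BITS) & ((1 << BANK_BITS) - 1)

# Eight consecutive cache lines starting at address 0:
lines = [i * 64 for i in range(8)]
print([bank_cacheline_interleaving(a) for a in lines])  # [0, 1, 2, 3, 0, 1, 2, 3]
print([bank_page_interleaving(a) for a in lines])       # [0, 0, 0, 0, 0, 0, 0, 0]
```

The output makes the trade-off concrete: cache-line interleaving spreads a sequential stream over all banks, while page interleaving keeps it in one bank (and thus one open row).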

SLIDE 8

Cache Line Mapping

- The bank index is a subset of the cache set index.
- Cache-related representation: cache tag (t bits) | cache set index (s bits) | block offset (b bits).
- Cache line interleaving: page index (r bits) | upper page-offset bits | bank (k bits) | block offset (b bits).
- Page interleaving: page index (r bits) | bank (k bits) | page offset (p bits).

SLIDE 9

Row Buffer Conflict

- Problem: interleaving a load stream and a writeback stream that follow the same access pattern across the banks may result in row buffer misses.
- Example: loads to x, x+b, x+2b, x+3b interleaved with writebacks to y, y+b, y+2b, … keep targeting the same row buffer, so each stream repeatedly evicts the other's open row.

SLIDE 10

Key Issues

- To exploit spatial locality, use the maximal interleaving granularity (i.e., the row-buffer size).
- To reduce row buffer conflicts, use only bits within the cache set index as the "bank bits".
- Address views: page index (r) | bank (k) | page offset (p), versus cache tag (t) | cache set index (s) | block offset (b).

SLIDE 11

Permutation-based Interleaving [Zhang'00]

- The new bank index is formed by XORing the k original bank-index bits with k bits taken from the L2 cache tag portion of the address.
- Address: page index | page offset; new bank = old bank XOR (k L2-tag bits).

SLIDE 12

Permutation-based Interleaving [Zhang'00]

- New bank index: addresses that conflict in the L2 cache carry identical bank bits, so conventional interleaving sends them all to the same memory bank. XORing the bank bits with their (differing) tag bits spreads the conflicting addresses across different banks.
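The XOR permutation can be sketched in a few lines of Python. The bit positions below (4 KB pages, 2 bank bits, tag bits starting at bit 20) are illustrative assumptions, not the configuration used in [Zhang'00]:

```python
# Permutation-based interleaving sketch: new bank = bank XOR (k tag bits).
# Bit positions are illustrative assumptions.
BANK_BITS = 2
BANK_SHIFT = 12   # page-interleaved bank bits sit above the 4 KB page offset
TAG_SHIFT = 20    # assumed position of the low L2-tag bits used in the XOR

def permuted_bank(addr):
    mask = (1 << BANK_BITS) - 1
    bank = (addr >> BANK_SHIFT) & mask
    tag = (addr >> TAG_SHIFT) & mask
    return bank ^ tag

# Two L2-conflicting addresses: same set/bank bits, different tag bits.
a = 0x0010_3000
b = 0x0020_3000
conv = lambda x: (x >> BANK_SHIFT) & 0b11   # conventional bank index
print(conv(a), conv(b))                     # same bank conventionally
print(permuted_bank(a), permuted_bank(b))   # different banks after XOR
```

Because the XOR is its own inverse and the tag bits are fixed for a given row, the permutation is a bijection within each row region: it redistributes conflicting addresses without ever mapping two lines of one bank onto each other.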

SLIDE 13

Permutation-based Interleaving [Zhang'00]

[Figure: relative IPC (60% to 180%) of cache-line, page, swap, and permutation-based interleaving across benchmarks including tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, wave5, and TPC-C.]

SLIDE 14

DRAM Refresh

- DRAM cells lose charge over time.
- Periodic refresh operations are required to avoid data loss.
- Two main strategies for refreshing DRAM cells:
  - Burst refresh: refresh all of the cells each time.
    - Simple control mechanism (e.g., LPDDRx).
  - Distributed refresh: refresh one group of cells at a time.
    - Avoids blocking memory for a long time.

SLIDE 15

Refresh Basics

- tRET: the retention time of the leakiest DRAM cells (64 ms).
  - All cells must be refreshed within tRET to avoid data loss.
- tREFI: the refresh interval, i.e., the gap between two refresh commands issued by the memory controller.
  - The MC sends 8192 auto-refresh commands per retention period, refreshing one bin at a time.
  - tREFI = tRET / 8192 = 64 ms / 8192 ≈ 7.8 µs.
- tRFC: the time to finish refreshing one bin (refresh completion time).
- What is the bin size?
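The arithmetic above, plus the resulting bandwidth overhead, can be checked directly. The 300 ns tRFC below is one example value (a 4 Gb chip, per the [Stuecheli'10] data later in the deck):

```python
# Refresh timing arithmetic from the slide: tREFI = tRET / 8192.
tRET_ms = 64.0
tREFI_us = tRET_ms * 1000 / 8192
print(round(tREFI_us, 2))   # 7.81 (us between refresh commands)

# Fraction of time the device spends refreshing, i.e. tRFC / tREFI.
# Example tRFC for a 4 Gb chip; at high temperature (above 85 C, DDR3/DDR4)
# tREFI is halved, roughly doubling this overhead.
tRFC_ns = 300
overhead = tRFC_ns / (tREFI_us * 1000)
print(f"{overhead:.1%}")    # refresh bandwidth overhead at nominal tREFI
```

This tRFC/tREFI ratio is the "bandwidth overhead" column of the refresh-impact table on a later slide.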

SLIDE 16

Refresh Basics

- tRFC increases with chip capacity.

[Figure: tRFC (ns, 100 to 700) versus chip size (1 to 32 Gb), showing the impact of chip density on refresh completion time.] [Stuecheli'10]

SLIDE 17

Controlling Refresh Operations

- CAS-before-RAS (CBR) refresh:
  - The DRAM keeps track of the refresh addresses using an internal counter.
- RAS-only refresh (ROR):
  - The row address is specified by the controller; similar to a pair of activate and precharge commands.
- Auto-refresh vs. self-refresh:
  - Auto-refresh: every 7.8 µs a REF command is sent to the DRAM (roughly tRAS + tRP per row).
  - Self-refresh: LPDDR turns off the I/O circuitry to save power while refreshing multiple rows.

SLIDE 18

Refresh Granularity

- All-bank vs. per-bank refresh: an all-bank refresh blocks the entire rank, whereas per-bank refresh (available in LPDDR devices) refreshes one bank while the others continue serving requests.

SLIDE 19

Optimizing DRAM Refresh

- Observation: a row may be accessed shortly before it is due to be refreshed; since an access restores the row's charge, the scheduled refresh becomes redundant.

[Timeline: refresh slots for rows 0 to 3 over time, interleaved with memory accesses and refresh operations.]

SLIDE 20

Smart Refresh

- Idea: avoid refreshing recently accessed rows. [Ghosh'07]
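One way to realize this idea (a minimal sketch of the Smart Refresh concept, with counter width and timing granularity as illustrative assumptions, not the exact [Ghosh'07] design) is a small countdown counter per row: an access recharges the row, so its counter resets and its next scheduled refresh is skipped.

```python
# Smart Refresh sketch: per-row countdown counters.
# PERIODS sub-windows make up one retention period (width is an assumption).
PERIODS = 4

class SmartRefresh:
    def __init__(self, num_rows):
        # Time (in sub-windows) each row can still wait before refresh.
        self.counter = [PERIODS - 1] * num_rows

    def access(self, row):
        self.counter[row] = PERIODS - 1   # an access restores the charge

    def tick(self):
        """Advance one sub-window; refresh only the rows that need it."""
        refreshed = []
        for row, c in enumerate(self.counter):
            if c == 0:
                refreshed.append(row)           # retention about to expire
                self.counter[row] = PERIODS - 1
            else:
                self.counter[row] = c - 1       # still charged; skip refresh
        return refreshed

sr = SmartRefresh(num_rows=4)
for _ in range(3):
    sr.access(0)    # row 0 is touched in every window
    sr.tick()
print(sr.tick())    # [1, 2, 3]: rows 1-3 need refresh, row 0 is skipped
```

Frequently accessed rows thus generate no refresh traffic at all; the hardware cost is one small counter per row.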

SLIDE 21

Diverse Impacts of Refresh

- A DRAM read normally completes in about 26 ns, but a read that arrives behind an in-progress refresh can take up to about 326 ns (worst-case refresh hit).
- Impact of chip capacity on refresh overhead (95°C, per rank) [Stuecheli'10]:

  Capacity   tRFC     Bandwidth overhead   Read latency overhead
  512 Mb     90 ns    2.7%                 1.4 ns
  1 Gb       110 ns   3.3%                 2.1 ns
  2 Gb       160 ns   5.0%                 4.9 ns
  4 Gb       300 ns   7.7%                 11.5 ns
  8 Gb       350 ns   9.0%                 15.7 ns

(Slide from the Laboratory for Computer Architecture, 12/7/2010.)

SLIDE 22

Elastic Refresh [Stuecheli'10]

- Send refreshes during periods of memory inactivity.
- Request arrivals are non-uniformly distributed, so the refresh overhead only has to fit into the free cycles.
- The policy is initially not aggressive; as the refresh backlog grows, it converges toward issuing a refresh as soon as the queue drains (delay until empty, DUE).
- Latency-sensitive workloads often have lower bandwidth demand.
- Goal: decrease the probability of reads conflicting with refreshes.

SLIDE 23

Elastic Refresh [Stuecheli'10]

- Introduce a refresh-backlog-dependent idle threshold.
- With a low backlog, there is no reason to send a refresh command immediately.
- With a bursty request stream, the probability of a future request arriving decreases as idle time grows.
- As the backlog grows, decrease the idle-delay threshold.

[Figure: idle-delay threshold versus refresh backlog (1 to 8): a proportional region, then a constant region, then high priority at the maximum backlog.]

- Key: reduce conflicts between REF commands and READs.
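The backlog-dependent threshold can be sketched as a simple function. This sketch models only the proportional region plus the forced-refresh case; the constant region of the real [Stuecheli'10] policy is omitted, and the cycle counts are illustrative assumptions (DDR3 does allow postponing up to 8 refresh commands):

```python
# Elastic Refresh sketch: idle-delay threshold shrinks with backlog.
MAX_BACKLOG = 8        # DDR3 permits postponing up to 8 REF commands
T_MAX_CYCLES = 400     # threshold at minimal backlog (assumed value)

def idle_delay_threshold(backlog):
    """How long a rank must stay idle before a pending refresh is issued."""
    if backlog >= MAX_BACKLOG:
        return 0       # high priority: refresh immediately, reads must wait
    # Proportional region: threshold decreases linearly with backlog.
    return T_MAX_CYCLES * (MAX_BACKLOG - backlog) // MAX_BACKLOG

def should_refresh(idle_cycles, backlog):
    return idle_cycles >= idle_delay_threshold(backlog)

print(idle_delay_threshold(1))   # long wait: low backlog, expect more reads
print(idle_delay_threshold(7))   # short wait: backlog nearly full
print(should_refresh(0, 8))      # True: forced refresh at maximum backlog
```

The shape matches the slide's figure: generous waiting when the backlog is small (a read will probably arrive soon), and forced refreshes before the backlog can overflow the postponement limit.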

SLIDE 24

DRAM Refresh vs. Error Rate

[Figure: error rate and power as functions of the refresh cycle (seconds): today's 64 ms cycle versus a desired cycle of X seconds. Lengthening the refresh cycle lowers power (the opportunity) but raises the error rate (the cost).]

- If software is able to tolerate errors, we can lower DRAM refresh rates to achieve considerable power savings.

SLIDE 25

Flikker [Song'14]

- Divide each memory bank into a high-refresh part and a low-refresh part.
- The size of the high-refresh portion can be configured at runtime (e.g., 1, ¾, ½, ¼, or ⅛ of the bank).
- Requires only a small modification of the Partial Array Self-Refresh (PASR) mode.

[Figure: a Flikker DRAM bank split into high-refresh and low-refresh regions.]
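A back-of-the-envelope model (my assumption, not a figure from the slide) shows how the runtime-configurable split translates into refresh power: if refresh power scales with refresh rate, and the low-refresh region is refreshed, say, 32x less often, the saving depends on the high-refresh fraction chosen.

```python
# Illustrative Flikker-style estimate. Assumes refresh power is proportional
# to refresh rate; the 32x slowdown for the low-refresh region is an
# assumed example, not a value from the slide.
def refresh_power_fraction(high_frac, slow_factor=32):
    """Relative refresh power vs. refreshing the whole bank at full rate."""
    return high_frac + (1 - high_frac) / slow_factor

# The runtime-configurable split points from the slide:
for f in (1, 3/4, 1/2, 1/4, 1/8):
    print(f"high-refresh fraction {f}: {refresh_power_fraction(f):.2f}x")
```

Shrinking the high-refresh portion from the full bank to ⅛ cuts refresh power by almost 8x under this model, which is why exposing the split to software (which knows which data is error-tolerant) is attractive.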

SLIDE 26

Refresh Pausing

- Baseline system: if request B arrives while a refresh is in progress, B must wait until the refresh completes.
- Refresh pausing: the refresh is interrupted to serve request B, then the refresh continues.
- Pausing a refresh reduces the wait time for reads; however, pausing at an arbitrary point can cause data loss.
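Since pausing mid-row risks data loss, one plausible reading is that a multi-row refresh may only be paused between rows. A minimal sketch of that control flow (row counts and the step granularity are illustrative assumptions):

```python
# Refresh Pausing sketch: a refresh covering several rows can be paused
# at row boundaries (a safe point) to let a waiting read go first.
class PausableRefresh:
    def __init__(self, rows_per_refresh=8):
        self.remaining = rows_per_refresh

    def step(self, read_pending):
        """Refresh one row per step unless a pending read preempts us."""
        if read_pending:
            return "paused"        # between rows: safe to yield to the read
        if self.remaining == 0:
            return "done"
        self.remaining -= 1
        return "refreshing"

r = PausableRefresh(rows_per_refresh=2)
print(r.step(read_pending=False))  # refreshing (row 1)
print(r.step(read_pending=True))   # paused: request B is served first
print(r.step(read_pending=False))  # refreshing (row 2)
print(r.step(read_pending=False))  # done
```

The read's wait time drops from "remainder of the whole refresh" to at most one row's refresh time, at the cost of tracking how far the paused refresh has progressed.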

SLIDE 27

Performance Results

[Figure: speedup (1.02 to 1.12) of Refresh Pausing and No Refresh over the baseline for the COMMERCIAL, SPEC, PARSEC, and BIOBENCH workload suites, plus their geometric mean.]