DRAM REFRESH MANAGEMENT
Mahdi Nazm Bojnordi
Assistant Professor
School of Computing, University of Utah
CS/ECE 7810: Advanced Computer Architecture
Overview
¨ Upcoming deadline
¤ Tonight: homework assignment will be posted
¨ This lecture
¤ DRAM address mapping
¤ DRAM refresh basics
¤ Smart refresh
¤ Elastic refresh
¤ Avoiding or pausing refreshes
DRAM Address Mapping
¨ Where to store cache lines in main memory?
Figure: typical mapping of the block address (fields: Row, Bank, Column) to DRAM banks.
Application A: good distribution of memory requests among DRAM banks.
DRAM Address Mapping
¨ Where to store cache lines in main memory?
Figure: typical mapping of the block address (fields: Row, Bank, Column) to DRAM banks.
Application B: unbalanced distribution of memory requests among DRAM banks.
DRAM Address Mapping
¨ How to compute bank ID?
Figure: custom mapping of the block address (fields labeled Row, Row, Column on the slide) to a bank ID.
Application B: good distribution of memory requests among DRAM banks.
Cache Line Interleaving
Bank 0: cacheline 0, cacheline 4, …
Bank 1: cacheline 1, cacheline 5, …
Bank 2: cacheline 2, cacheline 6, …
Bank 3: cacheline 3, cacheline 7, …
Address format: page index (r bits) | page offset (p-b bits) | bank (k bits) | page offset (b bits)
Spatial locality is not well preserved!
Page Interleaving
Bank 0: Page 0, Page 4, …
Bank 1: Page 1, Page 5, …
Bank 2: Page 2, Page 6, …
Bank 3: Page 3, Page 7, …
Address format: page index (r bits) | bank (k bits) | page offset (p bits)
Cache Line Mapping
¨ Bank index is a subset of set index
Cache-related representation: cache tag (t bits) | cache set index (s bits) | block offset (b bits)
Cache line interleaving: page index (r bits) | page offset (p-b bits) | bank (k bits) | page offset (b bits)
Page interleaving: page index (r bits) | bank (k bits) | page offset (p bits)
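The two address formats can be compared with a minimal sketch. It assumes hypothetical parameters not given on the slides (64-byte cache lines, 4 KB DRAM pages, 4 banks), and the function names are illustrative only.

```python
def cacheline_interleaved_bank(addr: int, b: int, k: int) -> int:
    """Bank index = the k bits directly above the b-bit block offset."""
    return (addr >> b) & ((1 << k) - 1)


def page_interleaved_bank(addr: int, p: int, k: int) -> int:
    """Bank index = the k bits directly above the p-bit page offset."""
    return (addr >> p) & ((1 << k) - 1)


# Assumed (hypothetical) sizes: b = 6 (64 B lines), p = 12 (4 KB pages), k = 2 (4 banks).
B_BITS, P_BITS, K_BITS = 6, 12, 2

# Consecutive cache lines rotate over the banks under cache line interleaving.
line_banks = [cacheline_interleaved_bank(i << B_BITS, B_BITS, K_BITS) for i in range(8)]

# Consecutive pages rotate over the banks under page interleaving.
page_banks = [page_interleaved_bank(i << P_BITS, P_BITS, K_BITS) for i in range(8)]

# Under page interleaving, neighbouring cache lines of one page share a bank.
same_page_banks = [page_interleaved_bank(i << B_BITS, P_BITS, K_BITS) for i in range(8)]

print(line_banks, page_banks, same_page_banks)
```

With these assumed sizes, neighbouring cache lines land in different banks under cache line interleaving, while page interleaving keeps a whole page in one bank and so preserves row-buffer locality.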
Row Buffer Conflict
¨ Problem: interleaving load and writeback streams with the same access pattern to the banks may result in row buffer misses
Figure: a load stream (x, x+b, x+2b, x+3b, …) interleaved with a writeback stream (y, y+b, y+2b, …); both streams compete for the same row buffer.
Key Issues
¨ To exploit spatial locality, use maximal interleaving granularity (or row-buffer size)
¨ To reduce row buffer conflicts, use only those bits in cache set index for “bank bits”
Page interleaving: page index (r bits) | bank (k bits) | page offset (p bits)
Cache-related representation: cache tag (t bits) | cache set index (s bits) | block offset (b bits)
Permutation-based Interleaving
Figure: the k-bit bank index is XORed with k bits of the L2 cache tag to produce the k-bit new bank index; the page index and page offset fields are unchanged.
[Zhang‘00]
Permutation-based Interleaving
¨ New bank index
Figure: L2 conflicting addresses have the same bank index (1010) under conventional interleaving. XORing it with their differing tag bits (1000, 1001, 1010, 1011) gives different bank indexes under permutation-based interleaving. Memory banks shown: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1010, 1011.
[Zhang‘00]
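The XOR step can be checked with a short sketch that uses the bit patterns from the slide (tag bits 1000 to 1011, conventional bank index 1010). The address layout in `new_bank_from_address` (a 12-bit page offset, with tag bits starting at bit 20) is a hypothetical assumption, not something stated in the slides.

```python
def permuted_bank(bank: int, tag_bits: int, k: int) -> int:
    """New bank index = conventional bank index XOR k bits of the L2 tag."""
    mask = (1 << k) - 1
    return (bank ^ tag_bits) & mask


def new_bank_from_address(addr: int, p: int, k: int, tag_shift: int) -> int:
    """Extract both fields from an address; p and tag_shift are assumed layouts."""
    mask = (1 << k) - 1
    bank = (addr >> p) & mask
    tag_bits = (addr >> tag_shift) & mask
    return bank ^ tag_bits


K = 4
conflicting_tags = [0b1000, 0b1001, 0b1010, 0b1011]
conventional_bank = 0b1010  # identical for all four L2 conflicting addresses

new_banks = [format(permuted_bank(conventional_bank, t, K), "04b")
             for t in conflicting_tags]
print(new_banks)  # four distinct bank indexes

# Same computation starting from full addresses (hypothetical layout).
addrs = [(t << 20) | (conventional_bank << 12) for t in conflicting_tags]
from_addrs = [new_bank_from_address(a, p=12, k=K, tag_shift=20) for a in addrs]
```

Because XOR with a fixed value is a bijection, addresses that differ in the chosen tag bits always receive different new bank indexes.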
Permutation-based Interleaving
[Zhang‘00]
Figure: IPC (axis from 60% to 180%) of tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, wave5, and TPC-C under cacheline, page, swap, and permutation interleaving.
DRAM Refresh
¨ DRAM cells lose charge over time
¨ Periodic refresh operations are required to avoid data loss
¨ Two main strategies for refreshing DRAM cells
¤ Burst refresh: refresh all of the cells each time
n Simple control mechanism (e.g., LPDDRx)
¤ Distributed refresh: refresh one group of cells at a time
n Avoid blocking memory for a long time
Figure: refresh timelines for burst and distributed refresh.
Refresh Basics
¨ tRET: the retention time of leaky DRAM cells (64ms)
¤ All cells must be refreshed within tRET to avoid data loss
¨ tREFI: refresh interval, the gap between two refresh commands issued by the memory controller
¤ MC sends 8192 auto-refresh commands to refresh one bin at a time
n tREFI = tRET/8192 = 7.8us
¨ tRFC: the time to finish refreshing a bin (refresh completion)
¨ What is the bin size?
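The arithmetic behind tREFI can be reproduced in a few lines. The busy-fraction estimate tRFC/tREFI is a simple approximation added here for illustration, and the 350ns tRFC is only an example input.

```python
T_RET_MS = 64.0          # retention time from the slide
NUM_REF_COMMANDS = 8192  # auto-refresh commands per retention period

# tREFI = tRET / 8192, converted from milliseconds to microseconds.
t_refi_us = T_RET_MS * 1000.0 / NUM_REF_COMMANDS


def refresh_busy_fraction(t_rfc_ns: float, t_refi_us: float) -> float:
    """Rough fraction of time spent refreshing: one tRFC per tREFI (an approximation)."""
    return t_rfc_ns / (t_refi_us * 1000.0)


example_fraction = refresh_busy_fraction(350.0, t_refi_us)  # example tRFC of 350 ns
print(f"tREFI = {t_refi_us:.4f} us, busy fraction = {example_fraction:.2%}")
```

The estimate ignores temperature-dependent refresh rates and any scheduling effects, so it should be read as a lower-level sanity check rather than a measured overhead.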
Refresh Basics
¨ tRFC increases with chip capacity
Figure: impact of chip density on refresh completion time; tRFC (ns, axis from 100 to 700) vs. chip size (1, 2, 4, 8, 16, 32 Gb).
[Stuecheli’10]
Controlling Refresh Operations
¨ CAS before RAS (CBR)
¤ DRAM memory keeps track of the addresses using an internal counter
¨ RAS only refresh (ROR)
¤ Row address is specified by the controller; similar to a pair of activate and precharge
¨ Auto-refresh vs. self refresh
¤ Every 7.8us a REF command is sent to DRAM (tRAS+tRP)
¤ LPDDR turns off IO for saving power while refreshing multiple rows
Refresh Granularity
¨ All bank vs. per bank refresh
Optimizing DRAM Refresh
¨ Observation: a row may be accessed just as it is due to be refreshed, and the access itself restores the row's charge
Figure: timeline of the refresh times for rows 0 to 3, with each memory access followed shortly by a memory refresh.
Smart Refresh
¨ Idea: avoid refreshing recently accessed rows
[Ghosh‘07]
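A minimal sketch of the idea, assuming one countdown counter per row that is reset on every access. The class name, the tick granularity, and the tiny sizes are hypothetical; the hardware design in [Ghosh‘07] differs in detail.

```python
class SmartRefreshSketch:
    """Per-row countdown: a row is refreshed only when its counter expires."""

    def __init__(self, num_rows: int, ticks_per_retention: int):
        self.max_count = ticks_per_retention
        self.counters = [ticks_per_retention] * num_rows
        self.refreshes_issued = 0

    def access(self, row: int) -> None:
        # A read or write activates the row, which restores its charge,
        # so the countdown for that row starts over.
        self.counters[row] = self.max_count

    def tick(self) -> list:
        """Advance time by one tick; refresh and return the rows that expired."""
        refreshed = []
        for row in range(len(self.counters)):
            self.counters[row] -= 1
            if self.counters[row] == 0:
                refreshed.append(row)
                self.refreshes_issued += 1
                self.counters[row] = self.max_count
        return refreshed


# Toy run: 4 rows, retention of 4 ticks, row 0 is accessed every 2 ticks.
sim = SmartRefreshSketch(num_rows=4, ticks_per_retention=4)
for t in range(8):
    if t % 2 == 0:
        sim.access(0)
    sim.tick()

baseline_refreshes = 4 * (8 // 4)  # every row refreshed once per retention period
print(sim.refreshes_issued, "vs baseline", baseline_refreshes)
```

In this toy run the frequently accessed row never needs an explicit refresh, so fewer refreshes are issued than in the periodic baseline.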
Diverse Impacts of Refresh
Figure: a DRAM read takes 26ns; a worst-case refresh hit takes 326ns.

DRAM capacity   tRFC    Bandwidth overhead (95°C, per rank)   Latency overhead (95°C)
512Mb           90ns    2.7%                                  1.4ns
1Gb             110ns   3.3%                                  2.1ns
2Gb             160ns   5.0%                                  4.9ns
4Gb             300ns   7.7%                                  11.5ns
8Gb             350ns   9.0%                                  15.7ns

Figure: timeline of refreshes and reads, marking tRFC and tREFI.
[Stuecheli’10]
Elastic Refresh
¨ Send refreshes during periods of inactivity
¨ Non-uniform request distribution
¨ Refresh overhead just has to fit in free cycles
¨ Initially not aggressive, converges with delay until empty (DUE) as refresh backlog grows
¨ Latency sensitive workloads are often lower bandwidth
¨ Decrease the probability of reads conflicting with refreshes
[Stuecheli’10]
Elastic Refresh
¨ Introduce refresh backlog dependent idle threshold
¨ With a low backlog, there is no reason to send a refresh command
¨ With a bursty request stream, the probability of a future request decreases with time
¨ As backlog grows, decrease this delay threshold
Figure: idle delay threshold vs. refresh backlog (1 to 8), with regions labeled Constant, Proportional, and High Priority.
Key: to reduce REF and READ conflicts
[Stuecheli’10]
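The backlog-dependent threshold can be sketched as a piecewise function. The region boundaries and the maximum delay of 100 idle cycles are hypothetical values chosen for illustration; the slide only names the three regions.

```python
def idle_delay_threshold(backlog: int,
                         max_delay: int = 100,
                         constant_until: int = 2,
                         high_priority_at: int = 7) -> int:
    """Idle cycles to wait before sending a refresh (all parameters assumed)."""
    if backlog >= high_priority_at:
        return 0                      # high priority: send without waiting
    if backlog <= constant_until:
        return max_delay              # constant region: wait the full delay
    # Proportional region: threshold shrinks linearly as the backlog grows.
    span = high_priority_at - constant_until
    return max_delay * (high_priority_at - backlog) // span


def should_send_refresh(backlog: int, idle_cycles: int) -> bool:
    """Send a postponed refresh once the rank has been idle long enough."""
    return backlog > 0 and idle_cycles >= idle_delay_threshold(backlog)


thresholds = [idle_delay_threshold(b) for b in range(1, 9)]
print(thresholds)
```

The threshold never increases with the backlog, so a controller following this sketch becomes steadily more willing to interrupt idle periods as postponed refreshes accumulate.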
DRAM Refresh vs. ERROR Rate
Figure: error rate and power vs. refresh cycle [s]. Today's operating point is 64 ms ("where we are today"); the target is X sec ("where we want to be"). The curves are labeled "the opportunity" and "the cost".
If software is able to tolerate errors, we can lower DRAM refresh rates to achieve considerable power savings
Flikker
¨ Divide each memory bank into a high-refresh part and a low-refresh part
¨ Size of high-refresh portion can be configured at runtime
¨ Small modification of the Partial Array Self-Refresh (PASR) mode
Figure: Flikker DRAM bank divided into High Refresh and Low Refresh portions (fractions marked: 1, ¾, ½, ¼, ⅛).
[Song’14]
Refresh Pausing
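Figure: timelines for the baseline system and for refresh pausing. In the baseline system, request B arrives during the refresh that follows A and must wait. With refresh pausing, the refresh is interrupted when request B arrives and continues afterwards (Refresh (Cont.)).
¨ Pausing refresh reduces wait time for reads
¨ Pausing at an arbitrary point can cause data loss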
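The effect on read wait time can be illustrated with a small model. It assumes that a refresh of length tRFC is split into equal segments and may only be paused at segment boundaries; this segment model, the segment count, and the example timings are assumptions made for the sketch.

```python
import math


def read_wait_baseline(t_rfc_ns: float, arrival_ns: float) -> float:
    """Without pausing, a read arriving mid-refresh waits for the rest of tRFC."""
    return t_rfc_ns - arrival_ns


def read_wait_with_pausing(t_rfc_ns: float, arrival_ns: float,
                           num_segments: int) -> float:
    """With pausing, the read waits only until the next safe pause point."""
    segment_ns = t_rfc_ns / num_segments
    next_boundary = math.ceil(arrival_ns / segment_ns) * segment_ns
    return next_boundary - arrival_ns


# Example: 350 ns refresh, read arrives 100 ns into it, 8 assumed segments.
baseline_wait = read_wait_baseline(350.0, 100.0)
paused_wait = read_wait_with_pausing(350.0, 100.0, num_segments=8)
print(baseline_wait, paused_wait)
```

Restricting pauses to segment boundaries reflects the caveat on the slide: the refresh is never cut off at an arbitrary point, only between complete units of work.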
Performance Results
Figure: performance comparison; speedup (axis from 1.02 to 1.12) for COMMERCIAL, SPEC, PARSEC, BIOBENCH, and GMEAN.