DRAM REFRESH MANAGEMENT, Mahdi Nazm Bojnordi, Assistant Professor

DRAM REFRESH MANAGEMENT
Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah
CS/ECE 7810: Advanced Computer Architecture

Overview
- Upcoming deadline
  - Tonight: homework assignment will be posted
- This lecture
  - DRAM address mapping
  - DRAM refresh basics
  - Smart refresh
  - Elastic refresh
  - Avoiding or pausing refreshes

DRAM Address Mapping
- Where to store cache lines in main memory?
- Typical mapping (address fields, high to low): Row | Bank | Column | Block
[Figure: DRAM banks. Application A: good distribution of memory requests among DRAM banks.]

DRAM Address Mapping
- Where to store cache lines in main memory?
- Typical mapping (address fields, high to low): Row | Bank | Column | Block
[Figure: DRAM banks. Application B: unbalanced distribution of memory requests among DRAM banks.]

DRAM Address Mapping
- How to compute bank ID?
- Custom mapping (address fields, high to low): Row | Bank | Row | Column | Block
[Figure: DRAM banks. Application B: good distribution of memory requests among DRAM banks.]

Cache Line Interleaving
- Consecutive cache lines are spread across banks: cacheline 0 in Bank 0, cacheline 1 in Bank 1, cacheline 2 in Bank 2, cacheline 3 in Bank 3, cacheline 4 in Bank 0, and so on
- Address format (high to low bits): page index (r bits) | page offset (p-b bits) | bank (k bits) | page offset (b bits)
- Spatial locality is not well preserved!

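Page Interleaving
- Consecutive pages are spread across banks: Page 0 in Bank 0, Page 1 in Bank 1, Page 2 in Bank 2, Page 3 in Bank 3, Page 4 in Bank 0, and so on
- Address format (high to low bits): page index (r bits) | bank (k bits) | page offset (p bits)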
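The two address formats above can be compared with a small sketch. The geometry used here (4 banks, 64-byte cache lines, 4 KiB pages) is an assumption chosen for illustration and does not come from the slides; the function names are hypothetical.

```python
# Hypothetical geometry (not from the slides): 4 banks, 64-byte cache
# lines, 4 KiB DRAM pages (one page = one row buffer).
BLOCK_BITS = 6   # b: byte offset within a cache line
PAGE_BITS = 12   # p: byte offset within a DRAM page
BANK_BITS = 2    # k: 2**k banks
BANK_MASK = (1 << BANK_BITS) - 1


def bank_cacheline_interleaved(addr: int) -> int:
    """Bank bits sit directly above the block offset."""
    return (addr >> BLOCK_BITS) & BANK_MASK


def bank_page_interleaved(addr: int) -> int:
    """Bank bits sit directly above the page offset."""
    return (addr >> PAGE_BITS) & BANK_MASK


# Eight consecutive cache lines (cacheline 0 .. cacheline 7).
line_addrs = [i << BLOCK_BITS for i in range(8)]
cacheline_banks = [bank_cacheline_interleaved(a) for a in line_addrs]
page_banks = [bank_page_interleaved(a) for a in line_addrs]

print(cacheline_banks)  # consecutive lines rotate across the banks
print(page_banks)       # consecutive lines stay in one bank (one open row)
```

With cache line interleaving, neighbouring lines land in different banks, so a sequential stream cannot reuse an open row; with page interleaving, the whole page stays in one bank and row-buffer locality is preserved.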

Cache Line Mapping
- Bank index is a subset of set index
- Cache line interleaving: page index (r bits) | page offset (p-b bits) | bank (k bits) | page offset (b bits)
- Page interleaving: page index (r bits) | bank (k bits) | page offset (p bits)
- Cache-related representation: cache tag (t bits) | cache set index (s bits) | block offset (b bits)

Row Buffer Conflict
- Problem: interleaving load and writeback streams with the same access pattern to the banks may result in row buffer misses
[Figure: a writeback stream and a load stream (addresses x, x+b, x+2b, x+3b, … and y, y+b, y+2b, …) contend for the same row buffer.]

Key Issues
- To exploit spatial locality, use maximal interleaving granularity (or row-buffer size)
- To reduce row buffer conflicts, use only those bits in cache set index for "bank bits"
- DRAM view of the address: page index (r bits) | bank (k bits) | page offset (p bits)
- Cache view of the address: cache tag (t bits) | cache set index (s bits) | block offset (b bits)

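Permutation-based Interleaving
- L2 cache view of the address: tag | index | bank | page offset
- k bits of the L2 tag are XORed with the k bank bits to form the new bank index
- Resulting DRAM address: page index | new bank | page offset
[Zhang'00]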

Permutation-based Interleaving
- New bank index
[Figure: L2-conflicting addresses mapped to memory banks. Conventional interleaving: same bank indexes. Permutation-based interleaving (XOR): different bank indexes.]
[Zhang'00]
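A minimal sketch of the XOR step, assuming a page-interleaved base mapping and a hypothetical address layout (64-byte blocks, 14 set-index bits, 4 KiB pages, 4 banks). None of these sizes come from the slides; only the idea of XORing k tag bits with the k bank bits does.

```python
# Hypothetical layout (not from the slides).
BLOCK_BITS = 6                      # block offset
SET_BITS = 14                       # L2 set index
PAGE_BITS = 12                      # page offset
BANK_BITS = 2                       # k
TAG_SHIFT = BLOCK_BITS + SET_BITS   # first bit of the L2 tag
BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr: int) -> int:
    """Page interleaving: bank bits sit directly above the page offset."""
    return (addr >> PAGE_BITS) & BANK_MASK


def permuted_bank(addr: int) -> int:
    """XOR the k bank bits with the k lowest bits of the L2 tag."""
    tag_low = (addr >> TAG_SHIFT) & BANK_MASK
    return conventional_bank(addr) ^ tag_low


# Addresses that conflict in the L2: same set index, different tags.
base = 0x3000
conflicting = [base + (tag << TAG_SHIFT) for tag in range(4)]

conventional = [conventional_bank(a) for a in conflicting]
permuted = [permuted_bank(a) for a in conflicting]
print(conventional)  # all in the same bank, each in a different row
print(permuted)      # spread over different banks
```

Because the bank bits fall inside the set index in this layout, L2-conflicting addresses share a bank under the conventional mapping; the XOR with tag bits separates them while leaving the page offset, and therefore row-buffer locality within a page, untouched.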

Permutation-based Interleaving
[Figure: normalized IPC (60% to 180%) per benchmark for cacheline, page, swap, and permutation interleaving.]
[Zhang'00]

DRAM Refresh
- DRAM cells lose charge over time
- Periodic refresh operations are required to avoid data loss
- Two main strategies for refreshing DRAM cells
  - Burst refresh: all of the cells are refreshed each time
    - Simple control mechanism (e.g., LPDDRx)
  - Distributed refresh: one group of cells is refreshed at a time
    - Avoids blocking memory for a long time
[Figure: timelines of burst versus distributed refresh.]

Refresh Basics
- tRET: the retention time of leaky DRAM cells (64ms)
  - All cells must be refreshed within tRET to avoid data loss
- tREFI: refresh interval, which is the gap between two refresh commands issued by the memory controller
  - MC sends 8192 auto-refresh commands to refresh one bin at a time
    - tREFI = tRET/8192 = 7.8us
- tRFC: the time to finish refreshing a bin (refresh completion)
- What is the bin size?
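The tREFI arithmetic above, and one way to reason about the bin size, can be sketched as follows. The device geometry in the example (8 banks of 32768 rows) is a hypothetical assumption, not a figure from the slides.

```python
T_RET_MS = 64.0          # retention time from the slide
NUM_REF_COMMANDS = 8192  # auto-refresh commands per retention period


def refresh_interval_us(t_ret_ms: float = T_RET_MS,
                        num_commands: int = NUM_REF_COMMANDS) -> float:
    """tREFI = tRET / number of refresh commands, in microseconds."""
    return t_ret_ms * 1000.0 / num_commands


def rows_per_bin(total_rows: int,
                 num_commands: int = NUM_REF_COMMANDS) -> int:
    """Rows that each REF command must cover so that every row is
    refreshed once per retention period (ceiling division)."""
    return -(-total_rows // num_commands)


t_refi = refresh_interval_us()
# Hypothetical device: 8 banks x 32768 rows per bank.
bin_size = rows_per_bin(8 * 32768)
print(t_refi, bin_size)
```

Under these assumptions the bin size grows with the number of rows in the device, which is one way to see why tRFC grows with chip capacity while tREFI stays fixed.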

Refresh Basics
- tRFC increases with chip capacity
[Figure: impact of chip density on refresh completion time; tRFC (ns, 0 to 700) versus chip size (1 to 32 Gb).]
[Stuecheli'10]

Controlling Refresh Operations
- CAS before RAS (CBR)
  - The DRAM keeps track of the row addresses using an internal counter
- RAS only refresh (ROR)
  - Row address is specified by the controller; similar to a pair of activate and precharge
- Auto-refresh vs. self refresh
  - Every 7.8us a REF command is sent to DRAM (tRAS+tRP)
  - LPDDR turns off IO for saving power while refreshing multiple rows

Refresh Granularity
- All-bank vs. per-bank refresh

Optimizing DRAM Refresh
- Observation: a row may be accessed shortly before it is due to be refreshed; since an access restores the charge in that row, the scheduled refresh is then redundant
[Figure: timeline of memory accesses and memory refreshes, with the scheduled refresh times for Row 0, Row 1, Row 2, and Row 3.]

Smart Refresh
- Idea: avoid refreshing recently accessed rows
[Ghosh'07]
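One way to picture this idea is a per-row countdown that is reset whenever the row is accessed, so that a refresh is issued only when a row's counter expires. The sketch below is a simplified illustration under that assumption, not the mechanism of [Ghosh'07]; the class name, the tick granularity, and the access pattern are hypothetical.

```python
class SmartRefreshSketch:
    """Per-row countdown counters; an access resets the row's counter."""

    def __init__(self, num_rows: int, ticks_per_period: int) -> None:
        self.ticks_per_period = ticks_per_period
        self.counters = [ticks_per_period] * num_rows
        self.refreshes_issued = 0

    def access(self, row: int) -> None:
        # Activating a row restores its charge, so restart its countdown.
        self.counters[row] = self.ticks_per_period

    def tick(self) -> list:
        """Advance time by one tick; return the rows refreshed now."""
        due = []
        for row in range(len(self.counters)):
            self.counters[row] -= 1
            if self.counters[row] == 0:
                due.append(row)
                self.counters[row] = self.ticks_per_period
        self.refreshes_issued += len(due)
        return due


# 4 rows, 4 ticks per retention period, simulated for two periods.
# Row 0 is accessed every other tick; rows 1..3 are never accessed.
ctrl = SmartRefreshSketch(num_rows=4, ticks_per_period=4)
refreshed_rows = []
for t in range(8):
    if t % 2 == 0:
        ctrl.access(0)
    refreshed_rows.extend(ctrl.tick())

print(ctrl.refreshes_issued, sorted(set(refreshed_rows)))
```

In this toy run the frequently accessed row is never refreshed explicitly, while the idle rows are still refreshed once per period. A real design would also stagger the counters so that refreshes do not cluster on the same tick.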

Diverse Impacts of Refresh
DRAM capacity | tRFC | Bandwidth overhead (95°C, per rank) | Latency overhead (95°C)
512Mb | 90ns | 2.7% | 1.4ns
1Gb | 110ns | 3.3% | 2.1ns
2Gb | 160ns | 5.0% | 4.9ns
4Gb | 300ns | 7.7% | 11.5ns
8Gb | 350ns | 9.0% | 15.7ns
[Figure: DRAM read (26ns) versus worst-case refresh hit (326ns); timeline of refreshes and reads marking tREFI and tRFC.]
[Stuecheli'10]
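A simple model gives a feel for where such overheads come from. It assumes that reads arrive uniformly in time, that a rank is unavailable for tRFC out of every tREFI, and that retention at 95°C is halved to 32ms (so tREFI is about 3.9us). These assumptions are mine, not stated on the slide. The model reproduces the 4Gb and 8Gb figures quoted above (7.7% and 11.5ns; 9.0% and 15.7ns), but the smaller-capacity rows do not follow from their listed tRFC values under it, so treat it as an approximation.

```python
# Assumption: at 95 C the retention period is 32 ms, with 8192 REF commands.
T_REFI_95C_NS = 32e6 / 8192  # about 3906 ns


def bandwidth_overhead_pct(t_rfc_ns: float,
                           t_refi_ns: float = T_REFI_95C_NS) -> float:
    """Fraction of time the rank is busy refreshing, in percent."""
    return 100.0 * t_rfc_ns / t_refi_ns


def avg_latency_overhead_ns(t_rfc_ns: float,
                            t_refi_ns: float = T_REFI_95C_NS) -> float:
    """Expected extra wait per read: probability of hitting a refresh
    (tRFC / tREFI) times the mean remaining refresh time (tRFC / 2)."""
    return t_rfc_ns ** 2 / (2.0 * t_refi_ns)


for capacity, t_rfc in (("4Gb", 300.0), ("8Gb", 350.0)):
    print(capacity,
          round(bandwidth_overhead_pct(t_rfc), 1),
          round(avg_latency_overhead_ns(t_rfc), 1))
```

The quadratic dependence of the latency term on tRFC is the reason refresh becomes disproportionately more painful as chip capacity grows.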

Elastic Refresh
- Send refreshes during periods of inactivity
- Non-uniform request distribution
- Refresh overhead just has to fit in free cycles
- Initially not aggressive, converges with delay until empty (DUE) as refresh backlog grows
- Latency-sensitive workloads are often lower bandwidth
- Decrease the probability of reads conflicting with refreshes
[Stuecheli'10]

Elastic Refresh
- Introduce a refresh-backlog-dependent idle threshold
- With a low backlog, there is no reason to send a refresh command
- With a bursty request stream, the probability of a future request decreases with time
- As backlog grows, decrease this delay threshold
- Key: to reduce REF and READ conflicts
[Figure: idle delay threshold versus refresh backlog (1 to 8), with constant, proportional, and high-priority regions.]
[Stuecheli'10]
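The backlog-dependent threshold can be sketched as a piecewise function with the three regions named in the figure (constant, proportional, high priority). The region boundaries and the maximum delay used below are hypothetical values picked for illustration; the slides do not give them.

```python
from typing import Optional


def idle_delay_threshold(backlog: int,
                         max_delay_cycles: int = 64,
                         constant_until: int = 2,
                         high_priority_from: int = 7) -> Optional[int]:
    """Idle cycles to wait before issuing a postponed refresh.

    Returns None when no refresh is pending. All default parameter
    values are assumptions, not values from the slides.
    """
    if backlog <= 0:
        return None                      # nothing to refresh
    if backlog <= constant_until:
        return max_delay_cycles          # constant region: wait long
    if backlog >= high_priority_from:
        return 0                         # high priority: do not wait
    # Proportional region: the threshold shrinks linearly with backlog.
    span = high_priority_from - constant_until
    return max_delay_cycles * (high_priority_from - backlog) // span


def should_issue_refresh(backlog: int, idle_cycles: int) -> bool:
    """Issue a refresh once the rank has been idle long enough."""
    threshold = idle_delay_threshold(backlog)
    return threshold is not None and idle_cycles >= threshold


thresholds = [idle_delay_threshold(b) for b in range(1, 9)]
print(thresholds)
```

A controller using this sketch would count idle cycles per rank, reset the count on every request, and call `should_issue_refresh` each cycle, so refreshes slip into idle gaps while the backlog is small and are forced once it approaches the limit.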

DRAM Refresh vs. Error Rate
[Figure: power and error rate versus refresh cycle (s). Lengthening the refresh cycle from 64 ms (where we are today) to X s (where we want to be) lowers power (the opportunity) and raises the error rate (the cost).]
- If software is able to tolerate errors, we can lower DRAM refresh rates to achieve considerable power savings

Flikker
- Divide each memory bank into a high-refresh part and a low-refresh part
- Size of the high-refresh portion can be configured at runtime
- Small modification of the Partial Array Self-Refresh (PASR) mode
[Figure: Flikker DRAM bank split into high-refresh and low-refresh parts; the high-refresh fraction can be ⅛, ¼, ½, ¾, or 1.]
[Song'14]

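Refresh Pausing
- Baseline system: request B arrives during a refresh and waits until the refresh completes
- Refresh pausing: when request B arrives, the refresh is interrupted, B is served, and the refresh then continues
- Pausing at an arbitrary point can cause data loss
- Pausing refresh reduces wait time for reads
[Figure: timelines of requests A and B around a refresh, in the baseline system and with refresh pausing.]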

Performance Results
[Figure: performance comparison; speedup (1.02 to 1.12) of Refresh Pausing and No Refresh for COMMERCIAL, SPEC, PARSEC, BIOBENCH, and GMEAN.]
