Improving DRAM Performance by Parallelizing Refreshes with Accesses - PowerPoint PPT Presentation



SLIDE 1

Improving DRAM Performance by Parallelizing Refreshes with Accesses

Kevin Chang

Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu

SLIDE 2

Executive Summary

  • DRAM refresh interferes with memory accesses
    – Degrades system performance and energy efficiency
    – Becomes exacerbated as DRAM density increases
  • Goal: Serve memory accesses in parallel with refreshes to reduce refresh interference on demand requests
  • Our mechanisms:
    – 1. Enable more parallelization between refreshes and accesses across different banks with new per-bank refresh scheduling algorithms
    – 2. Enable serving accesses concurrently with refreshes in the same bank by exploiting DRAM subarrays
  • Improve system performance and energy efficiency for a wide variety of workloads and DRAM densities
    – 20.2% and 9.0% for 8-core systems using 32Gb DRAM
    – Very close to the ideal scheme without refreshes

SLIDE 3

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
  • Results

SLIDE 4

Refresh Penalty

[Diagram: Processor → Memory Controller → DRAM; refresh commands and read data share the channel; a DRAM cell stores charge on a capacitor behind an access transistor]

  • Refresh delays requests by 100s of ns
  • Refresh interferes with memory accesses

SLIDE 5

Existing Refresh Modes

  • All-bank refresh in commodity DRAM (DDRx): one command refreshes all banks (Bank 0 … Bank 7) at once
  • Per-bank refresh in mobile DRAM (LPDDRx): refreshes one bank at a time, in round-robin order
  • Per-bank refresh allows accesses to other banks while a bank is refreshing

SLIDE 6

Shortcomings of Per-Bank Refresh

  • Problem 1: Refreshes to different banks are scheduled in a strict round-robin order
    – The static ordering is hardwired into DRAM chips
    – Refreshes busy banks with many queued requests when other banks are idle
  • Key idea: Schedule per-bank refreshes to idle banks opportunistically in a dynamic order

SLIDE 7

Shortcomings of Per-Bank Refresh

  • Problem 2: Banks that are being refreshed cannot concurrently serve memory requests

[Timeline: a read (RD) to Bank 0 is delayed by an ongoing per-bank refresh to the same bank]

SLIDE 8

Shortcomings of Per-Bank Refresh

  • Problem 2: Refreshing banks cannot concurrently serve memory requests
  • Key idea: Exploit subarrays within a bank to parallelize refreshes and accesses across subarrays

[Timeline: within Bank 0, Subarray 0 serves a read (RD) while Subarray 1 performs a subarray refresh in parallel]

SLIDE 9

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
  • Results

SLIDE 10

DRAM System Organization

[Diagram: DRAM organized into Rank 0 and Rank 1, each containing Bank 0 … Bank 7]

  • Banks can serve multiple requests in parallel
SLIDE 11

DRAM Refresh Frequency

  • DRAM standard requires memory controllers to send periodic refreshes to DRAM
    – tRefPeriod (tREFI): remains constant
    – tRefLatency (tRFC): varies based on DRAM chip density (e.g., 350ns)
    – Read/Write: roughly 50ns

SLIDE 12

Increasing Performance Impact

  • DRAM is unavailable to serve requests for tRefLatency / tRefPeriod of time
    – 6.7% for today's 4Gb DRAM
  • Unavailability increases with higher density due to higher tRefLatency
    – 23% / 41% for future 32Gb / 64Gb DRAM
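The unavailability fraction on this slide is just tRefLatency divided by tRefPeriod. A minimal sketch (not from the slides; the sample values below are illustrative, using the 350ns tRefLatency mentioned earlier and the standard 7.8μs refresh interval from the backup slides):

```python
def refresh_unavailability(t_ref_latency_ns: float, t_ref_period_ns: float) -> float:
    """Fraction of time DRAM is busy refreshing: tRefLatency / tRefPeriod."""
    return t_ref_latency_ns / t_ref_period_ns

# Illustrative: a 350 ns refresh latency within a 7800 ns refresh period.
print(f"{refresh_unavailability(350, 7800):.1%}")
```

As tRefLatency grows with chip density while tRefPeriod stays fixed, this ratio grows toward the 23%/41% figures quoted for future densities.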

SLIDE 13

All-Bank vs. Per-Bank Refresh

  • All-Bank Refresh: employed in commodity DRAM (DDRx, LPDDRx)
    – Staggered across banks to limit power
  • Per-Bank Refresh: in mobile DRAM (LPDDRx)
    – Shorter tRefLatency than that of all-bank refresh
    – More frequent refreshes (shorter tRefPeriod)
    – Can serve memory accesses (reads) in parallel with refreshes across banks

SLIDE 14

Shortcomings of Per-Bank Refresh

  • 1) Per-bank refreshes are strictly scheduled in round-robin order (as fixed by DRAM's internal logic)
  • 2) A refreshing bank cannot serve memory accesses
  • Goal: Enable more parallelization between refreshes and accesses using practical mechanisms

SLIDE 15

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
    – 1. Dynamic Access-Refresh Parallelization (DARP)
    – 2. Subarray Access-Refresh Parallelization (SARP)
  • Results

SLIDE 16

Our First Approach: DARP

  • Dynamic Access-Refresh Parallelization (DARP)
    – An improved scheduling policy for per-bank refreshes
    – Exploits refresh scheduling flexibility in DDR DRAM
  • Component 1: Out-of-order per-bank refresh
    – Avoids poor static scheduling decisions
    – Dynamically issues per-bank refreshes to idle banks
  • Component 2: Write-Refresh Parallelization
    – Avoids refresh interference on latency-critical reads
    – Parallelizes refreshes with a batch of writes

SLIDE 17

1) Out-of-Order Per-Bank Refresh

  • Dynamic scheduling policy that prioritizes refreshes to idle banks
  • Memory controllers decide which bank to refresh

SLIDE 18

1) Out-of-Order Per-Bank Refresh

[Timeline: under the baseline round-robin order, a read to Bank 0 is delayed by that bank's refresh; with DARP, the refresh goes to idle Bank 1 first, saving cycles on both banks]

  • Reduces refresh penalty on demand requests by refreshing idle banks first in a flexible order
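The idle-bank-first idea above can be sketched as a small selection function. This is a hedged simplification, not the paper's full algorithm (which also tracks per-bank refresh credits and deadlines); the function name and queue representation are illustrative:

```python
def pick_bank_to_refresh(request_queues, pending_refresh, round_robin_next):
    """Choose which bank to refresh next, preferring idle banks.

    request_queues: dict bank -> list of queued demand requests
    pending_refresh: set of banks still owing a refresh this round
    round_robin_next: bank the baseline static order would refresh
    Returns a bank id, or None to postpone the refresh.
    """
    # Opportunistically refresh any bank that owes a refresh and is idle.
    for bank in sorted(pending_refresh):
        if not request_queues[bank]:
            return bank
    # Every pending bank is busy: fall back to the static round-robin pick.
    return round_robin_next if round_robin_next in pending_refresh else None

queues = {0: ["RD", "RD"], 1: []}               # Bank 0 busy, Bank 1 idle
print(pick_bank_to_refresh(queues, {0, 1}, 0))  # picks idle Bank 1 first
```

The saved cycles on the timeline come exactly from this reordering: the busy bank keeps serving reads while the idle bank absorbs the refresh.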

SLIDE 19

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
    – 1. Dynamic Access-Refresh Parallelization (DARP)
      • 1) Out-of-Order Per-Bank Refresh
      • 2) Write-Refresh Parallelization
    – 2. Subarray Access-Refresh Parallelization (SARP)
  • Results

SLIDE 20

Refresh Interference on Upcoming Requests

  • Problem: A refresh may collide with an upcoming request in the near future

[Timeline: a refresh issued to a bank collides with a read that arrives shortly afterward, delaying it]

SLIDE 21

DRAM Write Draining

  • Observations:
    – 1) Bus-turnaround latency is incurred when transitioning from writes to reads or vice versa
      • To mitigate bus-turnaround latency, writes are typically drained to DRAM in a batch during a period of time
    – 2) Writes are not latency-critical

[Timeline: writes to Banks 0 and 1 are drained as a batch, with a bus turnaround before the next read]

SLIDE 22

2) Write-Refresh Parallelization

  • Proactively schedules refreshes when banks are serving write batches

[Timeline: the baseline issues the refresh before the read, delaying it; write-refresh parallelization 1. postpones the refresh and 2. performs it during the write batch, saving cycles]

  • Avoids stalling latency-critical read requests by refreshing with non-latency-critical writes
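The two steps on the timeline (postpone the refresh, then issue it during the write batch) can be sketched as a toy command generator. This is an illustrative simplification, assuming a single refresh can be hidden behind one bank's write drain; the command tuples and function name are made up for the sketch:

```python
def drain_writes_with_refresh(write_batch_bank, banks, pending_refresh):
    """While one bank drains its write batch, refresh another bank in parallel.

    pending_refresh: set of banks with a postponed refresh (mutated in place).
    Returns the command sequence issued during the drain period.
    """
    commands = [("WR", write_batch_bank)]        # non-latency-critical writes
    for bank in banks:
        if bank != write_batch_bank and bank in pending_refresh:
            commands.append(("REF", bank))       # refresh hidden behind writes
            pending_refresh.discard(bank)
            break
    return commands

pending = {0}
print(drain_writes_with_refresh(1, [0, 1], pending))
# → [('WR', 1), ('REF', 0)]
```

Because writes are not latency-critical, hiding the refresh here avoids stalling a later read, which is the saved-cycles effect shown on the slide.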

SLIDE 23

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
    – 1. Dynamic Access-Refresh Parallelization (DARP)
    – 2. Subarray Access-Refresh Parallelization (SARP)
  • Results

SLIDE 24

Our Second Approach: SARP

  • Observations:
    – 1. A bank is further divided into subarrays
      • Each has its own row buffer to perform refresh operations
    – 2. Some subarrays and the bank I/O remain completely idle during refresh

[Diagram: a bank contains multiple subarrays, each with its own row buffer, sharing the bank I/O; the idle subarrays and bank I/O are highlighted]

SLIDE 25

Our Second Approach: SARP

  • Subarray Access-Refresh Parallelization (SARP):
    – Parallelizes refreshes and accesses within a bank

SLIDE 26

Our Second Approach: SARP

  • Subarray Access-Refresh Parallelization (SARP):
    – Parallelizes refreshes and accesses within a bank
  • Very modest DRAM modifications: 0.71% die area overhead

[Timeline: within Bank 1, Subarray 0 serves reads and returns data while Subarray 1 refreshes]

SLIDE 27

Outline

  • Motivation and Key Ideas
  • DRAM and Refresh Background
  • Our Mechanisms
  • Results

SLIDE 28

Methodology

  • Simulator configurations: 8-core processor (L1 $: 32KB, L2 $: 512KB/core), memory controller, DDR3 rank with Bank 0 … Bank 7
  • 100 workloads: SPEC CPU2006, STREAM, TPC-C/H, random access
  • System performance metric: Weighted speedup
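The weighted speedup metric named above is commonly defined (an assumption here; the slide does not spell it out) as the sum of each application's IPC when sharing the system, normalized to its IPC when running alone:

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over applications of IPC_shared_i / IPC_alone_i (assumed definition)."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))

# Two applications, each running at 80% of its alone-run performance:
print(weighted_speedup([0.8, 1.6], [1.0, 2.0]))  # 1.6
```

With n cores the metric tops out at n (no interference), so reported gains are relative improvements in this sum across the 100 workloads.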

SLIDE 29

Comparison Points

  • All-bank refresh [DDR3, LPDDR3, …]
  • Per-bank refresh [LPDDR3]
  • Elastic refresh [Stuecheli et al., MICRO '10]:
    – Postpones refreshes by a time delay based on the predicted rank idle time to avoid interference on memory requests
    – Proposed to schedule all-bank refreshes without exploiting per-bank refreshes
    – Cannot parallelize refreshes and accesses within a rank
  • Ideal (no refresh)

SLIDE 30

System Performance

[Chart: Weighted Speedup (GeoMean) vs. DRAM chip density (8Gb, 16Gb, 32Gb) for All-Bank, Per-Bank, Elastic, DARP, SARP, DSARP, and Ideal; DSARP improves by 7.9%, 12.3%, and 20.2%]

  • 1. Both DARP & SARP provide performance gains and combining them (DSARP) improves even more
  • 2. Consistent system performance improvement across DRAM densities (within 0.9%, 1.2%, and 3.8% of ideal)

SLIDE 31

Energy Efficiency

[Chart: Energy per Access (nJ) vs. DRAM chip density (8Gb, 16Gb, 32Gb) for All-Bank, Per-Bank, Elastic, DARP, SARP, DSARP, and Ideal; reductions of 3.0%, 5.2%, and 9.0%]

  • Consistent reduction in energy consumption

SLIDE 32

Other Results and Discussion in the Paper

  • Detailed multi-core results and analysis
  • Result breakdown based on memory intensity
  • Sensitivity results on number of cores, subarray counts, refresh interval length, and DRAM parameters
  • Comparisons to DDR4 fine granularity refresh

SLIDE 33

Executive Summary

  • DRAM refresh interferes with memory accesses
    – Degrades system performance and energy efficiency
    – Becomes exacerbated as DRAM density increases
  • Goal: Serve memory accesses in parallel with refreshes to reduce refresh interference on demand requests
  • Our mechanisms:
    – 1. Enable more parallelization between refreshes and accesses across different banks with new per-bank refresh scheduling algorithms
    – 2. Enable serving accesses concurrently with refreshes in the same bank by exploiting DRAM subarrays
  • Improve system performance and energy efficiency for a wide variety of workloads and DRAM densities
    – 20.2% and 9.0% for 8-core systems using 32Gb DRAM
    – Very close to the ideal scheme without refreshes

SLIDE 34

Improving DRAM Performance by Parallelizing Refreshes with Accesses

Kevin Chang

Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu

SLIDE 35

Backup

SLIDE 36

Comparison to Concurrent Work

  • Zhang et al., HPCA'14
  • Ideas:
    – 1) Sub-rank refresh → refreshes a subset of banks within a rank
    – 2) Subarray refresh → refreshes one subarray at a time
    – 3) Dynamic sub-rank refresh scheduling policies
  • Similarities:
    – 1) Leverage idle subarrays to serve accesses
    – 2) Schedule refreshes to idle banks first
  • Differences:
    – 1) Exploit write draining periods to hide refresh latency
    – 2) We provide detailed analysis on existing per-bank refresh in mobile DRAM
    – 3) Concrete description of our scheduling algorithm

SLIDE 37

Performance Impact of Refreshes

  • Refresh penalty exacerbates as density grows

[Chart: Unavailability (%) vs. gigabits per DRAM chip, following the technology feature trend with a potential range: 6.7% for current chips, rising to 23% and potentially 43% for future densities (by year 2020*)]

*ITRS Roadmap, 2011

SLIDE 38

Temporal Flexibility

  • DRAM standard allows a few refresh commands to be issued early or late

[Timeline: within tRefreshPeriod, the refresh command stream may run delayed by 1 refresh command or ahead by 1 refresh command]

SLIDE 39

Refresh

  • tRetention = 32 ms
  • tRefreshPeriod = 3.9 μs
  • Fixed number of refresh commands to refresh entire DRAM: N = 8192

[Timeline: tRefreshWindow = N * tRefreshPeriod = 31.9488 ms < tRetention, so a refresh to a row can be postponed by t_delay as long as tRetention > tRefreshWindow + t_delay]
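A quick check of the slide's numbers (a sketch, not from the deck): the full refresh window falls just short of the 32 ms retention time, and the difference is the slack by which a refresh command can be safely postponed.

```python
t_retention_ms = 32.0        # cell retention time from the slide
t_refresh_period_us = 3.9    # tRefreshPeriod
n_commands = 8192            # refresh commands to cover the whole DRAM

# tRefreshWindow = N * tRefreshPeriod
t_refresh_window_ms = n_commands * t_refresh_period_us / 1000  # 31.9488 ms

# Slack before tRetention is violated: the maximum safe t_delay.
slack_us = (t_retention_ms - t_refresh_window_ms) * 1000

print(f"window = {t_refresh_window_ms:.4f} ms, slack = {slack_us:.1f} us")
```

This ~51 μs slack is what makes the temporal flexibility on the previous slide (issuing refresh commands early or late) safe.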

SLIDE 40

Unfairness

[Chart: Average Maximum Slowdown (lower is better) vs. DRAM chip density (8Gb, 16Gb, 32Gb) for REFab, Elastic, REFpb, DARP, SARP, and Ideal]

  • Our mechanisms do not unfairly slow down specific applications to gain performance

MaximumSlowdown = max_i (IPC_i^alone / IPC_i^shared)
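The slide's unfairness metric sketched directly (the example IPC values are illustrative):

```python
def maximum_slowdown(ipc_alone, ipc_shared):
    """MaximumSlowdown = max over applications i of IPC_i^alone / IPC_i^shared."""
    return max(a / s for a, s in zip(ipc_alone, ipc_shared))

# App 0 drops from 1.0 to 0.5 IPC when sharing; app 1 barely slows down.
print(maximum_slowdown([1.0, 2.0], [0.5, 1.8]))  # 2.0 (app 0 dominates)
```

A scheme that boosted weighted speedup by starving one application would show up here as a larger maximum slowdown, which is what the chart rules out.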

SLIDE 41

Power Overhead

  • Power overhead to parallelize a refresh operation and accesses over a four-activate window (activate current plus refresh current drawn concurrently)
  • Extend both tFAW and tRRD timing parameters:

PowerOverhead_tFAW = (4 * I_ACT + I_REF) / (4 * I_ACT)
tFAW_SARP = tFAW * PowerOverhead_tFAW
tRRD_SARP = tRRD * PowerOverhead_tFAW
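The formulas above, sketched numerically. Only the formulas come from the slide; the current values (I_ACT, I_REF) and timing values below are made-up placeholders:

```python
def power_overhead_tfaw(i_act, i_ref):
    """PowerOverhead_tFAW = (4 * I_ACT + I_REF) / (4 * I_ACT)."""
    return (4 * i_act + i_ref) / (4 * i_act)

def extended_timings(t_faw, t_rrd, i_act, i_ref):
    """Scale tFAW and tRRD by the power overhead factor (tFAW_SARP, tRRD_SARP)."""
    k = power_overhead_tfaw(i_act, i_ref)
    return t_faw * k, t_rrd * k

# Hypothetical currents and timings: I_ACT = 100 mA, I_REF = 200 mA,
# tFAW = 30 ns, tRRD = 6 ns → overhead factor 1.5.
print(extended_timings(30.0, 6.0, i_act=100.0, i_ref=200.0))  # (45.0, 9.0)
```

Stretching tFAW and tRRD by this factor keeps the peak current within the same four-activate power envelope while a refresh runs in parallel.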

SLIDE 42

Refresh Interval (7.8μs)

[Chart: Weighted Speedup (GeoMean) vs. DRAM chip density (8Gb, 16Gb, 32Gb) for REFab, REFpb, (D+S)ARP, and Ideal; gains of 3.3%, 5.3%, and 9.1%]

SLIDE 43

Die Area Overhead

  • Rambus DRAM model with 55nm process technology
  • SARP area overhead: 0.71% in a 2Gb DRAM chip

SLIDE 44

System Performance

[Chart: Weighted Speedup (GeoMean) vs. DRAM chip density (8Gb, 16Gb, 32Gb) for REFab, Elastic, REFpb, DARP, SARP, (D+S)ARP, and Ideal]

SLIDE 45

Effect of Memory Intensity

[Chart: WS improvement (%) at memory intensities 0–100%, compared to REFab and REFpb, for 8Gb, 16Gb, and 32Gb DRAM]

SLIDE 46

DDR4 FGR

[Chart: Normalized WS vs. DRAM density (8Gb, 16Gb, 32Gb) for REFab, FGR 2x, FGR 4x, AR, and DSARP]

SLIDE 47

Performance Breakdown

  • Out-of-order refresh improves performance by 3.2%/3.9%/3.0% over 8/16/32Gb DRAM
  • Write-refresh parallelization provides additional benefits of 4.3%/5.8%/5.2%

SLIDE 48

tFAW Sweep

tFAW/tRRD:    5/1   10/2   15/3   20/4   25/5   30/6 (baseline)
WS Gain (%):  14.0  13.9   13.5   12.4   11.9   10.3

SLIDE 49

Performance Degradation using Per-Bank Refresh

[Chart: Normalized Weighted Speedup of REFpb across 100 workloads]

  • Pathological per-bank refresh latency = 3.5 * tRefLatency_AllBank

SLIDE 50

Our Second Approach: SARP

  • Subarray Access-Refresh Parallelization (SARP):
    – Parallelizes refreshes and accesses within a bank
  • Problem: Shared address path for refreshes and accesses
  • Solution: Decouple the shared address path

[Diagram: the subarrays and bank I/O share a single address path carrying either an access or a refresh]

SLIDE 51

Our Second Approach: SARP

  • Subarray Access-Refresh Parallelization (SARP):
    – Parallelizes refreshes and accesses within a bank
  • Problem: Shared address path for refreshes and accesses
  • Solution: Decouple the shared address path

[Diagram: decoupled address paths deliver the access address and the refresh address to different subarrays simultaneously]