Improving DRAM Performance




  1. Improving DRAM Performance by Parallelizing Refreshes with Accesses. Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu

  2. Executive Summary • DRAM refresh interferes with memory accesses – Degrades system performance and energy efficiency – Becomes exacerbated as DRAM density increases • Goal: Serve memory accesses in parallel with refreshes to reduce refresh interference on demand requests • Our mechanisms: – 1. Enable more parallelization between refreshes and accesses across different banks with new per-bank refresh scheduling algorithms – 2. Enable serving accesses concurrently with refreshes in the same bank by exploiting DRAM subarrays • Improve system performance and energy efficiency for a wide variety of different workloads and DRAM densities – 20.2% and 9.0% for 8-core systems using 32Gb DRAM – Very close to the ideal scheme without refreshes

  3. Outline • Motivation and Key Ideas • DRAM and Refresh Background • Our Mechanisms • Results

  4. Refresh Penalty • Refresh interferes with memory accesses • Refresh delays requests by 100s of ns [Figure: a processor and memory controller issue a read to DRAM and wait for data while a refresh is in progress; a DRAM cell is shown as an access transistor and a capacitor]

  5. Existing Refresh Modes • All-bank refresh in commodity DRAM (DDRx) [Figure: timeline for Bank 0 through Bank 7 with a refresh spanning all banks] • Per-bank refresh in mobile DRAM (LPDDRx) [Figure: timeline for Bank 0 through Bank 7 refreshed one bank at a time in round-robin order] • Per-bank refresh allows accesses to other banks while a bank is refreshing

  6. Shortcomings of Per-Bank Refresh • Problem 1: Refreshes to different banks are scheduled in a strict round-robin order – The static ordering is hardwired into DRAM chips – Refreshes busy banks with many queued requests when other banks are idle • Key idea: Schedule per-bank refreshes to idle banks opportunistically in a dynamic order

  7. Shortcomings of Per-Bank Refresh • Problem 2: Banks that are being refreshed cannot concurrently serve memory requests [Figure: Bank 0 timeline under per-bank refresh; a read (RD) is delayed by refresh]

  8. Shortcomings of Per-Bank Refresh • Problem 2: Refreshing banks cannot concurrently serve memory requests • Key idea: Exploit subarrays within a bank to parallelize refreshes and accesses across subarrays [Figure: Bank 0 timelines for Subarray 0 and Subarray 1, with a read (RD) and a subarray refresh parallelized]

  9. Outline • Motivation and Key Ideas • DRAM and Refresh Background • Our Mechanisms • Results

  10. DRAM System Organization [Figure: DRAM organized into Rank 0 and Rank 1, each containing Bank 0 through Bank 7] • Banks can serve multiple requests in parallel

  11. DRAM Refresh Frequency • DRAM standard requires memory controllers to send periodic refreshes to DRAM • tRefLatency (tRFC): Varies based on DRAM chip density (e.g., 350ns) • tRefPeriod (tREFI): Remains constant • Read/Write: roughly 50ns [Figure: timeline showing refreshes of length tRefLatency issued every tRefPeriod]

  12. Increasing Performance Impact • DRAM is unavailable to serve requests for tRefLatency / tRefPeriod of time • 6.7% for today's 4Gb DRAM • Unavailability increases with higher density due to higher tRefLatency – 23% / 41% for future 32Gb / 64Gb DRAM
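The unavailability figure on slide 12 is the ratio tRefLatency / tRefPeriod. The sketch below only illustrates that arithmetic. The timing values are assumptions, not taken from the slides: 260 ns is a typical tRFC for a 4Gb DDR3 chip and 3.9 µs is the DDR3 tREFI at extended temperature, chosen because together they reproduce the 6.7% quoted for 4Gb DRAM. The function and constant names are hypothetical.

```python
def refresh_unavailability(t_ref_latency_ns: float, t_ref_period_ns: float) -> float:
    """Return the fraction of time DRAM is busy refreshing: tRefLatency / tRefPeriod."""
    if t_ref_period_ns <= 0:
        raise ValueError("tRefPeriod must be positive")
    return t_ref_latency_ns / t_ref_period_ns


# Assumed timings (not stated on the slide): 4Gb DDR3 chip, extended temperature.
ASSUMED_T_REF_LATENCY_NS = 260.0   # tRFC
ASSUMED_T_REF_PERIOD_NS = 3900.0   # tREFI

fraction = refresh_unavailability(ASSUMED_T_REF_LATENCY_NS, ASSUMED_T_REF_PERIOD_NS)
print(f"{fraction:.1%}")  # prints 6.7%
```

Because tRefPeriod stays constant while tRefLatency grows with chip density, the same ratio rises for denser chips, which is the trend the slide reports for 32Gb and 64Gb DRAM.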

  13. All-Bank vs. Per-Bank Refresh • All-Bank Refresh: Employed in commodity DRAM (DDRx, LPDDRx) – Staggered across banks to limit power [Figure: timeline for Bank 0 and Bank 1 with reads and refreshes] • Per-Bank Refresh: In mobile DRAM (LPDDRx) – Shorter tRefLatency than that of all-bank refresh – More frequent refreshes (shorter tRefPeriod) – Can serve memory accesses in parallel with refreshes across banks [Figure: timeline for Bank 0 and Bank 1 with reads and refreshes]

  14. Shortcomings of Per-Bank Refresh • 1) Per-bank refreshes are strictly scheduled in round-robin order (as fixed by DRAM's internal logic) • 2) A refreshing bank cannot serve memory accesses • Goal: Enable more parallelization between refreshes and accesses using practical mechanisms

  15. Outline • Motivation and Key Ideas • DRAM and Refresh Background • Our Mechanisms – 1. Dynamic Access-Refresh Parallelization (DARP) – 2. Subarray Access-Refresh Parallelization (SARP) • Results

  16. Our First Approach: DARP • Dynamic Access-Refresh Parallelization (DARP) – An improved scheduling policy for per-bank refreshes – Exploits refresh scheduling flexibility in DDR DRAM • Component 1: Out-of-order per-bank refresh – Avoids poor static scheduling decisions – Dynamically issues per-bank refreshes to idle banks • Component 2: Write-Refresh Parallelization – Avoids refresh interference on latency-critical reads – Parallelizes refreshes with a batch of writes

  17. 1) Out-of-Order Per-Bank Refresh • Dynamic scheduling policy that prioritizes refreshes to idle banks • Memory controllers decide which bank to refresh

  18. 1) Out-of-Order Per-Bank Refresh • Baseline: Round robin • Our mechanism: DARP – Reduces refresh penalty on demand requests by refreshing idle banks first in a flexible order [Figure: request queues and timelines for Bank 0 and Bank 1. Baseline: reads are delayed by refresh. DARP: refreshes are reordered around reads, yielding saved cycles]
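The difference between the round-robin baseline and the out-of-order policy on slides 17 and 18 can be sketched as a bank-selection function. This is a minimal illustration, not the authors' implementation: the function names, the use of queue depth as the idleness measure, and the tie-breaking rule are all assumptions, and the sketch ignores the refresh deadlines a real controller must still meet.

```python
from typing import Dict, List, Optional


def pick_round_robin(next_bank: int) -> int:
    """Baseline: the bank to refresh is fixed by the static round-robin order."""
    return next_bank


def pick_out_of_order(queue_depth: Dict[int, int], pending: List[int]) -> Optional[int]:
    """Pick the bank with the fewest queued requests among banks still owed a refresh.

    queue_depth maps bank index to the number of queued demand requests
    (hypothetical controller state); pending lists banks awaiting refresh.
    Ties are broken by the lower bank index.
    """
    if not pending:
        return None
    return min(pending, key=lambda bank: (queue_depth.get(bank, 0), bank))


# Bank 0 has two queued reads, Bank 1 is idle.
queues = {0: 2, 1: 0}
print(pick_round_robin(0))                # 0: refresh collides with Bank 0's reads
print(pick_out_of_order(queues, [0, 1]))  # 1: the idle bank is refreshed first
```

Refreshing Bank 1 first lets Bank 0 drain its reads without waiting, which corresponds to the saved cycles in the slide's timeline.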

  19. Outline • Motivation and Key Ideas • DRAM and Refresh Background • Our Mechanisms – 1. Dynamic Access-Refresh Parallelization (DARP) • 1) Out-of-Order Per-Bank Refresh • 2) Write-Refresh Parallelization – 2. Subarray Access-Refresh Parallelization (SARP) • Results

  20. Refresh Interference on Upcoming Requests • Problem: A refresh may collide with an upcoming request in the near future [Figure: timelines for Bank 0 and Bank 1; a read arriving at a bank under refresh is delayed by the refresh]

  21. DRAM Write Draining • Observations: • 1) Bus-turnaround latency when transitioning from writes to reads or vice versa – To mitigate bus-turnaround latency, writes are typically drained to DRAM in a batch during a period of time • 2) Writes are not latency-critical [Figure: timeline for Bank 0 and Bank 1 showing a read, a bus turnaround, and a batch of writes]

  22. 2) Write-Refresh Parallelization • Proactively schedules refreshes when banks are serving write batches • Avoids stalling latency-critical read requests by refreshing with non-latency-critical writes [Figure: timelines for Bank 0 and Bank 1 with a read, a bus turnaround, and a batch of writes. Baseline: a read is delayed by refresh. Write-refresh parallelization: 1. Postpone refresh, 2. Refresh during writes, yielding saved cycles]
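The two steps on slide 22 (postpone the refresh, then issue it during a write batch) can be sketched as a small decision function. This is a hedged illustration only: the function name, the string return values, and the `max_postponed` limit are hypothetical, and the slides do not say how many refreshes may be postponed or how the controller forces one when the limit is reached.

```python
def refresh_decision(refreshes_owed: int, in_write_drain: bool, max_postponed: int) -> str:
    """Decide what to do about a bank's outstanding refreshes.

    refreshes_owed: refreshes due but not yet issued for this bank.
    in_write_drain: True while the controller is draining a batch of writes.
    max_postponed:  assumed upper bound on refreshes that may be postponed.
    """
    if refreshes_owed == 0:
        return "none"
    if in_write_drain:
        # Writes are not latency-critical, so overlap the refresh with them.
        return "refresh"
    if refreshes_owed >= max_postponed:
        # No postponement budget left: refresh even though reads may stall.
        return "refresh"
    # Serving reads: postpone so latency-critical reads are not delayed.
    return "postpone"


print(refresh_decision(1, in_write_drain=False, max_postponed=8))  # postpone
print(refresh_decision(1, in_write_drain=True, max_postponed=8))   # refresh
```

The forced-refresh branch is included because postponement cannot be unbounded in any scheme that preserves data; the specific bound is left as a parameter.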

  23. Outline • Motivation and Key Ideas • DRAM and Refresh Background • Our Mechanisms – 1. Dynamic Access-Refresh Parallelization (DARP) – 2. Subarray Access-Refresh Parallelization (SARP) • Results

  24. Our Second Approach: SARP • Observations: – 1. A bank is further divided into subarrays; each has its own row buffer to perform refresh operations – 2. Some subarrays and bank I/O remain completely idle during refresh [Figure: Bank 0 through Bank 7; inside a bank, subarrays with their row buffers and the bank I/O, with the idle parts marked]
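The observations on slide 24 suggest the condition under which an access could overlap with a refresh in the same bank: it must target a subarray other than the one being refreshed. The sketch below captures only that condition. The function name and arguments are hypothetical, and timing or power constraints that a real design would need to respect are not modeled here.

```python
from typing import Optional


def can_serve_during_refresh(access_subarray: int, refreshing_subarray: Optional[int]) -> bool:
    """Return True if an access to access_subarray can proceed in this bank.

    refreshing_subarray is the subarray currently under refresh, or None
    if the bank is not refreshing. An access conflicts only when it targets
    the subarray whose row buffer is busy with the refresh.
    """
    if refreshing_subarray is None:
        return True
    return access_subarray != refreshing_subarray


print(can_serve_during_refresh(1, refreshing_subarray=0))  # True: different subarray
print(can_serve_during_refresh(0, refreshing_subarray=0))  # False: same subarray
```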
