
The Colored Refresh Server for DRAM, by Xing Pan and Frank Mueller



  1. The Colored Refresh Server for DRAM. Xing Pan, Frank Mueller, North Carolina State University

  2. Real-time system
     - A real-time system requires:
       - Logical correctness: produces correct outputs.
       - Temporal correctness: produces outputs at the right time.
     - For a real-time task:
       - predict its worst-case execution time (WCET)
       - schedule it to meet its deadline
     [Figure: task timeline over time units 0 to 15, with WCET, job release, and job deadline markers]

  3. NUMA Architecture
     - Modern NUMA (non-uniform memory access) architectures:
       - The CPU partitions sets of cores into “nodes”: 1 local + several remote controllers
       - Each memory controller (node) consists of multilevel resources (channel, rank, and bank)

  4. Core Isolation
     - Hard real-time composition
     - Challenge: shared resources
       - Execution on one core affects other cores
     - Objective: isolate cores
       - Allows compositional timing analysis
     - Application: mission-critical hard real-time systems
       - Automated driving…

  5. DRAM Organization
     - A DRAM bank array has rows and columns of data cells
     - The row that contains the requested data is loaded into the row buffer
       - Row buffer hit vs. row buffer miss

  6. Memory Controller
     - DRAM banks can be accessed in parallel

  7. Motivation
     - Apps on NUMA architectures experience varying execution times due to:
       - Remote memory node accesses
       - Conflicts in memory banks/controllers

  8. Past: Memory Predictability by Coloring
     - Local node policy under standard buddy allocation / numa library
       - Not bank aware
       - numa library only works on heap memory
     - Previous work
       - Our Controller-Aware Memory Coloring (CAMC) @ SAC’18
       - NUMA causes unpredictable execution time
       - New memory allocator in kernel via mmap() syscall, no hardware modifications
       - Each task gets private memory (coloring) on local NUMA node
       - Avoid remote refs, bank conflicts
     - Predictable exec., lower performance, lower utilization

  9. Memory Frame Color Selection
     [Figure: 32-bit physical address with channel, rank, and bank bit fields; tick marks at bits 0, 15, 16, 17, 18, 19, 20, 31]
     - Bank color (bc) of a physical page:
       bc = ((node × NN × NC + channel) × NR + rank) × NB + bank
       - NN: # nodes (memory controllers) of a system
       - NC: # channels per controller
       - NR: # ranks per channel
       - NB: # banks per rank
     - Opteron 6128: NN=4, NC=2, NR=2, NB=8, total of 128 colors
     - Example: a page in node 0, channel 1, rank 1, and bank 2 has color ((0 × 4 × 2 + 1) × 2 + 1) × 8 + 2 = 26
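The color computation on this slide can be sketched in a few lines. The function below transcribes the slide's expression term by term, reading each operator as multiplication (which reproduces the slide's result of 26). The function and parameter names are illustrative, and the defaults are the Opteron 6128 values quoted on the slide.

```python
def bank_color(node, channel, rank, bank, NN=4, NC=2, NR=2, NB=8):
    """Bank color of a physical page, transcribed from the slide's expression.

    NN, NC, NR, NB default to the Opteron 6128 values quoted on the slide.
    The function name and signature are illustrative, not from the slides.
    """
    return ((node * NN * NC + channel) * NR + rank) * NB + bank


# Slide example: page in node 0, channel 1, rank 1, bank 2.
print(bank_color(node=0, channel=1, rank=1, bank=2))  # prints 26
```

With node = 0 the node term vanishes, so the slide's example does not exercise that part of the expression.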

  10. Focus in this Paper: DRAM Refresh
      - Dynamic Random Access Memory (DRAM)
        - data is stored in a capacitor as 1 or 0 (electrically charged/discharged)
        - capacitors slowly leak their charge over time
        - cells must be refreshed, otherwise data would be lost

  11. Unpredictability due to DRAM Refresh
      - Refresh commands to all DRAM cells are issued periodically by the DRAM controller to maintain data validity
        - the row buffer is closed
        - any memory access is deferred until the refresh completes
      - Distributed refresh vs. burst refresh

  12. Unpredictability due to DRAM Refresh
      - Refresh commands to all DRAM cells are issued periodically by the DRAM controller to maintain data validity
        - the row buffer is closed
        - any memory access is deferred until the refresh completes
      - Distributed refresh vs. burst refresh
      [Figure: refresh timeline labeled with retention time (tRET), tRFC, and tREFI]

  13. DRAM Refresh Trends: It’s getting worse
      - tRET: 64 ms / 32 ms, determined by temperature (85 °C)
      - tRFC increases quickly with growing DRAM densities

      Chip density  # banks  # rows/bank  # rows/bin  tRFC
      1Gb           8        16K          16          110 ns [1]
      2Gb           8        32K          32          160 ns [1]
      4Gb           8        64K          64          260 ns [1]
      8Gb           8        128K         128         350 ns [1]
      16Gb          8        256K         256         550 ns [2]
      32Gb          8        512K         512         > 1 us [3]
      64Gb          8        1M           1K          > 2 us [3]

      [1] Standard, JEDEC, DDR3 SDRAM.
      [2] Standard, JEDEC, DDR4 SDRAM.
      [3] Jamie Liu, Onur Mutlu et al. "RAIDR: Retention-aware intelligent DRAM refresh." ACM SIGARCH Computer Architecture News. 2012.
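To make the trend concrete, the fraction of time a rank spends refreshing can be estimated as tRFC / tREFI. The sketch below assumes 8192 refresh commands per retention window, a figure not stated on the slide but consistent with the table's rows/bin column (banks × rows per bank / 8192). Function names are illustrative, and only the tRFC values with exact numbers in the table are used.

```python
# Assumption (not on the slide): 8192 refresh commands per retention window.
REFRESH_COMMANDS_PER_TRET = 8192


def trefi_ns(tret_ms=64.0, commands=REFRESH_COMMANDS_PER_TRET):
    """Average interval between refresh commands, in nanoseconds."""
    return tret_ms * 1e6 / commands


def refresh_overhead(trfc_ns, tret_ms=64.0):
    """Fraction of time a rank is busy refreshing (tRFC / tREFI)."""
    return trfc_ns / trefi_ns(tret_ms)


def rows_per_bin(banks, rows_per_bank, commands=REFRESH_COMMANDS_PER_TRET):
    """Rows refreshed by a single refresh command."""
    return banks * rows_per_bank // commands


# tRFC values copied from the table (exact entries only).
TRFC_NS = {"1Gb": 110, "2Gb": 160, "4Gb": 260, "8Gb": 350, "16Gb": 550}

for density, trfc in TRFC_NS.items():
    at_64 = 100 * refresh_overhead(trfc, 64.0)
    at_32 = 100 * refresh_overhead(trfc, 32.0)
    print(f"{density}: {at_64:.2f}% at tRET=64 ms, {at_32:.2f}% at tRET=32 ms")
```

Halving tRET to 32 ms halves tREFI and therefore doubles the estimated overhead for the same tRFC.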

  14. Challenge: Refresh Delay
      - Auto-refresh: recharges all memory cells within the “retention time”
        - a rank under refresh is unavailable to memory requests until the refresh completes (tRFC)
        - all bank row buffers of this rank are closed (tRP) and need to be re-opened (tRAS)
        - more bank row buffer misses around refreshes

  15. Challenge: Refresh Delay
      - Auto-refresh: recharges all memory cells within the “retention time”
        - a rank under refresh is unavailable to memory requests until the refresh completes (tRFC)
        - all bank row buffers of this rank are closed (tRP) and need to be re-opened (tRAS)
        - more bank row buffer misses around refreshes
      (1) Increase in memory latency
      (2) Significant fluctuation of memory reference latency

  16. Challenge: Refresh Delay
      - As density and size of DRAM grow:
        - more rows required per DRAM chip
        - longer tRFC
        - higher probability of refresh interference

  17. Challenge: Refresh Delay
      - As density and size of DRAM grow:
        - more rows required per DRAM chip
        - longer tRFC
        - higher probability of refresh interference
      (1) Increases the length of a refresh operation
      (2) Reduces memory throughput

  18. Solution: Colored Refresh Server (CRS)
      - Partition DRAM memory at rank granularity
        - Refreshes rotate round-robin from rank to rank
        - Assign real-time tasks to different ranks via colored memory allocation (say: green, blue)
        - Schedule 2 server tasks to refresh green/blue memory
        - Ensure that no blue task runs when the green server is active and, vice versa, no green task runs when the blue server is active
      - Cooperative scheduling of real-time tasks and refresh operations
      - Memory requests no longer suffer from refresh interference

  19. Architecture of Colored Refresh Server
      - Hierarchical model
        - System level
          - Refresh tasks w/ static priority: refresh tasks > S_1 > S_2
        - Server level (inside the servers)
          - User tasks scheduled inside servers
          - w/ memory colored diametric to server
          - with any real-time scheduling policy: EDF, RM, …
          - Refresh lock/unlock tasks: no memory blocking during refresh
      [Figure: refresh lock/unlock tasks]

  20. Refresh Lock and Unlock Tasks
      - Partition the entire DRAM space into two “colors”
        - e.g., c_1(k_0, k_1 ... k_i) and c_2(k_{i+1}, k_{i+2} ... k_{K-1})
      - Two refresh lock tasks
        - period of tRET (64 ms)
        - trigger refresh for c_1 (green) and c_2 (blue), respectively
      - Two refresh unlock tasks
        - update the corresponding color to be available once the refresh finishes

  21. Server Model
      - Server model S(W, A, c, p_s, e_s)
        - with CPU time as the resource
        - where:
          - W is the workload model (applications)
          - A is the scheduling algorithm, e.g., EDF or RM
          - c denotes the memory color assigned to this server, i.e., a set of memory ranks available for allocation
          - p_s is the server period
          - e_s is the server budget

  22. Server Model
      - The execution budget is set to e_s at time instants k * p_s, where k > 0
      - Any unused execution budget cannot be carried over to the next period
      - The refresh server can execute when
        - (i) its budget is not zero;
        - (ii) its available task queue is not empty; and
        - (iii) its memory color is not locked by a “refresh task” (introduced above).
        - Otherwise, it remains suspended.
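The budget and eligibility rules on this slide can be expressed as a small state check. This is a minimal sketch, not the authors' implementation: the `Server` class, its field names, and the use of a set of locked colors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one server; names are illustrative only."""
    color: str                 # memory color c assigned to this server
    period_ms: float           # server period p_s
    budget_ms: float           # server budget e_s
    remaining_ms: float = 0.0  # budget left in the current period
    ready_tasks: list = field(default_factory=list)

    def replenish(self):
        # Called at each replenishment instant: the budget is reset to e_s,
        # so any unused budget from the previous period is discarded.
        self.remaining_ms = self.budget_ms

    def can_execute(self, locked_colors):
        # Conditions (i), (ii), and (iii) from the slide; otherwise suspended.
        return (
            self.remaining_ms > 0
            and bool(self.ready_tasks)
            and self.color not in locked_colors
        )


s1 = Server(color="c1", period_ms=16, budget_ms=6, ready_tasks=["T1", "T2"])
s1.replenish()
print(s1.can_execute(locked_colors=set()))    # True: budget, work, color free
print(s1.can_execute(locked_colors={"c1"}))   # False: color locked for refresh
```

In this sketch a refresh lock task would add a color to `locked_colors` and the matching unlock task would remove it.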

  23. Example of CRS
      - T_1(16ms, 4ms), T_2(16ms, 2ms), T_3(32ms, 8ms), T_4(64ms, 8ms)
      - S_1((T_1, T_2), RM, c_1(k_0, k_1, k_2, k_3), 16ms, 6ms)
        S_2((T_3, T_4), RM, c_2(k_4, k_5, k_6, k_7), 16ms, 6ms)
      - Phases φ of S_1 and S_2 are tRET/2 and 0, respectively
        - i.e., S_2 (color c_2) is refreshed first
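A quick arithmetic check of this example: assuming each task tuple is (period, execution time) and the last two server parameters are (period, budget), the utilization of each server's workload equals the server's bandwidth e_s / p_s. The sketch below only verifies that necessary condition; it is not the schedulability test itself, and the helper names are illustrative.

```python
from fractions import Fraction

# Assumption: tuples are (period_ms, execution_time_ms), as read off the slide.
TASKS = {"T1": (16, 4), "T2": (16, 2), "T3": (32, 8), "T4": (64, 8)}

# Assumption: (workload, server period p_s in ms, server budget e_s in ms).
SERVERS = {"S1": (("T1", "T2"), 16, 6), "S2": (("T3", "T4"), 16, 6)}


def workload_utilization(names):
    """Sum of execution_time / period over the named tasks."""
    return sum(Fraction(TASKS[n][1], TASKS[n][0]) for n in names)


def server_bandwidth(period, budget):
    """Share of CPU time the server provides per period."""
    return Fraction(budget, period)


for name, (workload, period, budget) in SERVERS.items():
    u = workload_utilization(workload)
    b = server_bandwidth(period, budget)
    print(f"{name}: workload utilization {u}, bandwidth {b}, covered: {u <= b}")
```

Both servers come out at 3/8, so each budget is exactly the utilization of its workload under these assumptions.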

  24. Example of CRS [Figure]

  25. Schedulability Analysis within a Server
      - Given a server S(W, A, c, p_s, e_s) [SL03]:
        - Periodic Capacity Bound (PCB)
          - bound period (p_s) and budget (e_s)
          - with workload (W) and algorithm (A)
        - Utilization Bound (UB)
          - bound utilization of workload
          - with p_s, e_s, and A
      - [SL03] Shin, I. & Lee, I. “Periodic resource model for compositional real-time guarantees”. RTSS. 2003.
      [Figure: refresh lock/unlock tasks]

  26. Schedulability Analysis
      - Servers + refresh lock/unlock tasks at system level
      - Time Demand Analysis
        - Refresh tasks w/ static priority: lock/unlock tasks > S_1 > S_2
      [Figure: refresh lock/unlock tasks]
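For orientation, the textbook form of time demand analysis for static-priority periodic tasks can be sketched as follows. This is not the paper's exact analysis: it ignores phases and any blocking, treats each server as a periodic task with deadline equal to its period, and the 100 µs cost of each lock/unlock task is a made-up placeholder, since the slides give no such figure. Times are integers in microseconds to keep the fixed-point comparison exact.

```python
import math


def response_time(index, tasks):
    """Worst-case response time of tasks[index], or None if it misses its period.

    tasks: list of (period, execution_time), highest priority first.
    Iterates t = e_i + sum over higher-priority j of ceil(t / p_j) * e_j.
    """
    period, cost = tasks[index]
    t = cost
    while t <= period:
        demand = cost + sum(math.ceil(t / p) * e for p, e in tasks[:index])
        if demand == t:
            return t
        t = demand
    return None


# Priority order from the slide: lock/unlock tasks > S_1 > S_2.
# Hypothetical costs: 100 us per lock/unlock task (not from the slides).
SYSTEM = [
    (64_000, 100),     # refresh lock task, color c_1
    (64_000, 100),     # refresh unlock task, color c_1
    (64_000, 100),     # refresh lock task, color c_2
    (64_000, 100),     # refresh unlock task, color c_2
    (16_000, 6_000),   # server S_1 (period, budget)
    (16_000, 6_000),   # server S_2 (period, budget)
]

for i, (period, _) in enumerate(SYSTEM):
    print(i, response_time(i, SYSTEM), "<=", period)
```

Under these placeholder costs every entry converges below its period; with different lock/unlock costs the result may change.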

  27. Colored Refresh Server Design
      - Off-line algorithm
        - Searches the entire range of available configurations
        - Finds minimum refresh overhead & budgets for servers
        - Short tasks: create copy tasks
        - See dissertation [Pan’18]
      - Colored Refresh Server
        - Guarantees schedulability (if the task set was schedulable w/o CRS)
        - Incurs much lower overhead than auto-refresh (removes the entire refresh overhead in most cases)
