Nonblocking Memory Refresh
Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian
History of DRAM
1968: DRAM is patented
2000: DDR
2003: DDR2
2007: DDR3
2014: DDR4
2018: 50th anniversary of the DRAM patent
[Figure: latency (ns) by year, with series for refresh latency, bus cycle time, and minimum read latency; plotted values 0.5, 0.75, 13.5, 16, 512, 550]
Skipping Refresh
(ISCA '12, HPCA '13, HPCA '14, ISCA '15, ISCA '17, MICRO '17)
Issues with Skipping Refresh
Skipping refresh reduces memory security
accessing them: An experimental study of DRAM disturbance errors,” in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372, June 2014.
Tested DRAM chips from different manufacturers
[Figure axis: memory cell refresh interval (ms)]
Why DRAM Refresh Hurts Performance
[Figure: a DRAM cell (bit line, address line, transistor, storage capacitor) beside an SRAM cell (word line, two bit lines, transistors T1 to T6)]
[Figure: Blocking Refresh vs. Nonblocking Refresh]
Our Proposal: Nonblocking Refresh
as the conventional baseline.
SRAM at the system level.
to refreshing memory blocks.
How Nonblocking Refresh Works
Conventional Refresh: pending read requests to a refreshing memory block are stalled.
Nonblocking Refresh: the refreshing data in a memory block is calculated from the block's redundant data, so the read does not have to wait.
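A minimal sketch of the "calculate" step, assuming a single XOR parity value stands in for the block's redundant data. The slides do not say which code is used, and real server memory relies on stronger error-correcting codes; `make_parity` and `reconstruct_missing` are hypothetical names chosen for illustration.

```python
from functools import reduce
from operator import xor


def make_parity(chunks):
    """Return the XOR of all data chunks (one int per data chip)."""
    return reduce(xor, chunks, 0)


def reconstruct_missing(chunks, parity, missing_index):
    """Recompute the chunk held by the refreshing chip.

    chunks[missing_index] is never read: it models data that is
    inaccessible because its chip is refreshing.
    """
    available = [c for i, c in enumerate(chunks) if i != missing_index]
    return reduce(xor, available, parity)


# Four data chips, each contributing one byte to the memory block.
block = [0x12, 0xA5, 0x3C, 0xFF]
parity = make_parity(block)

# Chip index 2 is refreshing; serve the read without waiting for it.
recovered = reconstruct_missing(block, parity, missing_index=2)
assert recovered == block[2]
```

XOR parity can only cover one inaccessible chunk per block; it is used here because it keeps the idea visible in a few lines, not because the slides name it.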
Leveraging Existing Redundant Data for Free
Program data is stored with redundant data (12.5% - 40.6%) for hardware failure protection.
[Figure: % of pages that remain fault-free, on average, by year of operation (years 1 to 7 and Avg); y-axis 0% to 100%, annotated with 97%]
For server systems, Nonblocking Refresh can leverage existing underutilized redundant data without storage overheads.
Each memory block in server memory
Primer on Server Memory Organization
[Figure: example memory rank made up of Data Chip 1 to Data Chip 4, Redundant Chip 1, and Redundant Chip 2, and a memory block fetched from the rank]
Nonblocking Refresh for Server Memory
[Figure: example memory rank (Data Chip 1 to Data Chip 4, Redundant Chip 1, Redundant Chip 2) and a memory block fetched from it. Data in refreshing chips is inaccessible; it is calculated from the accessible data.]
Challenge 1: Ensuring Same Amount of Refresh
[Figure: conventional (blocking) refresh of a memory rank (Data Chip 1 to Data Chip 4, Redundant Chip 1, Redundant Chip 2). Axes: chip ID and time, with ticks 1 to 6. Legend: refreshing (inaccessible), not refreshing (accessible).]
Challenge 1: Ensuring Same Amount of Refresh
[Figure: Nonblocking Refresh of a memory rank (Data Chip 1 to Data Chip 4, Redundant Chip 1, Redundant Chip 2). Axes: chip ID and time, with ticks 1 to 6. Legend: refreshing (inaccessible), not refreshing (accessible).]
Challenge 1: Ensuring Same Amount of Refresh
[Figure: Nonblocking Refresh of a memory rank (Data Chip 1 to Data Chip 4, Redundant Chip 1, Redundant Chip 2), with the refresh interval marked. Axes: chip ID and time, with ticks 1 to 6. Legend: refreshing (inaccessible), not refreshing (accessible).]
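One way to picture this constraint, as a sketch under assumptions that are not taken from the slides: the refresh interval is divided into equal time slots, at most `max_concurrent` chips refresh in any slot, and every chip must still receive the same number of refresh slots per interval. The function names and parameters are hypothetical, and the choice of two concurrent chips below is for illustration only.

```python
from collections import Counter


def staggered_schedule(num_chips, max_concurrent, slots_per_chip=1):
    """Assign chips to time slots so that few chips refresh at once.

    Returns a list of slots; each slot lists the chip IDs that are
    refreshing (and therefore inaccessible) during that slot.
    """
    if not 1 <= max_concurrent <= num_chips:
        raise ValueError("max_concurrent must be between 1 and num_chips")
    # Every chip appears slots_per_chip times, in round-robin order.
    pending = [chip
               for _ in range(slots_per_chip)
               for chip in range(1, num_chips + 1)]
    return [pending[i:i + max_concurrent]
            for i in range(0, len(pending), max_concurrent)]


def refreshes_per_chip(schedule):
    """Count how many slots each chip spends refreshing."""
    return Counter(chip for slot in schedule for chip in slot)


# Six chips, as in the example rank; at most two refresh at a time.
schedule = staggered_schedule(num_chips=6, max_concurrent=2)
counts = refreshes_per_chip(schedule)

assert all(len(slot) <= 2 for slot in schedule)
assert set(counts.values()) == {1}  # same amount of refresh for every chip
```

The final assertion is the point of the sketch: staggering changes when each chip refreshes, not how much refresh each chip receives within the interval.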
Challenge 2: Ensuring Memory Write Bandwidth
[Figure: two panels, Conventional Systems and Nonblocking Refresh. Each shows a processor whose write queue drives Rank 1, Rank 2, ..., Rank N over a shared memory bus. Labels on the diagram: 100%; 100%/N for each rank; 0%, 100%, 0%; Refreshing; 36 KB/channel writeback cache.]
Challenge 3: Preserving Baseline Hardware Failure Protection
Read a block from a refreshing rank.
Use the block's existing redundant data:
+ to calculate inaccessible data stored in refreshing chips
+ to detect unknown hardware errors
Hardware error detected?
NO: Read completes.
YES: Wait for refresh to complete, re-read the block from memory, perform error correction; read completes.
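The read flow above can be sketched as a small function. This is an illustration under assumptions, not the authors' implementation: the four callables are hypothetical stand-ins for the memory controller's fetch, decode, wait, and correction logic.

```python
def read_block(fetch, decode, wait_for_refresh, correct):
    """Serve a read to a block that lives in a refreshing rank.

    fetch()            -> raw block as currently readable
    decode(raw)        -> (data, error_detected); data includes the
                          values calculated for the refreshing chips
    wait_for_refresh() -> returns once the rank is fully accessible
    correct(raw)       -> data after ordinary error correction
    """
    data, error_detected = decode(fetch())
    if not error_detected:
        return data          # read completes without waiting
    wait_for_refresh()       # an error was detected: stop overlapping
    return correct(fetch())  # re-read the whole block, then correct it
```

Because the fallback path waits for the refresh and then corrects a fully read block, the redundant data is available for its original purpose whenever an error is actually detected.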
Methodology
specification
Performance Improvement
[Figure: performance improvement vs. conventional refresh (y-axis 0% to 40%) for Intel/AMD server memory and IBM server memory, with 16Gb and 32Gb chips]
Performance Improvement
[Figure: performance improvement vs. insecure refresh (y-axis 0% to 10%) for Intel/AMD server memory and IBM server memory, with 16Gb and 32Gb chips]
Power Consumption
[Figure: power (y-axis 1% to 9%) for Intel/AMD server memory and IBM server memory, with 16Gb and 32Gb chips]
Nonblocking Refresh on Faulty Systems
[Figure: y-axis 80% to 100%, for Intel/AMD server memory and IBM server memory with 16Gb and 32Gb chips; series: 1 faulty rank/channel, 2 faulty ranks/channel, 3 faulty ranks/channel, and average]
Conclusion
read accesses to refreshing data.
performance by 16.2% and 30.3% for 16Gb and 32Gb chips, respectively.
ensuring the same amount of refresh.