
SLIDE 1

Nonblocking Memory Refresh

Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian

SLIDE 2

History of DRAM

Timeline figure: 1968, DRAM is patented; 2000, DDR; 2003, DDR2; 2007, DDR3; 2012 2014, DDR4; 2018, 50th Anniversary of DRAM patent

Chart: Latency (ns), with series Refresh Latency, Bus Cycle Time, and Min. Read Latency; values shown: 550, 0.75, 13.5, 0.5, 16, 512; year label: 2015

Skipping Refresh (ISCA '12, HPCA '13, HPCA '14, ISCA '15, ISCA '17, MICRO '17); year labels: 2013, 2017

SLIDE 3

Issues with Skipping Refresh

Skipping refresh reduces memory security

Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors,” in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372, June 2014.

Tested DRAM chips from different manufacturers

Figure axis label: Memory Cell Refresh Interval (ms)

SLIDE 4

Why DRAM Refresh Hurts Performance

DRAM cell diagram; labels: bit line, storage capacitor, address line, transistor

SRAM cell diagram; labels: word line, bit, bit, T1, T2, T3, T4, T5, T6

Panel labels: Blocking Refresh, Nonblocking Refresh

SLIDE 5

Our Proposal: Nonblocking Refresh

Improve performance while retaining the same level of security as the conventional baseline.

Transform DRAM refresh into the static/background refresh in SRAM at the system level.

Refresh DRAM in the background without stalling read accesses to refreshing memory blocks.

SLIDE 6

How Nonblocking Refresh Works

Conventional Refresh: Refreshing Memory Block; pending read requests to the block are stalled

Nonblocking Refresh: Refreshing Memory Block; diagram labels: Refreshing Data, Redundant Data, Calculate

SLIDE 7

Leveraging Existing Redundant Data for Free

Each memory block in server memory: Program Data plus Redundant Data (12.5% - 40.6%), for hardware failure protection

Chart: % of pages that remain fault-free, on average (axis 0% to 100%) vs. Year of operation (1 to 7, Avg); value shown: 97%

For server systems, Nonblocking Refresh can leverage existing underutilized redundant data without storage overheads.

SLIDE 8

Primer on Server Memory Organization

Example Memory Rank: Data chip 1, Data chip 2, Data chip 3, Data chip 4, Redundant Chip 1, Redundant Chip 2

Fetched Memory Block from Rank

SLIDE 9

Nonblocking Refresh for Server Memory

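Example Memory Rank: Data chip 1, Data chip 2, Data chip 3, Data chip 4, Redundant Chip 1, Redundant Chip 2

Fetched Memory Block from Rank

Legend: Inaccessible data due to refresh; Accessible data

Calculate: the inaccessible data is computed from the accessible data, including the redundant data.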
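The "Calculate" step can be illustrated with a deliberately simplified sketch. It assumes a single bytewise XOR parity chip protects the data chips, which is only a stand-in for the stronger redundancy used in real server memory; the function names `xor_parity` and `read_block` are hypothetical and do not come from the slides.

```python
from functools import reduce


def xor_parity(chunks):
    """Return the bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def read_block(chunks, parity, refreshing_chip=None):
    """Return every data chunk, rebuilding the one held by a refreshing chip.

    chunks[i] is the slice of the block stored on data chip i. The entry for
    the refreshing chip is treated as unreadable and is never inspected.
    Assumption: one XOR parity chunk, so only one chip may be inaccessible.
    """
    if refreshing_chip is None:
        return list(chunks)
    accessible = [c for i, c in enumerate(chunks) if i != refreshing_chip]
    # XOR of all accessible chunks and the parity yields the missing chunk.
    rebuilt = xor_parity(accessible + [parity])
    return [rebuilt if i == refreshing_chip else c for i, c in enumerate(chunks)]


# Usage: chip 1 is refreshing, so its slice is unavailable (None).
data = [b"\x12\x34", b"\xab\xcd", b"\x00\xff", b"\x55\xaa"]
parity = xor_parity(data)
visible = [data[0], None, data[2], data[3]]
recovered = read_block(visible, parity, refreshing_chip=1)
```

In this toy setting the read never touches the refreshing chip, yet it returns the full block. The same redundancy is then unavailable for error correction during that read, which is the concern the later slide on preserving hardware failure protection addresses.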

SLIDE 10

Challenge 1: Ensuring Same Amount of Refresh

Conventional (blocking) refresh

Figure: Refreshing Memory Rank (Data chip 1 to 4, Redundant Chip 1 and 2); axes: Time vs. Chip ID (1 to 6); legend: Refreshing (inaccessible), Not refreshing (accessible)

SLIDE 11

Challenge 1: Ensuring Same Amount of Refresh

Nonblocking Refresh

Figure: Refreshing Memory Rank (Data chip 1 to 4, Redundant Chip 1 and 2); axes: Time vs. Chip ID (1 to 6); legend: Refreshing (inaccessible), Not refreshing (accessible)

SLIDE 12

Challenge 1: Ensuring Same Amount of Refresh

Nonblocking Refresh

Figure: Refreshing Memory Rank (Data chip 1 to 4, Redundant Chip 1 and 2); axes: Time vs. Chip ID (1 to 6); legend: Refreshing (inaccessible), Not refreshing (accessible); annotation: Refresh Interval

SLIDE 13

Challenge 1: Ensuring Same Amount of Refresh

Nonblocking Refresh

Figure: Refreshing Memory Rank (Data chip 1 to 4, Redundant Chip 1 and 2); axes: Time vs. Chip ID (1 to 6); legend: Refreshing (inaccessible), Not refreshing (accessible); annotation: Refresh Interval
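One way to read the figures for Challenge 1 is as a scheduling constraint: chips take turns refreshing, yet every chip still receives the same number of refresh operations per refresh interval as in the blocking baseline. The sketch below checks that property for a simple round-robin order. The slot model, the function names, and the `max_concurrent` parameter are assumptions made for illustration, not details taken from the slides.

```python
def conventional_schedule(num_chips, refreshes_per_chip):
    """Blocking baseline: every chip in the rank refreshes in the same slot."""
    return [set(range(num_chips)) for _ in range(refreshes_per_chip)]


def staggered_schedule(num_chips, refreshes_per_chip, max_concurrent=1):
    """Round-robin order in which at most max_concurrent chips refresh per slot."""
    if not 1 <= max_concurrent <= num_chips:
        raise ValueError("max_concurrent must be between 1 and num_chips")
    # Chip order 0, 1, ..., n-1 repeated once per required refresh.
    pending = [chip for _ in range(refreshes_per_chip) for chip in range(num_chips)]
    # Any window no longer than num_chips holds distinct chips.
    return [set(pending[i:i + max_concurrent])
            for i in range(0, len(pending), max_concurrent)]


def refresh_counts(schedule, num_chips):
    """Count how many slots each chip spends refreshing."""
    return [sum(chip in slot for slot in schedule) for chip in range(num_chips)]


# Usage: 6 chips as in the example rank, 4 refreshes each per interval.
baseline = conventional_schedule(6, 4)
staggered = staggered_schedule(6, 4, max_concurrent=1)
same_amount = refresh_counts(baseline, 6) == refresh_counts(staggered, 6)
```

The staggered schedule needs more slots than the baseline, but in each slot only a bounded number of chips is inaccessible, which is what allows the remaining chips to serve reads.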

SLIDE 14

Challenge 2: Ensuring Memory Write Bandwidth

Two diagrams: Conventional Systems and Nonblocking Refresh (labels listed in extraction order)

Diagram 1: Processor, Write Queue, Shared Memory Bus, Rank 1, Rank 2, ..., Rank N; label: 100%

Diagram 2: Processor, Write Queue, 36 KB/Channel Writeback Cache, Shared Memory Bus, Rank 1, Rank 2, ..., Rank N; labels: 100%; 0%, 100%, 0%; Refreshing; 100%/N, 100%/N, 100%/N

SLIDE 15

Challenge 3: Preserving Baseline Hardware Failure Protection

Flowchart:

Read a block from a refreshing rank.

Use the block’s existing redundant data:
+ to calculate inaccessible data stored in refreshing chips
+ to detect unknown hardware errors

Hardware error detected?

NO: Read completes.

YES: Wait for refresh to complete, re-read block from memory, perform error correction; read completes.
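The flowchart can be sketched as a small control-flow function. This is a minimal illustration: the five callables are hypothetical stand-ins for memory-controller actions and are not named in the slides, and the order of the YES branch follows one reading of the flattened flowchart.

```python
def read_from_refreshing_rank(read_accessible, decode, wait_for_refresh,
                              read_full_block, correct_errors):
    """Follow the slide's flowchart for a read that targets a refreshing rank.

    All five arguments are hypothetical callables supplied by the caller.
    decode(partial) must return a (block, error_detected) pair.
    """
    # Fetch only the chips that are not currently refreshing.
    partial = read_accessible()
    # One decode pass uses the redundant data both to rebuild the missing
    # portion and to check for unknown hardware errors.
    block, error_detected = decode(partial)
    if not error_detected:
        return block  # NO branch: the read completes without stalling
    # YES branch: fall back to a stalled read with full error correction.
    wait_for_refresh()
    return correct_errors(read_full_block())


# Usage with toy stand-ins: the fault-free case returns immediately.
events = []
result = read_from_refreshing_rank(
    read_accessible=lambda: "partial",
    decode=lambda partial: ("block", False),
    wait_for_refresh=lambda: events.append("wait"),
    read_full_block=lambda: "raw",
    correct_errors=lambda raw: raw + "-corrected",
)
```

Only the error path pays the refresh stall, so a fault-free read completes without waiting, while a read that detects an error still gets the baseline's full error correction.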

SLIDE 16

Methodology

Two Memory Systems:
Intel/AMD Server Memory Systems
IBM Server Memory System
Baselines:
Conventional Refresh: fully compliant with manufacturer specification
Insecure Refresh: skips 75% of refresh operations
Evaluated 7 multi-threaded and 7 multi-program workloads
16Gb and future 32Gb DRAM
4 memory channels with 4 ranks per channel
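To give a rough sense of what the two baselines mean for rank availability, the sketch below estimates the fraction of time a rank is busy refreshing. Both timing values are assumptions rather than figures stated on this slide: a refresh latency of 550 ns per refresh command and a refresh command interval of 7800 ns. The function name is hypothetical, and the model ignores everything except these two timings.

```python
def refresh_busy_fraction(refresh_latency_ns, refresh_period_ns, skipped_fraction=0.0):
    """Fraction of time a rank is blocked by refresh.

    skipped_fraction models a baseline that skips part of the refresh
    operations (0.75 for the Insecure Refresh baseline).
    """
    if not 0.0 <= skipped_fraction <= 1.0:
        raise ValueError("skipped_fraction must be between 0 and 1")
    return (refresh_latency_ns / refresh_period_ns) * (1.0 - skipped_fraction)


# Assumed timings: 550 ns per refresh command, one command every 7800 ns.
conventional = refresh_busy_fraction(550, 7800)        # about 7% of the time
insecure = refresh_busy_fraction(550, 7800, 0.75)      # about 1.8% of the time
```

Under these assumed timings, skipping 75% of refresh operations cuts the blocked time to a quarter, which is why Insecure Refresh is a useful performance reference even though it weakens security.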

SLIDE 17

Performance Improvement

Chart: Performance Improvement vs. Conventional Refresh (axis 0% to 40%); groups: Intel/AMD Server Mem and IBM Server Mem, each for 16Gb and 32Gb

SLIDE 18

Performance Improvement

Chart: Performance Improvement vs. Insecure Refresh (axis 0% to 10%); groups: Intel/AMD Server Mem and IBM Server Mem, each for 16Gb and 32Gb

SLIDE 19

Power Consumption

Chart: Power vs. Conventional Refresh and vs. Insecure Refresh (axis 1% to 9%); groups: Intel/AMD Server Mem and IBM Server Mem, each for 16Gb and 32Gb

SLIDE 20

Performance of Systems with Faulty Chips

Chart: Nonblocking Refresh on faulty systems vs. on fault-free systems (axis 80% to 100%); groups: Intel/AMD Server Mem and IBM Server Mem, each for 16Gb and 32Gb; series: 1 Faulty Rank/Channel, 2 Faulty Ranks/Channel, 3 Faulty Ranks/Channel, Average

SLIDE 21

Conclusion

Since its invention 50 years ago, DRAM has always required expensive refresh operations that stall accesses to refreshing data.

We propose Nonblocking Refresh to refresh data in DRAM without stalling read accesses to refreshing data.

For server memory systems, Nonblocking Refresh improves average performance by 16.2% and 30.3% for 16Gb and 32Gb chips, respectively.

Nonblocking Refresh preserves the conventional baseline's level of security by ensuring the same amount of refresh.

SLIDE 22

Questions?
