Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

SLIDE 1

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

ISCA 2014

Yoongu Kim¹, Ross Daly¹, Jeremie Kim¹, Chris Fallin¹, Ji Hye Lee¹, Donghyuk Lee¹, Chris Wilkerson², Konrad Lai, Onur Mutlu¹

¹ Carnegie Mellon University, ² Intel Labs

Presented by Sam Schiferl and Pedram Zamirai

SLIDE 2

Outline

1. Motivation
2. DRAM Structure
3. Disturbance Errors
4. Test System Setup
5. Results
6. Proposed Solution
7. Conclusion
8. Discussion

SLIDE 3

Motivation

  • As DRAM process technology continues to scale down, memory reliability suffers because:
    ○ Smaller cells hold less charge
    ○ Cells are closer together, which can lead to electromagnetic coupling
    ○ Process variation is higher
  • These issues can lead to violations of memory isolation
    ○ An access to one memory address should not have unintended side effects on data stored at other addresses
  • The authors investigate how vulnerable DRAM from three major commodity manufacturers is to targeted disturbance error attacks

SLIDE 4

DRAM Structure

[Figures: single memory cell (figure from the paper); rows of cells]

  • Charge stored in a capacitor represents 0 or 1
  • An access transistor is used to read/write data in a specific cell

SLIDE 5

DRAM Access

[Figures: single memory cell (figure from the paper); rows of cells]

1. The row’s wordline is raised to a high voltage, which opens the row
2. The row buffer reads/writes the desired columns
3. The row’s wordline is lowered, which closes the row

SLIDE 10

DRAM Refresh

  • The charge in a memory cell constantly leaks, eventually leading to a loss of data
  • Data must be refreshed periodically by raising the wordline
  • DRAM specifications guarantee a retention time before a cell loses data
    ○ 64 ms retention time for DDR3
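
As a rough illustration of what the 64 ms figure means for a memory controller, the sketch below computes the average spacing between refresh commands. The count of 8192 refresh commands per window is an assumption based on typical DDR3 parts and does not come from the slides; the function name is hypothetical.

```python
# 64 ms comes from the slide. 8192 refresh commands per window is an
# assumption (a typical DDR3 figure), used here only for illustration.
RETENTION_MS = 64
REFRESH_COMMANDS_PER_WINDOW = 8192  # assumption


def refresh_command_interval_us(retention_ms, commands):
    """Average time between refresh commands, in microseconds."""
    return retention_ms * 1000 / commands


interval_us = refresh_command_interval_us(RETENTION_MS, REFRESH_COMMANDS_PER_WINDOW)
print(f"{interval_us:.4f} us between refresh commands")
```

Under these assumptions, each refresh command covers only a small share of the rows, so any given row is refreshed once per 64 ms window.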

SLIDE 12

Disturbance Errors

  • Unwanted interaction between two circuit components that should be isolated
  • Repeatedly toggling the voltage of a wordline can cause cells in nearby rows to leak charge at a faster rate, so that they lose their charge before the next refresh
  • Causes:
    ○ Noise injection
    ○ Bridges
    ○ Hot-carrier injection

[Figure labels: aggressor row, victim rows]

SLIDE 13

Disturbance Error Attack

  • Repeatedly read data from the same row in DRAM and track bit flips in other DRAM rows
  • Flush the line from the cache after each read so that the next read goes to DRAM

Induces errors (X & Y map to the same bank, but different rows):

    code1a:
      mov (X), %eax
      mov (Y), %ebx
      clflush (X)
      clflush (Y)
      mfence
      jmp code1a

Does not induce errors:

    code1b:
      mov (X), %eax
      clflush (X)
      mfence
      jmp code1b
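
One way to see why only the two-address loop induces errors is to count row activations in a toy model of a single bank. The sketch below is not from the paper. It assumes an open-page policy (a row stays in the row buffer until another row in the same bank is accessed) and ignores refresh and controller timeouts. The `Bank` class, the `hammer` function, and the row numbers are hypothetical.

```python
class Bank:
    """Toy model of one DRAM bank with a single row buffer (open-page policy assumed)."""

    def __init__(self):
        self.open_row = None   # row currently held in the row buffer
        self.activations = {}  # row -> number of times its wordline was raised

    def read(self, row):
        # A read to the already-open row is served from the row buffer,
        # so the wordline is not toggled again.
        if row != self.open_row:
            self.open_row = row
            self.activations[row] = self.activations.get(row, 0) + 1


def hammer(rows, iterations):
    """Read the given rows in a loop. Every read is assumed to reach DRAM,
    modeling the effect of flushing the cache line after each access."""
    bank = Bank()
    for _ in range(iterations):
        for row in rows:
            bank.read(row)
    return bank.activations


ROW_X, ROW_Y = 10, 500  # hypothetical rows in the same bank

two_rows = hammer([ROW_X, ROW_Y], iterations=1000)  # models the first loop
one_row = hammer([ROW_X], iterations=1000)          # models the second loop
print(two_rows[ROW_X], one_row[ROW_X])
```

In this model, alternating between two rows forces a new activation on every read, while reading a single row activates it only once, so its wordline is not toggled repeatedly.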

SLIDE 14

Experimental Methodology

  • Testing platform
    ○ 8 Xilinx FPGA boards
    ○ DDR3-800 memory controller
    ○ Run at 50 °C
  • DRAM modules
    ○ 129 DDR3 DRAM modules
    ○ 972 DRAM chips
  • Test parameters
    ○ Activation interval (AI)
    ○ Refresh interval (RI)
    ○ Data pattern (DP)

SLIDE 15

Types of Tests

1. Toggle all rows in a module repeatedly and locate all disturbed cells
    ○ Quickly identifies all disturbed cells throughout an entire module
2. Toggle a single row repeatedly and identify the specific disturbed cells
    ○ Correlates victim cells with aggressor rows

SLIDE 16

Manufacturing Date

  • No errors in the 19 oldest modules
  • Disturbance errors are a relatively recent phenomenon

SLIDE 17

Effective Parameters

  • Access patterns
    ○ Repeated toggling of a wordline
    ○ Opening and closing the row cause the problem
  • Refresh interval (RI)
  • Activation interval (AI)
  • Data patterns

Access Pattern          Disturbance Errors?
(open-read-close)^N     Yes
(open-write-close)^N    Yes
open-read^N-close       No
open-write^N-close      No

SLIDE 18

Effective Parameters

  • Access patterns
  • Refresh interval (RI)
    ○ RI ↓ ⇒ Errors ↓
      ■ Less leakage between refreshes
      ■ Fewer row openings between refreshes
  • Activation interval (AI)
  • Data patterns

SLIDE 19

Effective Parameters

  • Access patterns
  • Refresh interval (RI)
  • Activation interval (AI)
    ○ AI ↑ ⇒ Errors ↓
      ■ Fewer row openings in each RI
  • Data patterns
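
The link between AI, RI, and the number of row openings can be made concrete with a small calculation: a row can be opened at most about RI / AI times within one refresh interval. In the sketch below, 64 ms is the DDR3 figure quoted on the refresh slide, while the 55 ns activation interval and the function name are illustrative assumptions.

```python
def max_row_openings(refresh_interval_ms, activation_interval_ns):
    """Upper bound on how often one row can be opened within one refresh interval."""
    refresh_interval_ns = refresh_interval_ms * 1_000_000
    return refresh_interval_ns // activation_interval_ns


# 64 ms is the DDR3 refresh interval; 55 ns is an assumed activation interval.
baseline = max_row_openings(64, 55)
longer_ai = max_row_openings(64, 110)   # AI doubled
shorter_ri = max_row_openings(32, 55)   # RI halved
print(baseline, longer_ai, shorter_ri)
```

Doubling AI or halving RI each roughly halves the bound, which matches the direction of both trends on the slides.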

SLIDE 20

Effective Parameters

  • Access patterns
  • Refresh interval (RI)
  • Activation interval (AI)
  • Data patterns
    ○ Victim cells lose charge when they are disturbed
    ○ True-cell: high voltage = 1
    ○ Anti-cell: high voltage = 0
    ○ True-cells are dominant
    ○ Errors are mostly 1 → 0
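
The direction of a bit flip follows from the cell type, as the sketch below illustrates. It is a simplified model, not the authors' code: a disturbance is assumed to fully drain a charged cell, and the function names are hypothetical.

```python
def stored_bit(cell_type, charged):
    """Logical value of a cell: a true-cell stores 1 as charge, an anti-cell stores 0 as charge."""
    if cell_type == "true":
        return 1 if charged else 0
    if cell_type == "anti":
        return 0 if charged else 1
    raise ValueError(f"unknown cell type: {cell_type}")


def disturb(cell_type, bit):
    """Bit read back after a disturbance drains the cell.
    A cell that was already discharged has no charge to lose and keeps its value."""
    if cell_type not in ("true", "anti"):
        raise ValueError(f"unknown cell type: {cell_type}")
    charged = (bit == 1) if cell_type == "true" else (bit == 0)
    return stored_bit(cell_type, charged=False) if charged else bit


for cell_type in ("true", "anti"):
    for bit in (0, 1):
        print(cell_type, bit, "->", disturb(cell_type, bit))
```

In this model, true-cells can only flip 1 → 0 and anti-cells can only flip 0 → 1, so a module dominated by true-cells shows mostly 1 → 0 errors.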

SLIDE 21

Address Correlation

  • No errors in the aggressor row itself
  • Strong peaks at ±1
    ○ Greatest effect on the two immediately neighboring rows
    ○ Logical and physical adjacency are highly correlated
  • Some errors in non-adjacent rows
    ○ Physically adjacent ⇎ logically adjacent

SLIDE 22

Sensitivity Results

  • Errors are mostly repeatable
    ○ Ten iterations of testing
    ○ Relatively constant average number of errors (±0.25%)
  • Victim cells ≠ weak cells
    ○ Weak cells = cells with the shortest retention time
  • Not strongly affected by temperature
    ○ ±20 °C from ambient temperature → no effect

SLIDE 23

Probabilistic Adjacent Row Activation (PARA)

  • After closing a row, the memory controller refreshes one of the adjacent rows with probability P (a small constant)
    ○ Stateless solution
  • It picks one of the two neighbors at random
  • Number of accesses ↑ ⇒ probability that the neighbors have been refreshed ↑
  • Cannot prevent disturbance errors with absolute certainty
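
The points above can be sketched as a small model. Because one of the two neighbors is chosen at random, a given neighbor is refreshed with probability P / 2 on each row closing, so the chance that it is never refreshed over N closings is (1 - P / 2)^N. The values of `P` and `CLOSINGS`, the class and function names, and the assumption that rows ±1 are the physical neighbors are all illustrative and do not come from the slides.

```python
import random


def miss_probability(p, closings):
    """Probability that one given neighbor is never refreshed across `closings`
    row closings, when each closing refreshes it with probability p / 2."""
    return (1 - p / 2) ** closings


class ParaController:
    """Toy PARA model: stateless, one random decision per row closing."""

    def __init__(self, p, rng):
        self.p = p
        self.rng = rng

    def close_row(self, row):
        """Return the neighbor row to refresh, or None if no refresh is issued."""
        if self.rng.random() < self.p:
            return row + self.rng.choice((-1, 1))
        return None


P = 0.001            # hypothetical value for illustration
CLOSINGS = 100_000   # hypothetical number of aggressor row closings

controller = ParaController(P, random.Random(0))
refreshed = [controller.close_row(1000) for _ in range(CLOSINGS)]
victim_refreshes = sum(1 for r in refreshed if r == 1001)
print(miss_probability(P, CLOSINGS), victim_refreshes)
```

The miss probability shrinks quickly as the number of closings grows but never reaches zero, which is why the scheme cannot rule out errors with absolute certainty.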

SLIDE 24

Conclusion

  • Demonstrated, characterized, and analyzed disturbance errors
  • Repeated accesses to the same row corrupt data in other rows
  • Emerging problem (affects current and future computing systems)
  • Proposed several solutions

SLIDE 27

Discussion Points

  • Does the type of processor (ARM vs. x86) affect the feasibility of the attack?
  • How practical is the PARA solution, which relies on probabilistically refreshing candidate victim rows?
  • Should this attack be mitigated with a software or a hardware solution?

SLIDE 28

Potential Solutions

  • Make better chips
    ○ Probable defect: future cells will be smaller still
  • Correct errors
    ○ Probable defect: high cost; unable to correct multi-bit errors
  • Refresh all rows frequently
    ○ Probable defect: degrades performance and energy efficiency
  • Map faulty cells to spare cells (manufacturer)
    ○ Probable defect: not enough spare cells
  • Retire cells (end-user)
    ○ 1. Disable/remap faulty addresses. Probable defect: every row in the module is a victim row
    ○ 2. Refresh faulty addresses more frequently. Probable defect: victim rows are refreshed more frequently even when there is no access to the module
  • Identify “hot” rows and refresh their neighbors
    ○ Probable defect: high hardware overhead to identify hot rows
