Flexible Cache Error Protection using an ECC FIFO (Doe Hyun Yoon and Mattan Erez)


SLIDE 1

SC’09

Flexible Cache Error Protection using an ECC FIFO

Doe Hyun Yoon and Mattan Erez

Dept. of Electrical and Computer Engineering
The University of Texas at Austin

SLIDE 2

ECC FIFO

  • Goal: to reduce on-chip ECC overhead

– Two-tiered error protection

  • T1EC: light-weight on-chip error code
  • T2EC: strong error correcting code

– Off-load T2EC overhead to FIFO in DRAM

  • Why FIFO? It’s simple to manage
  • 15-25% LLC area reduction
  • 10-17% LLC power saving
  • Just 1% performance penalty

SLIDE 3

BACKGROUND

SLIDE 4

Error Correcting Codes

  • 1-bit parity for error detection
  • SEC-DED (Hamming) codes

– Single-bit Error Correction and Double-bit Error Detection
– 8-bit ECC for 64-bit data

  • DEC-TED

– Double-bit Error Correction and Triple-bit Error Detection
– 15-bit ECC for 64-bit data
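The check-bit counts quoted on this slide can be reproduced with a short calculation. The sketch below assumes a Hamming code extended with one overall parity bit for SEC-DED, and a binary BCH code (t = 2) extended with one parity bit for DEC-TED; the slides do not say which constructions were used, and the function names are illustrative.

```python
def hamming_sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def sec_ded_check_bits(data_bits: int) -> int:
    """Hamming SEC plus one overall parity bit for double-error detection."""
    return hamming_sec_check_bits(data_bits) + 1


def dec_ted_check_bits(data_bits: int) -> int:
    """Assumed construction: binary BCH with t=2 over GF(2**m), plus one parity bit.

    The BCH part needs 2*m check bits, and the code length 2**m - 1 must
    hold the data bits plus those check bits.
    """
    m = 2
    while 2 ** m - 1 < data_bits + 2 * m:
        m += 1
    return 2 * m + 1


print(sec_ded_check_bits(64))  # 8
print(dec_ted_check_bits(64))  # 15
```

Both results match the 8-bit and 15-bit figures on the slide for 64-bit data.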

SLIDE 5

Interleaving

  • To detect and correct burst errors

– N-way interleaving converts an N-bit burst error to N single-bit errors

[Figure: N-way interleaving. Bits 0, N, 2N, ... are covered by error code 0; bits 1, N+1, 2N+1, ... by error code 1; bits 2, N+2, 2N+2, ... by error code 2; and bits N-1, 2N-1, ... by error code N-1.]

SLIDE 7

Interleaving

  • To detect and correct burst errors

– N-way interleaving converts an N-bit burst error to N single-bit errors

  • Baseline cache error protection

– 8-way interleaved SEC-DED

  • Can correct up to 8-bit burst errors
  • 8B ECC per 64B cache line
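A minimal sketch of why N-way interleaving turns a burst into correctable single-bit errors. It assumes the simplest mapping, where bit i of the line belongs to codeword i mod N; the function names are illustrative and not taken from the slides.

```python
def codeword_of(bit_index: int, n_way: int) -> int:
    """Assumed mapping: adjacent bits go to different codewords."""
    return bit_index % n_way


def errors_per_codeword(burst_start: int, burst_len: int, n_way: int) -> list:
    """Count how many flipped bits of a contiguous burst land in each codeword."""
    counts = [0] * n_way
    for bit in range(burst_start, burst_start + burst_len):
        counts[codeword_of(bit, n_way)] += 1
    return counts


# An 8-bit burst under 8-way interleaving: one error per codeword,
# which each SEC-DED codeword can correct on its own.
print(errors_per_codeword(burst_start=13, burst_len=8, n_way=8))

# A 9-bit burst puts two errors into one codeword: detectable, not correctable.
print(max(errors_per_codeword(burst_start=13, burst_len=9, n_way=8)))

# Storage: a 64B line is 512 bits, so 8 codewords of 64 data bits each,
# with 8 check bits per codeword, give 8B of ECC per line.
LINE_BITS, N_WAY, CHECK_BITS = 64 * 8, 8, 8
print(LINE_BITS // N_WAY, N_WAY * CHECK_BITS // 8)
```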

SLIDE 8

Uniform Error Protection

[Figure: LLC with uniform error protection: 2^11 sets × 8 ways; each line has a tag, 64B of data, and 8B of ECC.]

ECC increases area AND leakage/dynamic power

SLIDE 9

RELATED WORK

SLIDE 10

Soft Errors: Observations

  • Still, Soft Error Rate (SER) is low

– Every cache access tries to detect errors, but finds no error in most cases

  • Error Detection – Common case

– Need a low cost, low overhead error detection mechanism

  • Error Correction – Uncommon case

– Correction can be slow
– But still need to maintain error correction info somewhere

  • Memory hierarchy inherently provides redundancy for clean data

– Only dirty lines need error correcting codes

SLIDE 11

PERC [Sorin’06] / Energy Efficient [Li’04]

[Figure: 2^11 sets × 8 ways; each line has a tag, 64B of data, and a 1B EDC; a separate array holds 7B of ECC per line.]

– Read only Data and EDC: saves dynamic power
– Power gate ECC of clean lines: saves static power

SLIDE 12

Area Efficient [Kim’06]

[Figure: 2^12 sets × 4 ways; each line has a tag, 64B of data, and a 1B EDC; a separate ECC array with 2^12 sets holds one 8B ECC per set.]

Allow only 1 dirty line per set

SLIDE 13

MAXn scheme

[Figure: 2^11 sets × 8 ways; each line has a tag, 64B of data, and a 1B EDC; a separate tagged ECC array of 2^11 sets × n ways (n < 8) holds 8B of ECC per entry.]

– Allow only n dirty lines per set
– May cause detrimental cleaning traffic
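To make the cleaning traffic concrete, here is a minimal sketch of one cache set under a MAXn-style limit. The class name and the choice to clean the oldest dirty line are assumptions for illustration; the slide does not specify the victim policy.

```python
class MaxNSet:
    """One cache set that may hold at most max_dirty dirty lines (hypothetical model)."""

    def __init__(self, max_dirty: int):
        self.max_dirty = max_dirty
        self.dirty_tags = []          # oldest dirty line first
        self.cleaning_writebacks = 0  # extra write-backs forced by the limit

    def write(self, tag: int) -> None:
        if tag in self.dirty_tags:
            return                    # already dirty, already owns an ECC entry
        if len(self.dirty_tags) == self.max_dirty:
            self.dirty_tags.pop(0)    # clean the oldest dirty line (assumed policy)
            self.cleaning_writebacks += 1
        self.dirty_tags.append(tag)


s = MaxNSet(max_dirty=2)
for tag in [10, 11, 12, 10]:
    s.write(tag)
print(s.dirty_tags, s.cleaning_writebacks)  # [12, 10] 2
```

With only two ECC entries, four writes to three distinct lines already force two write-backs that an unrestricted cache would not issue.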

SLIDE 14

Two-tiered error protection

  • Tier-1 Error Code (T1EC)

– On-chip light-weight error code
– Uniform error protection

  • Tier-2 Error Code (T2EC)

– Strong error codes only for dirty lines
– Corrects Detected but Uncorrected Errors (DUE) of T1EC
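The two tiers can be read as a three-way decision on every LLC read. The sketch below assumes a detection-only T1EC (a single parity bit) and passes the refetch and T2EC correction steps in as hypothetical callbacks; none of these names come from the slides.

```python
from dataclasses import dataclass
from typing import Callable


def parity(data: bytes) -> int:
    """Even/odd parity over all bits of the line (stand-in for a light-weight T1EC)."""
    p = 0
    for byte in data:
        p ^= bin(byte).count("1") & 1
    return p


@dataclass
class Line:
    data: bytes
    t1ec: int    # parity stored on chip with the line
    dirty: bool


def read_line(line: Line,
              refetch: Callable[[], bytes],
              correct_with_t2ec: Callable[[bytes], bytes]) -> bytes:
    if parity(line.data) == line.t1ec:
        return line.data                  # common case: T1EC finds no error
    if not line.dirty:
        return refetch()                  # clean line: a good copy exists below
    return correct_with_t2ec(line.data)   # dirty line: only T2EC can repair it


good = b"\x0f\xf0"
bad = b"\x0e\xf0"                         # one flipped bit
clean = Line(bad, parity(good), dirty=False)
dirty = Line(bad, parity(good), dirty=True)
print(read_line(clean, lambda: good, lambda d: b"unused"))
print(read_line(dirty, lambda: b"unused", lambda d: good))
```

Only the third branch needs the strong code, which is why T2EC storage is required for dirty lines alone.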

SLIDE 15

Memory Mapped ECC [Yoon’09]

[Figure: on-chip LLC with 2^11 sets × 8 ways; each line has a tag, 64B of data, and a 1B T1EC on chip; the 8B T2EC per line is stored in DRAM.]

T2EC is memory mapped AND cached

SLIDE 16

ECC FIFO

SLIDE 17

ECC FIFO

  • Use Two-tiered error protection
  • T2EC is off-loaded to FIFO in DRAM

– LLC caching behavior is unaffected

  • FIFO

– Simple to manage

  • Coalesce buffer

– To better utilize DRAM channel

SLIDE 18

[Figure: the Last Level Cache (Data and T1EC per line) sits between the rest of the cache hierarchy and DRAM; a T2EC encoder feeds a Coalesce Buffer, which writes to the ECC FIFO in DRAM.]

SLIDE 19

[Figure step: Dirty line eviction to LLC]

SLIDE 20

[Figure step: Encode T2EC and TAG]

SLIDE 21

[Figure step: Push to Coalesce Buffer]

SLIDE 22

[Figure step: Next dirty line comes; its Tag/T2EC is buffered in the Coalesce Buffer]

SLIDE 23

[Figure step: Coalesce Buffer is now FULL]

SLIDE 24

[Figure step: Write the coalesced T2EC into the ECC FIFO]

T2EC write size matches the DRAM burst size

SLIDE 25

[Figure step: Coalesce Buffer becomes empty]
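The write path walked through in these slides can be sketched as a small model. This is an illustration under assumptions, not the authors' implementation: the class and method names are hypothetical, the T2EC value is an opaque placeholder, and the coalesce buffer size of 6 entries is simply what the figure appears to draw.

```python
from collections import deque


class EccFifo:
    """Toy model: coalesce buffer on chip, circular FIFO of (tag, T2EC) in DRAM."""

    def __init__(self, fifo_entries: int, coalesce_entries: int):
        # deque(maxlen=...) drops the oldest entry on overflow, like a circular buffer
        self.fifo = deque(maxlen=fifo_entries)
        self.coalesce = []
        self.coalesce_entries = coalesce_entries
        self.dram_writes = 0  # one per flushed coalesce buffer

    def on_dirty_eviction_to_llc(self, tag: int, t2ec: bytes) -> None:
        self.coalesce.append((tag, t2ec))
        if len(self.coalesce) == self.coalesce_entries:
            self.fifo.extend(self.coalesce)  # one burst-sized DRAM write
            self.dram_writes += 1
            self.coalesce.clear()


ecc = EccFifo(fifo_entries=12, coalesce_entries=6)
for tag in range(7):
    ecc.on_dirty_eviction_to_llc(tag, t2ec=b"\x00" * 8)
print(ecc.dram_writes, len(ecc.fifo), len(ecc.coalesce))  # 1 6 1
```

Seven dirty-line arrivals cost a single DRAM write here, which is the point of coalescing: T2EC traffic is paid once per full buffer rather than once per line.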

SLIDE 26

More on ECC FIFO

  • Write-back data, but write-through ECC
  • Potential performance degradation

– Increased DRAM traffic due to T2EC writes

  • Error correction

– Search the matching TAG in coalesce buffer AND ECC FIFO

  • May take a long time

– Not a problem since SER is low

  • Sometimes, may not find the matching TAG

– ECC FIFO is finite
– Potentially unprotected dirty lines (discussed later)
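A minimal sketch of the correction-time search, assuming entries are (tag, T2EC) pairs stored oldest first. Scanning newest to oldest is an assumption made here so that a line written several times matches its most recent T2EC; the function name is illustrative.

```python
from typing import Optional, Sequence, Tuple

Entry = Tuple[int, bytes]


def find_t2ec(tag: int,
              coalesce_buffer: Sequence[Entry],
              ecc_fifo: Sequence[Entry]) -> Optional[bytes]:
    """Return the newest T2EC for tag, or None if it has been overwritten."""
    # The coalesce buffer holds the most recent entries, so search it first.
    for entry_tag, t2ec in reversed(coalesce_buffer):
        if entry_tag == tag:
            return t2ec
    # Then walk the FIFO in DRAM from newest to oldest. This is slow,
    # but it only runs after a detected error.
    for entry_tag, t2ec in reversed(ecc_fifo):
        if entry_tag == tag:
            return t2ec
    return None  # no match: the line is T2EC unprotected


fifo = [(1, b"old"), (2, b"x"), (1, b"new")]
print(find_t2ec(1, [], fifo))                # b'new'
print(find_t2ec(1, [(1, b"newest")], fifo))  # b'newest'
print(find_t2ec(9, [], fifo))                # None
```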

SLIDE 27

EVALUATION

SLIDE 28

Performance Evaluation

  • GEMS + DRAMsim

– An out-of-order SPARC V9 core
– Exclusive two-level cache hierarchy
– DDR2 667MHz, 5.33GB/s
– Eager write-back

  • Clean dirty lines periodically
  • Workloads

– 16 data intensive applications
– SPEC CPU 2006, PARSEC, and SPLASH2

SLIDE 29

Performance Penalty

[Figure: normalized execution time (y-axis 0.9 to 1.1) per application with DRAM at 5.33 GB/s. Applications: CHOLESKY, FFT, OCEAN, RADIX (SPLASH2); canneal, dedup, fluidanimate, freqmine (PARSEC); bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3 (SPEC2006); plus the average. Annotated values: 1.2% and 6.0%.]

SLIDE 30

Performance Penalty

[Figure: normalized execution time (y-axis 0.9 to 1.1) per application (SPLASH2, PARSEC, and SPEC2006 workloads, plus the average) for two DRAM bandwidths. DRAM at 2.67 GB/s: annotated values 2.3%, 6.3%, and 6.8%. DRAM at 5.33 GB/s: annotated values 1.2% and 6.0%.]

SLIDE 31

Comparison to MAXn and MME

[Figure: normalized execution time (y-axis 0.9 to 1.4) per application (SPLASH2, PARSEC, and SPEC2006 workloads, plus the average) for Max1, Max2, Max4, MME, and ECC FIFO. Annotated value: 1%.]

SLIDE 34

Comparison to MAXn and MME

[Figure: normalized execution time (y-axis 0.9 to 1.4) per application (SPLASH2, PARSEC, and SPEC2006 workloads, plus the average) for Max1, Max2, Max4, MME, and ECC FIFO. Annotated values: 4% and 1%.]

SLIDE 35

Comparison to MAXn and MME

[Figure: normalized execution time (y-axis 0.9 to 1.4) per application (SPLASH2, PARSEC, and SPEC2006 workloads, plus the average) for Max1, Max2, Max4, MME, and ECC FIFO. Annotated values: 8%, 11%, 23%, 36%, 11%, 4%, and 1%.]

SLIDE 36

Comparison to MME

[Figure: execution time in cycles (y-axis 1.40E+09 to 2.60E+09) versus LLC size (256KB, 512KB, 1MB, 2MB) for Baseline, MME, and ECC FIFO on OCEAN 258x258. Annotated value: 10.4%.]

SLIDE 37

LLC Area and Power

  • CACTI 5 model of a 45nm process
  • Parity T1EC + SEC-DED T2EC

– 15% area and 9% power saving

  • Compared to SEC-DED baseline
  • SEC-DED T1EC + DEC-TED T2EC

– 15% area and 10% power saving

  • Compared to DEC-TED baseline
  • More error code examples in the paper

SLIDE 38

FIFO OVERWRITE

SLIDE 39

ECC FIFO is Finite

  • ECC FIFO is implemented as a circular buffer

– Each T2EC push overwrites the oldest entry in the FIFO

  • If the associated data is still valid even after its T2EC is overwritten

– The cache line is no longer protected
– Errors on this line cannot be corrected using T2EC

SLIDE 40

[Figure: timeline (t) of a small LLC and an ECC FIFO with k entries (no coalesce buffer, for simplicity).]

SLIDE 41

[Figure step: a dirty line is evicted to the LLC and its TAG/T2EC is pushed to the ECC FIFO.]

SLIDE 42

[Figure step: dirty line 1 is evicted to the LLC; TAG/T2EC 1 is pushed to the ECC FIFO.]

SLIDE 43

[Figure step: dirty line 2 is evicted to the LLC; TAG/T2EC 2 is pushed to the ECC FIFO.]

SLIDE 44

[Figure step: evictions continue through dirty line k-1, filling the ECC FIFO.]

SLIDE 45

[Figure step: dirty line k is evicted to the LLC; TAG/T2EC k is pushed to the ECC FIFO.]

T2EC 0 is evicted from the FIFO; dirty line 0 is no longer protected

SLIDE 46

[Figure: timeline (t) of T2EC pushes 1, 2, ..., k, comparing a line's T2EC valid-time (which ends when the line is reused or evicted) with its T2EC overwrite-time.]

– T2EC valid-time > T2EC overwrite-time: the cache line becomes T2EC unprotected (window of vulnerability)

SLIDE 47

– T2EC valid-time < T2EC overwrite-time: no window of vulnerability
– T2EC valid-time > T2EC overwrite-time: the cache line becomes T2EC unprotected (window of vulnerability)
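The condition on these two slides can be checked mechanically. The sketch below assumes a FIFO of k entries with no coalesce buffer, so the entry written by push i is overwritten by push i + k; the function name and the trace values are made up for illustration.

```python
def unprotected_lines(push_times, valid_until, k):
    """Indices of lines whose T2EC is overwritten while the line still needs it.

    push_times[i]  : time at which the i-th T2EC is pushed to the FIFO
    valid_until[i] : time at which line i is reused or evicted (T2EC becomes obsolete)
    """
    exposed = []
    for i, end_of_validity in enumerate(valid_until):
        overwriting_push = i + k
        if overwriting_push >= len(push_times):
            continue  # never overwritten within this trace
        if push_times[overwriting_push] < end_of_validity:
            exposed.append(i)  # window of vulnerability opens here
    return exposed


pushes = [0, 1, 2, 3, 4, 5]
valid = [2, 10, 3, 20, 20, 20]
print(unprotected_lines(pushes, valid, k=3))  # [1]
print(unprotected_lines(pushes, valid, k=6))  # []
```

Line 1 stays dirty until time 10 but its entry is overwritten at time 4, so it is exposed with k = 3; a larger FIFO pushes the overwrite-time out and closes the window.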

SLIDE 48

How to Avoid (or Mitigate) It?

  • T2EC overwrite-time

– Make the FIFO bigger

  • The bigger the FIFO, the longer T2EC lifetime
  • Not sure how big it has to be
  • T2EC valid-time

– Inherent memory access pattern

  • A dirty line is re-used or evicted
  • Make the T2EC obsolete before it gets overwritten

– Eager write-back

  • Limit the lifetime of dirty lines

SLIDE 49

Eager Write-Back

  • Scan LLC lines periodically

– Eagerly write back dirty lines older than the period

  • It’s cleaning, so it doesn’t affect hit-rate
  • 1M cycle period

– Limit the T2EC valid-time (max: 2 × EWB period)

  • More T2ECs become obsolete before they get overwritten

  • Eager write-back improves performance

– 6-10% in general, 26% in libquantum
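The "max: 2 × EWB period" bound follows from a line being written just after a scan: the next scan sees it as younger than one period and skips it, and the scan after that cleans it. A minimal sketch, assuming scans happen at exact multiples of the period and clean only lines strictly older than one period (the actual hardware mechanism is not described on the slide):

```python
def writeback_scan_time(dirty_time: int, period: int) -> int:
    """Time of the periodic scan that writes back a line dirtied at dirty_time."""
    scan = (dirty_time // period + 1) * period  # first scan after the write
    while scan - dirty_time <= period:          # not yet older than one period
        scan += period
    return scan


PERIOD = 1_000_000  # the 1M-cycle period from the slide
for dirty_time in (0, 1, PERIOD - 1):
    lifetime = writeback_scan_time(dirty_time, PERIOD) - dirty_time
    print(dirty_time, lifetime)
```

Under this model every dirty lifetime falls between just over one period and exactly two periods, which is what bounds the T2EC valid-time.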

SLIDE 50

[Figure: probability of a line being T2EC unprotected (log-scale y-axis, 1.00E-10 to 1.00E+00) versus FIFO size (20 to 100 thousand entries), one curve per application: FFT, CHOLESKY, OCEAN, RADIX, canneal, dedup, fluidanimate, freqmine, bzip2, mcf, hmmer, libquantum, omnetpp, lbm, milc, sphinx3.]

SLIDE 53

Required FIFO size

  • 100k entry FIFO, the worst case

– 1MB of storage in DRAM

  • More in the paper

– A simple analytic model of the probability of unprotected lines
– Required FIFO size with regard to the eager write-back period
– How to guarantee zero probability of unprotected lines
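A back-of-the-envelope check of the sizes quoted on this slide. It assumes that "1MB" means 2^20 bytes and that the evaluated LLC has the geometry drawn in the earlier figures (2^11 sets, 8 ways, 64B lines, 8B of baseline ECC per line); the slides do not state either point explicitly.

```python
SETS, WAYS, LINE_BYTES = 2 ** 11, 8, 64
BASELINE_ECC_BYTES_PER_LINE = 8

llc_lines = SETS * WAYS
llc_data_bytes = llc_lines * LINE_BYTES
baseline_ecc_bytes = llc_lines * BASELINE_ECC_BYTES_PER_LINE

FIFO_ENTRIES = 100_000
FIFO_BYTES = 2 ** 20  # assumed reading of "1MB"

bytes_per_entry = FIFO_BYTES / FIFO_ENTRIES
entries_per_llc_line = FIFO_ENTRIES / llc_lines

print(llc_data_bytes // 1024, "KiB of LLC data")
print(baseline_ecc_bytes // 1024, "KiB of on-chip ECC in the baseline")
print(round(bytes_per_entry, 1), "bytes per FIFO entry (tag plus T2EC)")
print(round(entries_per_llc_line, 1), "FIFO entries per LLC line")
```

Under these assumptions the worst-case FIFO is about as large as the LLC data array, but it lives in DRAM, whereas the baseline keeps 128 KiB of ECC on chip.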

SLIDE 54

Conclusion

  • ECC FIFO is an efficient, low-cost error protection mechanism

– A simple FIFO off-loads the overhead of strong T2EC

  • 15-25% LLC area reduction
  • 10-17% LLC power saving

– Does not affect LLC caching behavior
– Choosing error code for T1EC/T2EC is flexible

  • See more error code examples in the paper
  • Penalties

– Average 1% performance degradation: negligible
– Increased error correction latency, but SER is low
– Potentially unprotected cache lines

  • Can make this quite low or even guarantee not to occur
