  1. Flexible Cache Error Protection using an ECC FIFO
     Doe Hyun Yoon and Mattan Erez
     Dept. of Electrical and Computer Engineering, The University of Texas at Austin
     SC’09

  2. ECC FIFO
     • Goal: to reduce on-chip ECC overhead
       – Two-tiered error protection
         • T1EC: light-weight on-chip error code
         • T2EC: strong error correcting code
       – Off-load the T2EC overhead to a FIFO in DRAM
         • Why a FIFO? It is simple to manage
     • 15-25% LLC area reduction
     • 10-17% LLC power saving
     • Just 1% performance penalty

  3. BACKGROUND

  4. Error Correcting Codes
     • 1-bit parity for error detection
     • SEC-DED (Hamming) codes
       – Single-bit Error Correction and Double-bit Error Detection
       – 8-bit ECC for 64-bit data
     • DEC-TED
       – Double-bit Error Correction and Triple-bit Error Detection
       – 15-bit ECC for 64-bit data
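
A minimal Python sketch of a textbook extended Hamming (SEC-DED) code, offered as an illustration rather than the hardware code used in this work. It shows why 64 data bits need 8 check bits and how one flipped bit is corrected while two flipped bits are only detected. For DEC-TED it only counts check bits, using the standard double-error-correcting BCH construction (2m bits with 2^m - 1 >= k + 2m) plus one extra parity bit for triple-error detection. All function names are hypothetical.

```python
def secded_check_bits(k: int) -> int:
    """Minimal Hamming r with 2**r >= k + r + 1, plus 1 overall parity bit for DED."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def dected_check_bits(k: int) -> int:
    """Shortened binary BCH (t=2): 2*m bits with 2**m - 1 >= k + 2*m, plus 1 bit for TED."""
    m = 1
    while (1 << m) - 1 < k + 2 * m:
        m += 1
    return 2 * m + 1


def secded_encode(data: list[int]) -> list[int]:
    """Index 0 holds the overall parity bit; indices 1..n form a Hamming codeword."""
    r = secded_check_bits(len(data)) - 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # parity bit p gives every position with bit i set an even parity
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity distinguishes 1 flip from 2 flips
    return code


def secded_decode(code: list[int]) -> tuple[str, list[int]]:
    """Return (status, data); status is 'ok', 'corrected', or 'DUE'."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < len(code):
        fixed[syndrome] ^= 1  # single-bit error; syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        status = "DUE"  # even number of flips (e.g. 2): detected but uncorrected
    data = [fixed[pos] for pos in range(1, len(fixed)) if pos & (pos - 1)]
    return status, data


data = [(i * 5 + 1) % 3 % 2 for i in range(64)]
word = secded_encode(data)
print(len(word), secded_check_bits(64), dected_check_bits(64))  # 72 8 15
```

Flipping any single bit of the 72-bit word decodes as "corrected"; flipping two bits decodes as "DUE".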

  5. Interleaving
     • To detect and correct burst errors
       – N-way interleaving converts an N-bit burst error into N single-bit errors
     [Figure: bit positions 0, 1, 2, …, N-1 | N, N+1, N+2, …, 2N-1 | 2N, 2N+1, 2N+2, … laid out in rows of N; each column is protected by its own code, Error code 0 through Error code N-1]
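
The mapping in the figure can be sketched in Python as "bit i belongs to error code i mod N". The helper names below are hypothetical; the sketch simply counts how many bits of a contiguous burst land in each interleaved code.

```python
def lane_of(bit: int, ways: int) -> int:
    """With N-way bit interleaving, bit i of the line is covered by error code i mod N."""
    return bit % ways


def burst_hits_per_code(start: int, length: int, ways: int) -> list[int]:
    """Number of flipped bits of a contiguous burst that fall into each interleaved code."""
    hits = [0] * ways
    for bit in range(start, start + length):
        hits[lane_of(bit, ways)] += 1
    return hits


print(burst_hits_per_code(13, 8, 8))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(burst_hits_per_code(13, 9, 8))  # [1, 1, 1, 1, 1, 2, 1, 1]
```

An 8-bit burst becomes eight single-bit errors, one per code, which SEC-DED can correct; a 9-bit burst puts two errors into one code, which SEC-DED can only detect.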

  7. Interleaving
     • To detect and correct burst errors
       – N-way interleaving converts an N-bit burst error into N single-bit errors
     • Baseline cache error protection
       – 8-way interleaved SEC-DED
         • Can correct burst errors of up to 8 bits
         • 8B ECC per 64B cache line

  8. Uniform Error Protection
     [Figure: 8-way LLC with 2^11 sets; every line stores a Tag, 64B of Data, and 8B of ECC]
     • ECC increases area AND leakage/dynamic power
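
The raw storage behind the uniform scheme can be checked with a few lines of Python, assuming the 8-way, 2^11-set, 64B-line geometry from the figure. The helper name is hypothetical, and the count covers only the per-line code arrays; it ignores tags, decoders, and wiring, so it does not reproduce the area figures on slide 2.

```python
def ecc_storage_bytes(sets: int, ways: int, bytes_per_line: int) -> int:
    """Bytes of a per-line array in a set-associative cache (tags and circuitry ignored)."""
    return sets * ways * bytes_per_line


SETS, WAYS = 2 ** 11, 8
data_bytes = ecc_storage_bytes(SETS, WAYS, 64)   # 1 MiB of data
uniform_ecc = ecc_storage_bytes(SETS, WAYS, 8)   # 8 interleaved SEC-DED codes x 8 check bits = 8B per line
t1ec_only = ecc_storage_bytes(SETS, WAYS, 1)     # 1B per-line on-chip code, as in the two-tiered figures

print(data_bytes // 1024, uniform_ecc // 1024, t1ec_only // 1024)  # 1024 128 16
print(uniform_ecc / data_bytes)                                     # 0.125
```

The uniform scheme spends 12.5% extra storage on ECC for every line, clean or dirty, which is the overhead the following schemes try to reduce.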

  9. RELATED WORK

  10. Soft Errors: Observations
      • The Soft Error Rate (SER) is still low
        – Every cache access checks for errors, but finds none in most cases
      • Error detection
        – Common case
        – Needs a low-cost, low-overhead detection mechanism
      • Error correction
        – Uncommon case
        – Correction can be slow
        – But error correction information must still be kept somewhere
      • The memory hierarchy inherently provides redundancy for clean data
        – Only dirty lines need error correcting codes

  11. PERC [Sorin’06] / Energy Efficient [Li’04]
      [Figure: 8-way LLC with 2^11 sets; each line stores a Tag, 64B of Data, a 1B EDC, and a 7B ECC]
      • Read only the Data and EDC: saves dynamic power
      • Power-gate the ECC of clean lines: saves static power

  12. Area Efficient [Kim’06]
      [Figure: 4-way LLC with 2^12 sets; each line stores a Tag, 64B of Data, and a 1B EDC; a separate ECC array holds one 8B ECC per set]
      • Allow only 1 dirty line per set

  13. MAXn scheme
      [Figure: 8-way LLC with 2^11 sets; each line stores a Tag, 64B of Data, and a 1B EDC; a separate n-way array (n < 8) holds a Tag and an 8B ECC per entry]
      • Allow only n dirty lines per set
        – May cause detrimental cleaning traffic
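
Where the cleaning traffic comes from can be sketched with a simplified Python model: it tracks only the dirty lines of each set, ignores ordinary evictions and reads, and, as an assumption not stated in the slides, force-cleans the least recently written dirty line when a write would exceed the limit. The function name is hypothetical; Kim’06 corresponds to max_dirty=1.

```python
from collections import OrderedDict


def cleaning_writebacks(writes: list[tuple[int, int]], max_dirty: int) -> int:
    """Count forced write-backs when each set may hold at most max_dirty dirty lines."""
    dirty: dict[int, OrderedDict] = {}  # set index -> dirty line tags, oldest first
    cleanings = 0
    for set_idx, tag in writes:
        lines = dirty.setdefault(set_idx, OrderedDict())
        if tag in lines:
            lines.move_to_end(tag)      # already dirty: only refresh its recency
            continue
        if len(lines) == max_dirty:
            lines.popitem(last=False)   # clean (write back) the oldest dirty line
            cleanings += 1
        lines[tag] = True
    return cleanings


# One set, four lines written round-robin three times (12 writes).
trace = [(0, t) for _ in range(3) for t in range(4)]
for n in (4, 2, 1):
    print(n, cleaning_writebacks(trace, n))  # 4 0 / 2 10 / 1 11
```

With four dirty lines in a set, a limit of four causes no extra traffic, while limits of two or one turn most writes into extra DRAM write-backs.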

  14. Two-tiered error protection
      • Tier-1 Error Code (T1EC)
        – On-chip light-weight error code
        – Uniform error protection
      • Tier-2 Error Code (T2EC)
        – Strong error codes only for dirty lines
        – Corrects Detected but Uncorrected Errors (DUE) of the T1EC
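
The decision a two-tiered scheme makes on each LLC access can be sketched in Python. The branches follow the observations on slides 10 and 14 (detection is the common case, clean data can be re-fetched, and only dirty lines need the T2EC); the enum, the function, and its boolean inputs are hypothetical simplifications.

```python
from enum import Enum


class Action(Enum):
    NONE = "no error detected"
    T1EC_CORRECT = "corrected on chip by the T1EC"
    REFETCH = "clean line: re-fetch the copy from the next level"
    T2EC_CORRECT = "dirty line: correct the DUE with the T2EC"


def handle_access(error_detected: bool, t1ec_correctable: bool, dirty: bool) -> Action:
    """Hypothetical two-tiered error handling for one LLC access."""
    if not error_detected:
        return Action.NONE            # common case: only the cheap T1EC check runs
    if t1ec_correctable:
        return Action.T1EC_CORRECT
    if not dirty:
        return Action.REFETCH         # clean data is redundant in the memory hierarchy
    return Action.T2EC_CORRECT        # uncommon, slower path through the T2EC


print(handle_access(True, False, dirty=False).value)
```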

  15. Memory Mapped ECC [Yoon’09]
      [Figure: 8-way LLC with 2^11 sets; each on-chip line stores a Tag, 64B of Data, and a 1B T1EC; the 8B T2EC resides in DRAM]
      • T2EC is memory mapped AND cached
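
What "memory mapped AND cached" implies can be sketched with a hypothetical linear mapping in Python: each 64B data line owns one 8B T2EC slot in a reserved DRAM region starting at an assumed t2ec_base. This is an illustrative assumption, not necessarily the mapping used by MME. Because 64 / 8 = 8 consecutive data lines share one T2EC cache line, caching that line in the LLC serves several dirty lines.

```python
LINE_BYTES = 64   # cache line size, as in the slides
T2EC_BYTES = 8    # T2EC per line, as in the figure


def t2ec_address(paddr: int, t2ec_base: int) -> int:
    """Hypothetical linear mapping: one 8B T2EC slot per 64B data line."""
    return t2ec_base + (paddr // LINE_BYTES) * T2EC_BYTES


def t2ec_line(paddr: int, t2ec_base: int) -> int:
    """Address of the cache line that holds this data line's T2EC."""
    return t2ec_address(paddr, t2ec_base) // LINE_BYTES * LINE_BYTES


base = 0x4000_0000  # assumed start of the reserved T2EC region
print(hex(t2ec_address(64, base)), hex(t2ec_line(448, base)), hex(t2ec_line(512, base)))
```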

  16. ECC FIFO

  17. ECC FIFO
      • Uses two-tiered error protection
      • T2EC is off-loaded to a FIFO in DRAM
        – LLC caching behavior is unaffected
      • FIFO
        – Simple to manage
      • Coalesce buffer
        – Better utilizes the DRAM channel

  18. [Figure: the rest of the cache hierarchy sits above the Last Level Cache, whose lines hold Data and a T1EC; a T2EC encoder feeds a Coalesce Buffer, which writes into the ECC FIFO in DRAM]

  19. A dirty line is evicted from the rest of the cache hierarchy into the LLC

  20. The encoder computes the T2EC and pairs it with the line's TAG

  21. The (TAG, T2EC) pair is pushed to the Coalesce Buffer

  22. The next dirty line comes; its TAG/T2EC pair is also buffered in the Coalesce Buffer

  23. The Coalesce Buffer is now FULL of (TAG, T2EC) pairs

  24. The coalesced T2EC is written into the ECC FIFO in DRAM; the T2EC write size matches the DRAM burst size

  25. The Coalesce Buffer becomes empty
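
The write path walked through above can be modeled as a small Python class: (TAG, T2EC) pairs accumulate in an on-chip coalesce buffer, and once a burst's worth is collected they are written as one DRAM write into a circular FIFO whose oldest entries get overwritten. The class name, the FIFO capacity, and entries_per_burst (for example, four 16B pairs per 64B burst) are hypothetical parameters, not values from the slides.

```python
class EccFifo:
    """Hypothetical model of the ECC FIFO write path: coalesce buffer + circular FIFO in DRAM."""

    def __init__(self, capacity: int, entries_per_burst: int):
        self.slots = [None] * capacity           # DRAM-resident FIFO storage
        self.head = 0                            # next FIFO slot to (over)write
        self.entries_per_burst = entries_per_burst
        self.coalesce = []                       # on-chip coalesce buffer
        self.dram_writes = 0                     # burst-sized T2EC writes issued

    def on_dirty_line(self, tag: int, t2ec: bytes) -> None:
        """Called when a dirty line enters the LLC and its T2EC has been encoded."""
        self.coalesce.append((tag, t2ec))
        if len(self.coalesce) == self.entries_per_burst:
            self._flush()                        # buffer is full: one burst to DRAM

    def _flush(self) -> None:
        """Write the buffered pairs into the FIFO, overwriting the oldest entries."""
        for entry in self.coalesce:
            self.slots[self.head] = entry
            self.head = (self.head + 1) % len(self.slots)
        self.coalesce.clear()                    # coalesce buffer becomes empty
        self.dram_writes += 1


fifo = EccFifo(capacity=8, entries_per_burst=4)
for tag in range(12):
    fifo.on_dirty_line(tag, bytes(8))
print(fifo.dram_writes, [entry[0] for entry in fifo.slots])  # 3 [8, 9, 10, 11, 4, 5, 6, 7]
```

Twelve T2EC updates cost only three DRAM writes, and after the FIFO wraps, the entries for tags 0 to 3 are gone.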

  26. More on ECC FIFO
      • Write-back data, but write-through ECC
      • Potential performance degradation
        – Increased DRAM traffic due to T2EC writes
      • Error correction
        – Search for the matching TAG in the coalesce buffer AND the ECC FIFO
      • May take a long time
        – Not a problem, since the SER is low
      • Sometimes the matching TAG may not be found
        – The ECC FIFO is finite
        – Potentially unprotected dirty lines (discussed later)
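
The correction-time search can be sketched in Python with a list as the coalesce buffer and a bounded collections.deque standing in for the finite ECC FIFO. Because a line can be written back to the LLC several times, the sketch assumes that the newest matching entry is the one that matches the current data, so both structures are scanned newest first. The function name is hypothetical.

```python
from collections import deque


def find_t2ec(tag, coalesce_buffer, fifo):
    """Search the coalesce buffer, then the ECC FIFO, newest entry first."""
    for entry_tag, t2ec in reversed(coalesce_buffer):
        if entry_tag == tag:
            return t2ec
    for entry_tag, t2ec in reversed(fifo):
        if entry_tag == tag:
            return t2ec
    return None  # entry already overwritten: the dirty line has no T2EC protection


fifo = deque([(0xA, "a1"), (0xB, "b1"), (0xA, "a2")], maxlen=4)
coalesce = [(0xC, "c1")]
print(find_t2ec(0xA, coalesce, fifo))  # a2
fifo.extend([(1, "x"), (2, "y"), (3, "z"), (4, "w")])
print(find_t2ec(0xA, coalesce, fifo))  # None
```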

  27. EVALUATION

  28. Performance Evaluation
      • GEMS + DRAMsim
        – An out-of-order SPARC V9 core
        – Exclusive two-level cache hierarchy
        – DDR2-667 (667 MT/s): 5.33 GB/s
        – Eager write-back
          • Dirty lines are cleaned periodically
      • Workloads
        – 16 data-intensive applications
        – SPEC CPU 2006, PARSEC, and SPLASH2
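
The 5.33 GB/s figure follows from the DDR2-667 transfer rate, assuming a standard 64-bit (8-byte) channel, which the slide does not state. The 2.67 GB/s configuration used on slide 30 is half of this rate; how it was configured is not given. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth = transfer rate x bus width (assumes a 64-bit channel)."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


ddr2_667 = peak_bandwidth_gb_s(2000 / 3)  # DDR2-667: 666.67 MT/s on a 333 MHz I/O clock
print(round(ddr2_667, 2))                 # 5.33
print(round(ddr2_667 / 2, 2))             # 2.67
```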

  29. Normalized Execution Time (DRAM: 5.33 GB/s)
      [Chart: normalized execution time (y-axis 0.9 to 1.1) for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), and SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3)]
      • Performance penalty: up to 6.0% on a single benchmark, 1.2% on average

  30. Normalized Execution Time at two DRAM bandwidths
      [Chart: the same 16 benchmarks, y-axis 0.9 to 1.1, with one panel for 2.67 GB/s and one for 5.33 GB/s]
      • 2.67 GB/s: labeled penalties of 6.3% and 6.8%; 2.3% on average
      • 5.33 GB/s: labeled penalty of 6.0%; 1.2% on average

  31. Comparison to MAXn and MME
      [Chart: normalized execution time (y-axis 0.9 to 1.4) of ECC FIFO, MME, Max4, Max2, and Max1 for the 16 benchmarks and their average]
      • ECC FIFO: 1% penalty on average

  35. Comparison to MAXn and MME (fully annotated)
      [Chart: normalized execution time (y-axis 0.9 to 1.4) of ECC FIFO, MME, Max4, Max2, and Max1 for the 16 benchmarks and their average]
      • Per-benchmark labels: 11%, 23%, 36%, and 11%
      • Average labels: 4%, 8%, and 1% (ECC FIFO)

  36. Comparison to MME
      [Chart: execution time in cycles (y-axis 1.4×10^9 to 2.6×10^9) of OCEAN (258x258) under Baseline, MME, and ECC FIFO for LLC sizes of 256KB, 512KB, 1MB, and 2MB; one difference is labeled 10.4%]
