2/27/2014
ECE 753: FAULT-TOLERANT COMPUTING
Kewal K. Saluja
Department of Electrical and Computer Engineering
Low Level Fault-Tolerance: ECC
Overview
- Introduction
- Motivation and Background
- Hamming Codes – by example
- SEC-DED Codes – Algebraic method
- SEC-DED Codes – Hardware
- SEC-DED-SBD Codes
- Cyclic Codes – (time permitting)
- Summary
Introduction
- References
- Chapter 3 of Koren and Krishna
- Appendix A of the book [siew:92] – also
included in the set of reading material
- Following references
- Reddy – “A class of linear codes …” IEEE Trans. on Computers, May 1978
- Any book on coding theory
Motivation and Background
- Memories are an integral part of digital systems (computers)
- The majority of chip and/or board area is taken by memories
- Hence, reliability improvement methods must pay attention to memories (RAMs, ROMs, etc.)
Motivation and Background (contd.)
- Types of faults prevalent in memories
- During manufacturing
– Stuck-at faults
– Timing faults
– Coupling and pattern-sensitive faults
- During operation
– Cell failures due to life, stress: same as stuck-at
– Alpha particle hits: cell content change
- Sensitive to system location: more hits at high altitudes and in flight
– Need non-testing based solutions
– Random failures
– Bit/nibble/byte/card failures
Motivation and Background (contd.)
- Theoretical Foundation
– Linear and modern algebra
- Concept of groups, fields, and vector spaces
- We will focus on binary codes but will have to include
polynomial algebra
- Theory – Informal definitions and results
– Vector: a collection of bits represented as a string
– Information bits: a collection of k bits
– Code word: the encoded information bit string
- k information bits are encoded into n bits; the encoded information word is a code word.
– Check bits: r (= n-k) extra bits used to encode information bits
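
To make the definitions above concrete, here is a minimal sketch using a single even-parity check bit. The parity code and the function name `encode_even_parity` are assumptions chosen for illustration; the slides do not prescribe this particular code. It shows k information bits encoded into an n-bit code word with r = n - k check bits.

```python
def encode_even_parity(info_bits):
    """Append one check bit so the code word has an even number of 1s.

    Illustrative assumption: a single-parity code, the simplest case
    of k information bits plus r check bits.
    """
    check_bit = sum(info_bits) % 2       # 1 if the count of 1s is odd
    return info_bits + [check_bit]       # code word = information bits + check bit


info = [1, 0, 1, 1]                      # vector of k = 4 information bits
code_word = encode_even_parity(info)     # n = 5 bit code word

k = len(info)
n = len(code_word)
r = n - k                                # number of check bits

print(code_word, k, n, r)
```

With one check bit this code can only detect a single-bit error; it cannot locate or correct it.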