
2/27/2014

ECE 753: FAULT-TOLERANT COMPUTING
Kewal K. Saluja
Department of Electrical and Computer Engineering
Low Level Fault-Tolerance: ECC

Overview
• Introduction
• Motivation and Background
• Hamming Codes – by example
• SEC-DED Codes – Algebraic method
• SEC-DED Codes – Hardware
• SEC-DED-SBD Codes
• Cyclic Codes – (time permitting)
• Summary

Introduction
• References
  – Chapter 3 of Koren and Krishna
  – Appendix A of the book [siew:92] – also included in the set of reading material
• Following references
  – Reddy – “A class of linear codes …” IEEETC, May 1978
  – Any book on coding theory

Motivation and Background
• Memories are an integral part of digital systems (computers)
• The majority of chip and/or board area is taken by memories
• Hence, reliability improvement methods must pay attention to memories (RAMs, ROMs, etc.)

Motivation and Background (contd.)
• Types of faults prevalent in memories
• During manufacturing
  – Stuck-at faults
  – Timing faults
  – Coupling and pattern-sensitive faults
• During operation
  – Cell failures due to life and stress – same as stuck-at
  – Alpha particle hits – cell content changes
    • Sensitive to system location: hit rates are higher at altitude and in flight
  – Need non-testing-based solutions
  – Random failures – bit/nibble/byte/card failures

Motivation and Background (contd.)
• Theoretical Foundation
  – Linear and modern algebra
    • Concept of groups, fields, and vector spaces
    • We will focus on binary codes but will have to include polynomial algebra
• Theory – Informal definitions and results
  – Vector: a collection of bits represented as a string
  – Information bits: a collection of k bits
  – Code word: an encoded information bit string
    • k information bits are encoded to n bits. The encoded information word is a code word.
  – Check bits: r (= n-k) extra bits used to encode the information bits

Motivation and Background (contd.)
Theory – Informal definitions and results (contd.)
  – Hamming weight (HW) of a vector v: the number of 1’s in v
  – Hamming distance (HD) between a pair of vectors v1 and v2: the number of places in which the two vectors differ from each other.
      HD(v1, v2) = HW(v1 ⊕ v2)
  – Code: a collection of code words.
  – Block code: each code word contains the same number of bits.
  – Minimum Hamming distance of a code: the minimum of all HDs between all pairs of code words in the code.

Motivation and Background (contd.)
• Theory – Informal definitions and results
  – Error detection: an erroneous word (a code word with one or more bit errors) is not a code word
• Basic result 1: A code is capable of detecting t errors if and only if the min HD of the code is at least t+1.
  – Proof: use a sphere packing argument to show this.
• Example: Use of parity – we know that we can detect a single error.
  What is the minimum HD for such a code?
  Prove that the min HD is 2 using the argument that no two binary strings with even (odd) Hamming weight can have a HD of 1.

Motivation and Background (contd.)
Theory – Informal definitions and results (contd.)
• Basic result 2: A code is capable of correcting t errors if and only if the min HD of the code is at least 2t+1.
  – Proof: use a sphere packing argument as before.
• Combine the two results: A code is capable of correcting t errors and detecting d errors (d ≥ t) if and only if the min HD of the code is at least t+d+1.

Hamming Codes – by example
• A linear block code
• Consider a (7,4) Hamming code
• Let i1 i2 i3 i4 be information symbols
• Let p1 p2 p4 be check symbols
• The parity equations:
    p1 = i1 ⊕ i2 ⊕ i4
    p2 = i1 ⊕ i3 ⊕ i4
    p4 = i2 ⊕ i3 ⊕ i4

Hamming Codes – by example (contd.)
• Properties of the code
  – If there is no error, all parity equations will be satisfied
  – Denote the outcomes of these equation checks as c1, c2, c4
  – If there is exactly one error, then c1, c2, c4 point to the error
  – The vector c1, c2, c4 is called the syndrome
  – The above (7,4) Hamming code is a SEC code

Hamming Codes – by example (contd.)
• Can write the equations as follows (easy to remember)

    Bit:       p1  p2  i1  p4  i2  i3  i4
    c1:         1   0   1   0   1   0   1
    c2:         0   1   1   0   0   1   1
    c4:         0   0   0   1   1   1   1
    Position:   1   2   3   4   5   6   7

  This encodes a 4-bit information word into a 7-bit codeword
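The encoding and syndrome rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `encode`, `syndrome`, and `correct` are hypothetical, and the syndrome is read as the binary number c4 c2 c1 so that it directly names the erroneous position (1 to 7).

```python
def encode(i1, i2, i3, i4):
    """Return the code word [p1, p2, i1, p4, i2, i3, i4] (positions 1..7)."""
    p1 = i1 ^ i2 ^ i4
    p2 = i1 ^ i3 ^ i4
    p4 = i2 ^ i3 ^ i4
    return [p1, p2, i1, p4, i2, i3, i4]


def syndrome(word):
    """Return 4*c4 + 2*c2 + c1; 0 means every parity check is satisfied."""
    w = [None] + list(word)  # pad so that w[1]..w[7] match the slide positions
    c1 = w[1] ^ w[3] ^ w[5] ^ w[7]
    c2 = w[2] ^ w[3] ^ w[6] ^ w[7]
    c4 = w[4] ^ w[5] ^ w[6] ^ w[7]
    return 4 * c4 + 2 * c2 + c1


def correct(word):
    """Repair a word under the assumption that at most one bit was flipped."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1  # the syndrome is the 1-based position of the error
    return fixed


codeword = encode(1, 0, 1, 1)   # [0, 1, 1, 0, 0, 1, 1]
received = list(codeword)
received[4] ^= 1                # inject an error at position 5 (bit i2)
print(syndrome(received))       # 5
print(correct(received) == codeword)
```

Note that `correct` silently miscorrects if two or more bits are flipped, which is why this code is only single-error correcting.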

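The distance definitions and the two basic results can be checked by brute force on small codes. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the parity code is taken to be even parity over 3 information bits, and the (7,4) code is generated from the parity equations p1 = i1 ⊕ i2 ⊕ i4, p2 = i1 ⊕ i3 ⊕ i4, p4 = i2 ⊕ i3 ⊕ i4.

```python
from itertools import combinations, product


def hamming_weight(v):
    """Number of 1's in the bit vector v."""
    return sum(v)


def hamming_distance(v1, v2):
    """HD(v1, v2) = HW(v1 XOR v2)."""
    return hamming_weight([a ^ b for a, b in zip(v1, v2)])


def min_distance(code):
    """Minimum HD over all pairs of distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def capability(d_min):
    """Return (max errors detectable, max errors correctable).

    These are two separate modes of use: detect up to d_min - 1 errors,
    or correct up to (d_min - 1) // 2 errors.
    """
    return d_min - 1, (d_min - 1) // 2


# Even-parity code: 3 information bits plus one parity bit.
parity_code = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=3)]

# (7,4) Hamming code in the order p1 p2 i1 p4 i2 i3 i4.
hamming74 = [
    (i1 ^ i2 ^ i4, i1 ^ i3 ^ i4, i1, i2 ^ i3 ^ i4, i2, i3, i4)
    for i1, i2, i3, i4 in product((0, 1), repeat=4)
]

print(min_distance(parity_code), capability(min_distance(parity_code)))
print(min_distance(hamming74), capability(min_distance(hamming74)))
```

The parity code comes out with minimum distance 2 (single-error detection only), and the (7,4) Hamming code with minimum distance 3, which is 2t+1 for t = 1.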
Hamming Codes – by example (contd.)
• The above method of construction can be generalized to construct an (n,k) Hamming code
• Simple bound
    k = number of information bits
    r = number of check bits
    n = k + r = total number of bits
    n + 1 = number of error patterns with at most one error (n single-bit errors plus the no-error case)
  Each error (including no error) must have a distinct syndrome
  With r check bits, the maximum number of distinct syndromes is 2^r
  Hence: 2^r ≥ n + 1

Hamming Codes – by example (contd.)
Simple bound
  When 2^r = n + 1, the corresponding Hamming code is a perfect code
• Perfect Hamming codes can be constructed as follows:

    Bit:       p1   p2   i1  p4   i2  i3  i4  p8   i5  . . .
    Position:  2^0  2^1  3   2^2  5   6   7   2^3  9   . . .

  Parity equations can be written as before from the above matrix representation

SEC-DED Codes – Algebraic method
• Definitions
  – (G, *) – An abelian (commutative) Group
    • There is a 0 in G (identity)
    • For every a in G, a^-1 is also in G (inverses)
    • For all a and b in G, a*b = b*a is also in G (closed)
    • * is associative: (a*b)*c = a*(b*c)
  – Examples
    • G = (0, 1); * = ⊕ (Exclusive-OR)
    • (Z3, +3) is a commutative group, where +3 is addition modulo 3

SEC-DED Codes – Algebraic method (contd.)
• Definitions (contd.)
  – (F, +, .) – A Field if
    • (F, +) is an abelian group with identity 0
    • (F - {0}, .) is an abelian group
    • . distributes over +
  – Examples
    • (F, ⊕, .) is a Field
    • F = (0, 1); ⊕ = Exclusive-OR; . = AND
    • The above Field is called GF(2)

SEC-DED Codes – Algebraic method (contd.)
• Definitions (contd.)
  – Vector space over a field F
    • (V, +) is an abelian group
    • v in V and c in F ⇒ cv is in V
    • c(u + v) = cu + cv
    • (c + d)v = cv + dv
    • c(dv) = (cd)v
  – S ⊆ V is a subspace if S is a vector space
  – A linear combination of vectors is a vector
    • u = c1 v1 + c2 v2 + c3 v3 + … + cn vn

SEC-DED Codes – Algebraic method (contd.)
• Some results and more definitions
  – Over GF(2), the collection of all n-bit vectors forms a vector space
  – Let v1, v2, … , vk be n-bit vectors. Then all 2^k linear combinations of these k vectors form a subspace
  – A set of k vectors v1, v2, … , vk is linearly independent if, whenever not all ci = 0, i = 1, …, k,
      c1 v1 + c2 v2 + c3 v3 + … + ck vk ≠ 0
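The GF(2) definitions above translate directly into bit operations: vector addition is bitwise Exclusive-OR and scalar multiplication is AND. The following sketch enumerates all 2^k linear combinations of a set of vectors and tests linear independence by checking that only the all-zero choice of coefficients gives the zero vector. The function names and the example vectors are assumptions chosen for illustration.

```python
from itertools import product


def add(u, v):
    """Vector addition over GF(2): component-wise Exclusive-OR."""
    return tuple(a ^ b for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiplication over GF(2): AND each component with c (0 or 1)."""
    return tuple(c & a for a in v)


def combine(coeffs, vectors):
    """Return c1*v1 + c2*v2 + ... + ck*vk over GF(2)."""
    total = (0,) * len(vectors[0])
    for c, v in zip(coeffs, vectors):
        total = add(total, scale(c, v))
    return total


def span(vectors):
    """Set of all linear combinations of the given vectors."""
    return {combine(cs, vectors) for cs in product((0, 1), repeat=len(vectors))}


def is_linearly_independent(vectors):
    """True if no non-trivial combination yields the zero vector."""
    zero = (0,) * len(vectors[0])
    return all(
        combine(cs, vectors) != zero
        for cs in product((0, 1), repeat=len(vectors))
        if any(cs)
    )


v1, v2 = (1, 0, 1, 0), (0, 1, 1, 0)
v3 = add(v1, v2)  # (1, 1, 0, 0), deliberately dependent on v1 and v2

print(sorted(span([v1, v2])))
print(is_linearly_independent([v1, v2]))      # True
print(is_linearly_independent([v1, v2, v3]))  # False
```

When the k vectors are independent, the 2^k combinations are all distinct, so the subspace has exactly 2^k elements; with dependent vectors some combinations coincide.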

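The simple bound 2^r ≥ n + 1 with n = k + r can be turned into a small calculation of how many check bits a given number of information bits needs, and the power-of-two placement of check bits can be generated mechanically. This is a sketch only; `min_check_bits`, `is_perfect`, and `layout` are hypothetical helper names.

```python
def min_check_bits(k):
    """Smallest r with 2**r >= k + r + 1 (the bound 2^r >= n + 1, n = k + r)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def is_perfect(k):
    """True if the bound is met with equality, i.e. 2^r = n + 1."""
    r = min_check_bits(k)
    return 2 ** r == k + r + 1


def layout(n):
    """Label positions 1..n: check bits at powers of two, information bits elsewhere."""
    labels = []
    info_index = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:      # pos is a power of two
            labels.append(f"p{pos}")
        else:
            info_index += 1
            labels.append(f"i{info_index}")
    return labels


for k in (4, 8, 11, 16, 32, 64):
    r = min_check_bits(k)
    print(k, r, k + r, is_perfect(k))

print(layout(9))
```

For k = 4 this gives r = 3 and n = 7, matching the (7,4) code, and `layout(9)` reproduces the ordering p1 p2 i1 p4 i2 i3 i4 p8 i5.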