
ECE 753: Fault-Tolerant Computing



  1. 2/27/2014

     ECE 753: FAULT-TOLERANT COMPUTING
     Kewal K. Saluja
     Department of Electrical and Computer Engineering
     Low Level Fault-Tolerance: ECC

     Overview
     • Introduction
     • Motivation and Background
     • Hamming Codes – by example
     • SEC-DED Codes – Algebraic method
     • SEC-DED Codes – Hardware
     • SEC-DED-SBD Codes
     • Cyclic Codes – (time permitting)
     • Summary

     Introduction
     • References
       • Chapter 3 of Koren and Krishna
       • Appendix A of the book [siew:92] – also included in the set of reading material
       • Following references
         • Reddy – “A class of linear codes …” IEEETC, May 1978
         • Any book on coding theory

     Motivation and Background
     • Memories are an integral part of digital systems (computers)
     • The majority of chip and/or board area is taken by memories
     • Hence – reliability improvement methods must pay attention to memories (RAMs, ROMs, etc.)

     Motivation and Background (contd.)
     • Types of faults prevalent in memories
       • During manufacturing
         – Stuck-at
         – Timing faults
         – Coupling and pattern sensitive faults
       • During operation
         – Cell failures due to life, stress – same as stuck-at
         – Alpha particle hits – cell content change
           • Sensitive to system location. Higher hits at altitudes and in flight
           – Need non-testing based solutions
         – Random failures – bit/nibble/byte/card failures

     Motivation and Background (contd.)
     • Theoretical Foundation
       – Linear and modern algebra
         • Concept of groups, fields, and vector spaces
         • We will focus on binary codes but will have to include polynomial algebra
     • Theory – Informal definitions and results
       – Vector: a collection of bits represented as a string
       – Information bits: a collection of $k$ bits
       – Code word: encoded information bit string
         • $k$ information bits are encoded to $n$ bits. The encoded information word is a code word.
       – Check bits: $r$ ($= n - k$) extra bits used to encode the information bits

  2. Motivation and Background (contd.)
     Theory – Informal definitions and results (contd.)
       – Hamming weight (HW) of a vector $v$: number of 1's in $v$
       – Hamming distance (HD) between a pair of vectors $v_1$ and $v_2$: number of places in which the two vectors differ from each other.
           $\mathrm{HD}(v_1, v_2) = \mathrm{HW}(v_1 \oplus v_2)$
       – Code: collection of code words.
       – Block code: each code word contains the same number of bits.
       – Minimum Hamming distance of a code: minimum of all HDs between all pairs of code words in the code.

     Motivation and Background (contd.)
     • Theory – Informal definitions and results
       – Error detection: an erroneous word (a code word with one or more bit errors) is not a code word
     • Basic result 1: A code is capable of detecting $t$ errors if and only if the min HD of the code is at least $t + 1$.
       – Proof: use a sphere packing argument to show this.
     • Example: Use of parity – we know that we can detect a single error.
       What is the minimum HD for such a code?
       Prove that the min HD is 2 using the argument that no two binary strings with even (odd) weight can have a HD of 1.

     Motivation and Background (contd.)
     Theory – Informal definitions and results (contd.)
     • Basic result 2: A code is capable of correcting $t$ errors if and only if the min HD of the code is at least $2t + 1$.
       – Proof: use a sphere packing argument as before.
     • Combine the two results: A code is capable of correcting $t$ errors and detecting $d$ errors ($d \ge t$) if and only if the min HD of the code is at least $t + d + 1$.

     Hamming Codes – by example
     • A linear block code
     • Consider a (7,4) Hamming code
     • Let $i_1\, i_2\, i_3\, i_4$ be information symbols
     • Let $p_1\, p_2\, p_4$ be check symbols
     • The parity equations:
         $p_1 = i_1 \oplus i_2 \oplus i_4$
         $p_2 = i_1 \oplus i_3 \oplus i_4$
         $p_4 = i_2 \oplus i_3 \oplus i_4$

     Hamming Codes – by example (contd.)
     • Can write the equations as follows (easy to remember); the last row gives the bit position of each symbol

         $p_1$  $p_2$  $i_1$  $p_4$  $i_2$  $i_3$  $i_4$
           1      0      1      0      1      0      1
           0      1      1      0      0      1      1
           0      0      0      1      1      1      1
           1      2      3      4      5      6      7

       This encodes a 4-bit information word into a 7-bit codeword

     Hamming Codes – by example (contd.)
     • Properties of the code
       – If there is no error, all parity equations will be satisfied
       – Denote the outcomes of these equation checks as $c_1, c_2, c_4$
       – If there is exactly one error, then $c_1, c_2, c_4$ point to the error
       – The vector $c_1, c_2, c_4$ is called the syndrome
       – The above (7,4) Hamming code is a SEC code
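The distance definitions and the parity exercise above can be checked by brute force on a small code. The following is a minimal sketch, not part of the original slides: it assumes bit vectors are stored as Python integers, and the helper names (`hamming_weight`, `hamming_distance`, `min_distance`, `even_parity_code`) are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def hamming_weight(v):
    """Number of 1 bits in v (a bit vector stored as a non-negative int)."""
    return bin(v).count("1")


def hamming_distance(v1, v2):
    """HD(v1, v2) = HW(v1 XOR v2)."""
    return hamming_weight(v1 ^ v2)


def min_distance(code):
    """Minimum HD over all pairs of distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def even_parity_code(k):
    """All k-bit information words, each with one even-parity check bit appended."""
    words = []
    for info in range(2 ** k):
        parity = hamming_weight(info) % 2   # makes the total weight even
        words.append((info << 1) | parity)
    return words


code = even_parity_code(3)                  # n = 4, k = 3, r = 1
d_min = min_distance(code)
print("min HD:", d_min)                     # min HD: 2
print("detects up to", d_min - 1, "error(s)")         # basic result 1
print("corrects up to", (d_min - 1) // 2, "error(s)")  # basic result 2
```

With min HD 2, the two basic results give single error detection and no error correction, which matches what the slides say about parity.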
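The (7,4) example above can be sketched in code as follows. This is an illustrative sketch under stated assumptions rather than the lecture's own implementation: code words are Python lists in position order 1 to 7 ($p_1\, p_2\, i_1\, p_4\, i_2\, i_3\, i_4$), the syndrome is read as the binary number $c_4 c_2 c_1$, and the function names `encode`, `syndrome`, and `correct` are hypothetical.

```python
def encode(i1, i2, i3, i4):
    """Encode 4 information bits using the parity equations from the slides."""
    p1 = i1 ^ i2 ^ i4
    p2 = i1 ^ i3 ^ i4
    p4 = i2 ^ i3 ^ i4
    # Bit positions:  1   2   3   4   5   6   7
    return [p1, p2, i1, p4, i2, i3, i4]


def syndrome(word):
    """Return the syndrome c4 c2 c1 as an integer (0 means all checks pass)."""
    b = [0] + list(word)                 # pad so that b[1]..b[7] are 1-based
    c1 = b[1] ^ b[3] ^ b[5] ^ b[7]       # first row of the matrix
    c2 = b[2] ^ b[3] ^ b[6] ^ b[7]       # second row
    c4 = b[4] ^ b[5] ^ b[6] ^ b[7]       # third row
    return 4 * c4 + 2 * c2 + c1


def correct(word):
    """Correct at most one bit error; return (corrected word, error position)."""
    pos = syndrome(word)
    fixed = list(word)
    if pos != 0:
        fixed[pos - 1] ^= 1              # the syndrome points to the bad bit
    return fixed, pos


codeword = encode(1, 0, 1, 1)
print(codeword)                          # [0, 1, 1, 0, 0, 1, 1]

received = list(codeword)
received[4] ^= 1                         # flip bit position 5 (i2)
print(syndrome(received))                # 5
print(correct(received)[0] == codeword)  # True
```

The sketch only covers single errors. Two bit errors also produce a nonzero syndrome, but it points to the wrong position, which is why this code is SEC and not SEC-DED.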

  3. Hamming Codes – by example (contd.)
     • The above method of construction can be generalized to construct an $(n, k)$ Hamming code
     • Simple bound
         $k$ = number of information bits
         $r$ = number of check bits
         $n = k + r$ = total number of bits
         $n + 1$ = number of single or fewer errors
       Each error (including no error) must have a distinct syndrome
       With $r$ check bits, the max possible number of syndromes = $2^r$
       Hence: $2^r \ge n + 1$

     Hamming Codes – by example (contd.)
     Simple bound
     When $2^r = n + 1$, the corresponding Hamming code is a perfect code
     • Perfect Hamming codes can be constructed as follows:

         $p_1$  $p_2$  $i_1$  $p_4$  $i_2$  $i_3$  $i_4$  $p_8$  $i_5$  . . .
         $2^0$  $2^1$   3     $2^2$   5      6      7     $2^3$   9     . . .

       Parity equations can be written as before from the above matrix representation

     SEC-DED Codes – Algebraic method
     • Definitions
       – $(G, *)$ – An abelian (commutative) Group
         • There is a 0 in $G$ (identity)
         • For every $a$ in $G$, $a^{-1}$ is also in $G$ (inverses)
         • For all $a$ and $b$ in $G$, $a * b = b * a$ is also in $G$ (closed)
       – Examples
         • $G = (0, 1)$; $* = \oplus$ (Exclusive-OR)
         • $(Z_3, +_3)$ is a commutative group

     SEC-DED Codes – Algebraic method (contd.)
     • Definitions (contd.)
       – $(F, +, \cdot)$ – A Field if
         • $(F, +)$ is an abelian group with identity of 0
         • $(F - \{0\}, \cdot)$ is an abelian group
       – Examples
         • $F = (0, 1)$; $\oplus$ = Exclusive-OR; $\cdot$ = AND
         • $(F, \oplus, \cdot)$ is a Field
         • The above Field is called GF(2)

     SEC-DED Codes – Algebraic method (contd.)
     • Definitions (contd.)
       – Vector space over a field $F$
         • $(V, +)$ is an abelian group
         • $v$ in $V$ and $c$ in $F$ $\Rightarrow$ $cv$ is in $V$
         • $c(u + v) = cu + cv$
         • $(c + d)v = cv + dv$
         • $c(dv) = (cd)v$
       – $S \subseteq V$ is a subspace if $S$ is a vector space
       – A linear combination of vectors is a vector
         • $u = c_1 v_1 + c_2 v_2 + c_3 v_3 + \dots + c_n v_n$

     SEC-DED Codes – Algebraic method (contd.)
     • Some results and more definitions
       – Over GF(2), the collection of all $n$-bit vectors forms a vector space
       – Let $v_1, v_2, \dots, v_k$ be $n$-bit vectors each. Then all $2^k$ linear combinations of these $k$ vectors form a subspace
       – A set of $k$ vectors $v_1, v_2, \dots, v_k$ is linearly independent if, whenever not all $c_i = 0$, $i = 1, \dots, k$,
           $c_1 v_1 + c_2 v_2 + c_3 v_3 + \dots + c_k v_k \ne 0$
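The simple bound $2^r \ge n + 1$ with $n = k + r$ can be turned into a small calculation of how many check bits a given information width needs. The sketch below is an illustration only. The function names `min_check_bits`, `is_perfect`, and `layout` are hypothetical, and the rule that check bits sit at power-of-two positions is read off the construction row shown on the slides.

```python
def min_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (one syndrome per single error, plus no error)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def is_perfect(k):
    """True when the bound holds with equality, i.e. 2**r == n + 1."""
    r = min_check_bits(k)
    return 2 ** r == k + r + 1


def layout(n):
    """Label positions 1..n: check bits at powers of two, information bits elsewhere."""
    labels = []
    info = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:         # pos is a power of two
            labels.append("p%d" % pos)
        else:
            info += 1
            labels.append("i%d" % info)
    return labels


for k in (4, 8, 11, 32, 64):
    r = min_check_bits(k)
    print("k=%d r=%d n=%d perfect=%s" % (k, r, k + r, is_perfect(k)))

print(layout(9))   # ['p1', 'p2', 'i1', 'p4', 'i2', 'i3', 'i4', 'p8', 'i5']
```

For $k = 4$ this gives $r = 3$ and $n = 7$, the perfect (7,4) code used in the example. For $k = 8$ it gives $r = 4$, where the bound is met with some syndromes left unused.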
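The vector space results over GF(2) can be explored by enumeration for small cases. The following sketch assumes vectors are tuples of 0/1 values, with addition as XOR and scalar multiplication as AND, as in the GF(2) example above. The helper names `add`, `scale`, `span`, and `linearly_independent` are hypothetical. The independence test relies on the fact that, over GF(2), $k$ vectors are linearly independent exactly when their $2^k$ linear combinations are all distinct.

```python
from itertools import product


def add(u, v):
    """Vector addition over GF(2): bitwise XOR."""
    return tuple(a ^ b for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiplication over GF(2): AND with the scalar c (0 or 1)."""
    return tuple(c & a for a in v)


def span(vectors):
    """Set of all linear combinations c1*v1 + ... + ck*vk with ci in {0, 1}."""
    n = len(vectors[0])
    result = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        u = (0,) * n
        for c, v in zip(coeffs, vectors):
            u = add(u, scale(c, v))
        result.add(u)
    return result


def linearly_independent(vectors):
    """Independent iff all 2**k coefficient choices give distinct vectors."""
    return len(span(vectors)) == 2 ** len(vectors)


basis = [(1, 0, 1), (0, 1, 1)]
s = span(basis)
print(sorted(s))                         # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(linearly_independent(basis))       # True

# The span is closed under addition, as a subspace must be.
print(all(add(u, v) in s for u in s for v in s))              # True

# (1, 1, 0) is the sum of the two basis vectors, so adding it breaks independence.
print(linearly_independent(basis + [(1, 1, 0)]))              # False
```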
