INRIA Rhône-Alpes – Mathieu Cunche, Vincent Roca
Coding for loss tolerant systems, Workshop APRETAF, 22 January 2009
Mathieu Cunche, Vincent Roca (INRIA, Planète team)
- The erasure channel
- Erasure codes
- Reed-Solomon codes
- LDPC codes
- Application to distributed storage
The erasure channel
- definition: a symbol either arrives at the destination without any error, or is erased and never received
- ≠ BSC (binary symmetric channel) and AWGN channels…
- the integrity assumption is a strong hypothesis
  - a received symbol is 100% guaranteed error-free
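The channel model above can be sketched in a few lines. This is only an illustration: the function name `erasure_channel`, the loss probability and the seed are assumptions, not something taken from the slides.

```python
import random

def erasure_channel(symbols, loss_probability, rng):
    """Deliver each symbol intact, or replace it with None (erased)."""
    return [None if rng.random() < loss_probability else s for s in symbols]

rng = random.Random(2009)          # fixed seed, only to make the run repeatable
sent = list(b"loss tolerant")      # one symbol = one byte in this sketch
received = erasure_channel(sent, 0.3, rng)

# Integrity assumption: whatever was not erased is identical to what was sent.
intact = all(r is None or r == s for r, s in zip(received, sent))
erased_positions = [i for i, r in enumerate(received) if r is None]
```

Unlike on a BSC or AWGN channel, the receiver knows exactly which positions are missing (`erased_positions`) and never has to doubt a symbol it did receive.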
The erasure channel: where do we find erasure channels?
- On the Internet
  - because of routing errors or congestion
  - because packets with a bad CRC/checksum are discarded
- On wireless and satellite networks
  - intermittent connection due to obstacles
- Distributed storage
  - disk failure in RAID systems
  - node failure in a data center
- Distributed computation
  - fail-stop failures
Erasure codes
- k source symbols, encoded into n encoding symbols
- Code rate = k/n
- Close to 1 => little redundancy
- Close to 0 => high amount of redundancy
Figure: the source object (k source symbols before encoding) is encoded into n encoding symbols (the k source symbols plus (n-k) repair symbols); some symbols are erased during transmission, and decoding rebuilds the decoded object.
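As a minimal sketch of the encode / erase / decode chain, the simplest possible erasure code can be used: a single XOR repair symbol (n = k + 1). This toy code is an assumption chosen for brevity, not one of the codes discussed later, and the symbol values are arbitrary.

```python
from functools import reduce

def xor_all(values):
    return reduce(lambda a, b: a ^ b, values, 0)

def encode(source):
    """Append one repair symbol: the XOR of the k source symbols (n = k + 1)."""
    return source + [xor_all(source)]

def decode(received):
    """Rebuild the k source symbols when at most one symbol is erased (None)."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single repair symbol cannot recover two erasures")
    symbols = list(received)
    if missing:
        # The XOR of all n symbols is 0, so the missing one is the XOR of the rest.
        symbols[missing[0]] = xor_all(s for s in received if s is not None)
    return symbols[:-1]

source = [0x12, 0x34, 0x56, 0x78]        # k = 4 source symbols (bytes)
encoded = encode(source)                 # n = 5 encoding symbols
code_rate = len(source) / len(encoded)   # k / n = 0.8, little redundancy
received = list(encoded)
received[2] = None                       # one symbol erased during transmission
decoded = decode(received)
```

With a code rate this close to 1 only one erasure can be repaired; lowering the rate (more repair symbols) buys more tolerance.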
6
Erasure codes: often used as AL-FEC codes
- “Application-Level Forward Error Correction” codes
AL-FEC codes differ from physical-layer FEC codes
- PHY codes:
  - correct bit errors and, when that is not possible, detect them
  - symbol = bit
- AL-FEC codes:
  - recover from symbol erasures
  - symbol = byte, IP datagram, file chunk
7
Erasure codes: how can we define good erasure codes?
Performance metrics for erasure codes
- erasure recovery capabilities
  - main metric, measured as the overhead ratio:
    decoding_overhead = (number_of_symbols_required_for_decoding / k) - 1
  - decoding needs (1+overhead)*k symbols to succeed, whereas ideal (MDS) codes need only k symbols
- encoding and decoding speed
  - to assess the complexity
- required memory during encoding and decoding
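The overhead ratio defined on this slide is simple arithmetic. The sketch below uses hypothetical numbers (a decoder that needed 1,100 symbols for k = 1,000); the function names are illustrative.

```python
def decoding_overhead(symbols_required, k):
    """Overhead ratio: (symbols required for decoding / k) - 1."""
    return symbols_required / k - 1

def symbols_needed(overhead, k):
    """Inverse relation: decoding needs (1 + overhead) * k symbols."""
    return (1 + overhead) * k

k = 1000
overhead = decoding_overhead(1100, k)     # hypothetical decoder: about 0.10 (10 %)
mds_overhead = decoding_overhead(k, k)    # ideal (MDS) code: exactly 0
```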
Reed-Solomon codes: in short
- Discovered by Reed & Solomon in 1959
- Linear codes over GF(2^n)
  - Sum: simple binary XOR
  - Multiplication and division: use a logarithmic table
- Based on polynomial interpolation
- Practical implementation with a Vandermonde matrix
  - any k×k submatrix of a Vandermonde matrix is invertible
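The finite-field operations listed above can be sketched for GF(2^8). The primitive polynomial 0x11D is an assumption (the slides do not name one), and the names `EXP`, `LOG`, `gf_add`, `gf_mul` and `gf_div` are illustrative.

```python
PRIMITIVE_POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1 (assumed, not from the slides)

EXP = [0] * 510          # EXP[i] = alpha^i, doubled so that index sums need no modulo
LOG = [0] * 256          # LOG[alpha^i] = i (LOG[0] is never used)
value = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:            # degree 8 reached: reduce by the polynomial
        value ^= PRIMITIVE_POLY

def gf_add(a, b):
    """Addition (and subtraction) is a plain XOR."""
    return a ^ b

def gf_mul(a, b):
    """Multiplication: add the logarithms, then look up the exponential."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Division: subtract the logarithms (shifted by 255 to stay positive)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]
```

The two tables hold a few hundred entries for GF(2^8); the same approach over GF(2^16) needs tables of about 65,536 entries each.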
Reed-Solomon codes: encoding
- Matrix-vector multiplication
- Complexity: O(k^2) operations
X × G = Y
- X: source vector (k source symbols)
- G: generator matrix (k × n Vandermonde matrix)
- Y: encoded vector (n encoded symbols)
Reed-Solomon codes: decoding
- Solve a linear system
- Good property of Vandermonde matrices: any k×k submatrix is invertible
- k encoding symbols are enough to decode
- Decoding overhead = 0; said differently, RS codes are MDS
- Complexity: O(k^3)
X × G’ = Y’
- X: source vector (k source symbols)
- G’: k×k submatrix of G (invertible)
- Y’: received vector (k received symbols)
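The encoding (X × G = Y) and decoding (X × G’ = Y’) steps can be sketched end to end over GF(2^8). Several choices here are assumptions rather than the authors' implementation: the primitive polynomial 0x11D, evaluation points alpha^j, a non-systematic generator matrix, and plain Gauss-Jordan elimination for the solve.

```python
# GF(2^8) log/exp tables (primitive polynomial 0x11D is an assumption).
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v = (v << 1) ^ (0x11D if v & 0x80 else 0)

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[255 - LOG[a]]

def vandermonde(k, n):
    """k x n generator matrix G[i][j] = (alpha^j)^i, n <= 255 distinct points."""
    return [[EXP[(i * j) % 255] for j in range(n)] for i in range(k)]

def encode(source, n):
    """Y = X x G: each of the n encoding symbols costs k multiplications."""
    G = vandermonde(len(source), n)
    encoded = []
    for j in range(n):
        y = 0
        for i, x in enumerate(source):
            y ^= mul(x, G[i][j])
        encoded.append(y)
    return encoded

def decode(received, k, n):
    """Recover X from any k (index, value) pairs by solving X x G' = Y'."""
    G = vandermonde(k, n)
    # One equation per received symbol j: sum_i X[i] * G[i][j] = value.
    rows = [[G[i][j] for i in range(k)] + [value] for j, value in received[:k]]
    for col in range(k):                       # Gauss-Jordan elimination, O(k^3)
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = inv(rows[col][col])
        rows[col] = [mul(scale, c) for c in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [c ^ mul(factor, p) for c, p in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]

source = [10, 20, 30, 40]                      # k = 4 source symbols (bytes)
encoded = encode(source, 7)                    # n = 7 encoding symbols
survivors = [(j, encoded[j]) for j in (1, 3, 4, 6)]   # three symbols erased
decoded = decode(survivors, 4, 7)
```

Any other choice of 4 surviving indices works as well, which is the MDS property in action.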
Reed-Solomon codes: summary
Ideal (MDS) codes
- Decoding overhead = 0
- Decoding possible as soon as k symbols are received
… but limited scalability
- n ≤ 255: GF(2^8) is sufficient
  - Fast operations over GF(2^8) (small logarithmic table)
  - Decoding speed = a few tens of Mbps
- n > 255: use GF(2^16) or more
  - Log table too large, cannot fit in cache
  - Decoding speed falls to a few Mbps
LDPC codes: in short
- “Low Density Parity Check” (LDPC)
- linear block codes
- sparse parity check matrix
- discovered by Gallager in the 1960s, rediscovered in the mid-1990s
- in general, encoding requires solving a linear system: O(k^3)
- but high-performance, lightweight variants exist
- in the remainder, we focus on binary LDPC codes
  - based on XOR operations
LDPC codes: LDPC-staircase codes (RFC 5170)
- a simple (trivial) parity check matrix structure
- a.k.a. double diagonal or Repeat Accumulate codes
- high encoding speed (encoding is trivial)
- recovery capabilities can be made close to those of ideal codes
Figure: parity check matrix with one column per source symbol (S1 to S5) and per parity symbol (P1 to P5), and one row per constraint, e.g. S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
LDPC codes
Encoding
- Linear complexity O(k)
Decoding
- solve a system of linear equations
- several techniques are feasible…
Constraints over the symbols S1 S2 S3 S4 S5 P1 P2 P3 P4 P5 (one per row of the parity check matrix):
S3 ⊕ S4 ⊕ P1 = 0
S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
S1 ⊕ S2 ⊕ S3 ⊕ P2 ⊕ P3 = 0
S2 ⊕ S4 ⊕ S5 ⊕ P3 ⊕ P4 = 0
S1 ⊕ S2 ⊕ S3 ⊕ S5 ⊕ P4 ⊕ P5 = 0
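The linear-time encoding can be sketched for the 5-source, 5-parity example on this slide. The row contents in `SOURCE_ROWS` are a reconstruction from the slide's constraint labels (the XOR signs were lost in extraction), so treat them as an assumption; the source symbol values are arbitrary.

```python
# Source symbols (1-based) taking part in each constraint, as reconstructed
# from the slide; row i also involves P(i-1) and P(i): the "staircase".
SOURCE_ROWS = [(3, 4), (1, 4, 5), (1, 2, 3), (2, 4, 5), (1, 2, 3, 5)]

def staircase_encode(source):
    """Compute P1..P5 in one pass: each parity symbol is the XOR of the
    row's source symbols and of the previous parity symbol."""
    parities = []
    previous = 0
    for row in SOURCE_ROWS:
        parity = previous
        for index in row:
            parity ^= source[index - 1]
        parities.append(parity)
        previous = parity
    return parities

source = [0x11, 0x22, 0x33, 0x44, 0x55]    # S1..S5, one byte each
parities = staircase_encode(source)        # P1..P5
```

Each parity symbol costs a handful of XORs, independent of k, which is why encoding stays linear.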
LDPC codes, Sol. 1: Iterative Decoding (ID)
- If an equation has only one unknown variable, that variable is equal to the (XOR) sum of the others. Reiterate…
- Efficient thanks to the sparseness of the parity check matrix
- Pros: low complexity (linear, O(k))
  - low CPU load and high sustainable bandwidth
- Cons: suboptimal in terms of correction capabilities
  - some full-rank systems cannot be solved
Code rate (k=1000, N1=3) | Average overhead | Overhead for a failure probability ≤ 10^-4
2/3 (≈0.67) | 9.99 % | 13.93 %
2/5 (=0.4) | 17.13 % | 22.91 %
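A minimal sketch of iterative decoding on the small 5 + 5 staircase example follows. The constraint rows are a reconstruction from the slides, and `CODEWORD` is one valid codeword for arbitrary source bytes; both are assumptions made for illustration.

```python
SOURCE_ROWS = [(3, 4), (1, 4, 5), (1, 2, 3), (2, 4, 5), (1, 2, 3, 5)]
# Variables 0..4 are S1..S5 and 5..9 are P1..P5; row i also holds P(i-1) and P(i).
EQUATIONS = [[s - 1 for s in row] + ([4 + i] if i else []) + [5 + i]
             for i, row in enumerate(SOURCE_ROWS)]
# A valid codeword S1..S5, P1..P5 (every constraint XORs to 0).
CODEWORD = [0x11, 0x22, 0x33, 0x44, 0x55, 0x77, 0x77, 0x77, 0x44, 0x11]

def iterative_decode(received):
    """Fill in erased symbols (None) while some equation has a single unknown."""
    symbols = list(received)
    progress = True
    while progress:
        progress = False
        for eq in EQUATIONS:
            unknown = [v for v in eq if symbols[v] is None]
            if len(unknown) == 1:
                value = 0
                for v in eq:
                    if v != unknown[0]:
                        value ^= symbols[v]     # XOR of the known symbols
                symbols[unknown[0]] = value
                progress = True
    return symbols

def erase(codeword, positions):
    return [None if i in positions else s for i, s in enumerate(codeword)]

recovered = iterative_decode(erase(CODEWORD, {2, 6}))   # S3 and P2 erased
stuck = iterative_decode(erase(CODEWORD, {0, 1, 4}))    # S1, S2 and S5 erased
```

The second pattern illustrates the "Cons" bullet: every equation touching S1, S2 or S5 contains at least two of them, so ID makes no progress even though the remaining system is full rank.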
LDPC codes, Sol. 2: Maximum Likelihood (ML) decoding
- Solve a linear system (Gaussian elimination, LU decomposition…)
- Excellent erasure correction capabilities
- High complexity: O(k^3)
xA = b
- x: missing symbols
- A: submatrix of the generator matrix
- b: information of the received symbols
Code rate (k=1000, N1=5) | Average overhead | Overhead for a failure probability ≤ 10^-4
2/3 (≈0.67) | 0.63 % | 2.21 %
2/5 (=0.4) | 2.04 % | 4.41 %
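ML decoding over the erasure channel amounts to Gaussian elimination over GF(2). The sketch below works on the parity check constraints of the small 5 + 5 example rather than on a generator submatrix; that choice, the reconstructed rows and the sample codeword are all assumptions made to keep the example short.

```python
SOURCE_ROWS = [(3, 4), (1, 4, 5), (1, 2, 3), (2, 4, 5), (1, 2, 3, 5)]
# Variables 0..4 are S1..S5 and 5..9 are P1..P5; row i also holds P(i-1) and P(i).
EQUATIONS = [[s - 1 for s in row] + ([4 + i] if i else []) + [5 + i]
             for i, row in enumerate(SOURCE_ROWS)]
CODEWORD = [0x11, 0x22, 0x33, 0x44, 0x55, 0x77, 0x77, 0x77, 0x44, 0x11]

def ml_decode(received):
    """Solve for every erased symbol (None) by Gauss-Jordan elimination over GF(2)."""
    symbols = list(received)
    unknowns = [v for v, s in enumerate(symbols) if s is None]
    column = {v: c for c, v in enumerate(unknowns)}
    # Each equation becomes (bitmask of its unknowns, XOR of its known symbols).
    system = []
    for eq in EQUATIONS:
        mask, rhs = 0, 0
        for v in eq:
            if symbols[v] is None:
                mask |= 1 << column[v]
            else:
                rhs ^= symbols[v]
        system.append((mask, rhs))
    pivot_rows = []
    for c in range(len(unknowns)):
        bit = 1 << c
        pivot = next((row for row in system if row[0] & bit), None)
        if pivot is None:
            raise ValueError("system is not full rank: decoding fails")
        system.remove(pivot)

        def reduce_row(row):
            # Cancel unknown c from any other row that still contains it.
            return (row[0] ^ pivot[0], row[1] ^ pivot[1]) if row[0] & bit else row

        system = [reduce_row(row) for row in system]
        pivot_rows = [reduce_row(row) for row in pivot_rows] + [pivot]
    for c, (_, rhs) in enumerate(pivot_rows):
        symbols[unknowns[c]] = rhs
    return symbols

received = [None if i in {0, 1, 4} else s for i, s in enumerate(CODEWORD)]
decoded = ml_decode(received)      # S1, S2 and S5 erased: ID alone would be stuck
```

When the system is rank deficient (for example S1, S3 and P1 erased in this toy code), no decoder can succeed and the sketch raises `ValueError`.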
LDPC codes, Sol. 3: Hybrid ID/ML scheme
- Hybrid decoder
- start decoding with ID (fast)
- finish with ML if necessary (optimal)
- excellent erasure correction capabilities…
- … while remaining very fast
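The hybrid scheme can be sketched by chaining the two decoders: run ID first, and hand over to Gaussian elimination only if symbols are still missing. As before, the 5 + 5 constraint rows and the sample codeword are assumptions for illustration, and the function names are hypothetical.

```python
SOURCE_ROWS = [(3, 4), (1, 4, 5), (1, 2, 3), (2, 4, 5), (1, 2, 3, 5)]
# Variables 0..4 are S1..S5 and 5..9 are P1..P5; row i also holds P(i-1) and P(i).
EQUATIONS = [[s - 1 for s in row] + ([4 + i] if i else []) + [5 + i]
             for i, row in enumerate(SOURCE_ROWS)]
CODEWORD = [0x11, 0x22, 0x33, 0x44, 0x55, 0x77, 0x77, 0x77, 0x44, 0x11]

def iterative_decode(received):
    """ID: repeatedly solve equations that have exactly one unknown."""
    symbols = list(received)
    progress = True
    while progress:
        progress = False
        for eq in EQUATIONS:
            unknown = [v for v in eq if symbols[v] is None]
            if len(unknown) == 1:
                value = 0
                for v in eq:
                    if v != unknown[0]:
                        value ^= symbols[v]
                symbols[unknown[0]] = value
                progress = True
    return symbols

def ml_decode(received):
    """ML: Gauss-Jordan elimination over GF(2) on the remaining unknowns."""
    symbols = list(received)
    unknowns = [v for v, s in enumerate(symbols) if s is None]
    column = {v: c for c, v in enumerate(unknowns)}
    system = []
    for eq in EQUATIONS:
        mask, rhs = 0, 0
        for v in eq:
            if symbols[v] is None:
                mask |= 1 << column[v]
            else:
                rhs ^= symbols[v]
        system.append((mask, rhs))
    pivot_rows = []
    for c in range(len(unknowns)):
        bit = 1 << c
        pivot = next((row for row in system if row[0] & bit), None)
        if pivot is None:
            raise ValueError("system is not full rank: decoding fails")
        system.remove(pivot)
        system = [(m ^ pivot[0], b ^ pivot[1]) if m & bit else (m, b)
                  for m, b in system]
        pivot_rows = [(m ^ pivot[0], b ^ pivot[1]) if m & bit else (m, b)
                      for m, b in pivot_rows] + [pivot]
    for c, (_, rhs) in enumerate(pivot_rows):
        symbols[unknowns[c]] = rhs
    return symbols

def hybrid_decode(received):
    """Start with ID (fast); finish with ML only if ID left erasures."""
    symbols = iterative_decode(received)
    if None not in symbols:
        return symbols, False          # ID was sufficient
    return ml_decode(symbols), True    # ML had to finish the job

def erase(codeword, positions):
    return [None if i in positions else s for i, s in enumerate(codeword)]

easy, easy_used_ml = hybrid_decode(erase(CODEWORD, {2, 6}))
hard, hard_used_ml = hybrid_decode(erase(CODEWORD, {0, 1, 4, 5}))
```

In the second call ID recovers P1 and then stalls, so ML only has three unknowns left to solve; shrinking the system before the cubic-cost step is what keeps the hybrid decoder fast.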
LDPC codes: decoding speed of the hybrid decoder
- LDPC-staircase (N1=5), code rate 2/3, k=1,000
- compared with Reed-Solomon over GF(2^8)
Figure: sustainable decoding speed (Mbps) as a function of the loss probability (%)
- with RS: 54 Mbps
- while ID is sufficient: 1.7 Gbps, 32.4 times faster than RS
- when ML is needed more and more often: 500 Mbps, still 10.2 times faster than RS
Application to distributed storage: using replication
- A file partitioned into 8 blocks
- Each block is replicated 4 times
- Can tolerate up to 3 failures
Figure: Client_1 and Client_2 access storage nodes holding the block copies 1 3 6 7, 2 4 5 8, 1 3 4 6, 1 2 6 8, 3 4 6 7, 2 3 5 7, 2 5 7 8 and 1 4 5 8.
Application to distributed storage: using erasure codes
- A file encoded into 32 blocks: 8 source blocks (1 to 8) and 24 repair blocks (A to X)
- Can tolerate up to 6 failures, since 8 blocks are enough to decode
Figure: Client_1 and Client_2 access storage nodes holding the 32 encoded blocks.
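The two failure-tolerance figures (3 with replication, 6 with erasure codes) can be checked by brute force. The sketch assumes 8 storage nodes holding 4 blocks each: the replicated placement is read off the slide by grouping the block numbers four at a time, the coded placement is arbitrary, and the code is assumed to be MDS (any 8 distinct blocks decode).

```python
from itertools import combinations

K = 8   # blocks needed to rebuild the file

# Replication: block numbers per node, grouped four at a time (assumed layout).
REPLICATED_NODES = [
    {1, 3, 6, 7}, {2, 4, 5, 8}, {1, 3, 4, 6}, {1, 2, 6, 8},
    {3, 4, 6, 7}, {2, 3, 5, 7}, {2, 5, 7, 8}, {1, 4, 5, 8},
]
# Erasure coding: 32 distinct encoded blocks, four per node (assumed layout).
CODED_NODES = [set(range(4 * i, 4 * i + 4)) for i in range(8)]

def survives_replication(alive_nodes):
    """Every one of the 8 original blocks must still exist somewhere."""
    return set().union(*alive_nodes) >= set(range(1, K + 1))

def survives_erasure_code(alive_nodes):
    """With an MDS code, any K distinct encoded blocks are enough."""
    return len(set().union(*alive_nodes)) >= K

def max_tolerated_failures(nodes, survives):
    """Largest f such that every choice of f failed nodes leaves the file readable."""
    tolerated = 0
    for failures in range(1, len(nodes)):
        for failed in combinations(range(len(nodes)), failures):
            alive = [n for i, n in enumerate(nodes) if i not in failed]
            if not survives(alive):
                return tolerated
        tolerated = failures
    return tolerated

replication_tolerance = max_tolerated_failures(REPLICATED_NODES, survives_replication)
coding_tolerance = max_tolerated_failures(CODED_NODES, survives_erasure_code)
```

Both layouts store 32 blocks in total, so the erasure-coded version doubles the number of tolerated node failures at the same storage cost.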
Conclusion
Erasure codes
- Add redundancy to combat symbol erasures
Reed-Solomon codes
- Ideal codes (MDS), but inefficient for large objects
LDPC codes
- Can encode large objects
- Correction capabilities close to MDS
- High encoding and decoding speed