 
MDS Codes for Distributed Storage System
MMC, Sep 7, 2017
Speaker: Xiaohu Tang
Email: xhutang@swjtu.edu.cn
Joint works with J. Li, P. Udaya, and C. Tian
Outline
1. Distributed storage system
2. Regenerating code
3. Our works
The age of big data
Every 18 months: new storage = sum of all old storage (Jim Gray, 1998 Turing Award winner)
Big data
IDC reported the size of the digital universe exceeded
• 1 ZB in 2010
• 1.8 ZB in 2011
• 35 ZB expected in 2020
Challenge: How to store big data?
Solutions: Centralized vs. Distributed
Centralized storage        | Distributed storage
• Specific server          | • Multiple independent devices
• Specific disk array      |
• Bad scalability          | • Good scalability
• Expensive                | • Cheap
[Figure: data A and B split into blocks A1, A2 and B1, B2, B3 stored on nodes]
Reliability
The node is vulnerable:
1. Unavailable temporarily
2. Damaged permanently
Need redundancy
[Figure: data A, B, C split into blocks A1-A2, B1-B3, C1-C4 stored on nodes]
Two mechanisms for redundancy
• Replication
[Figure: original data and its replicas]
Two mechanisms for redundancy
• Erasure code
Original data: A, B
(3,2) MDS code (single parity), used in RAID 5: stores A, B, A+B
(4,2) MDS code, tolerates any 2 failures, used in RAID 6: stores A, B, A+B, A+2B
Advantage:
• Lower redundancy, or
• Higher reliability
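The "tolerates any 2 failures" claim for the (4,2) code A, B, A+B, A+2B can be checked with a minimal sketch. It assumes arithmetic over the prime field GF(257) (the slide does not name a field), and the names `GEN`, `encode` and `decode` are illustrative only.

```python
from itertools import combinations

P = 257  # assumed prime field size; the slide does not specify the field

# Generator rows: the four nodes store A, B, A+B, A+2B.
GEN = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four stored symbols (A, B, A+B, A+2B) over GF(P)."""
    return [(x * a + y * b) % P for x, y in GEN]

def decode(i, j, si, sj):
    """Recover (A, B) from the symbols si, sj held by surviving nodes i and j."""
    (x1, y1), (x2, y2) = GEN[i], GEN[j]
    det = (x1 * y2 - x2 * y1) % P        # nonzero for every pair of rows
    inv = pow(det, -1, P)                # modular inverse (Python 3.8+)
    a = ((si * y2 - sj * y1) * inv) % P  # Cramer's rule for the 2x2 system
    b = ((x1 * sj - x2 * si) * inv) % P
    return a, b

stored = encode(42, 200)
# Any two of the four nodes suffice, i.e. any two failures are tolerated.
for i, j in combinations(range(4), 2):
    assert decode(i, j, stored[i], stored[j]) == (42, 200)
```

The check succeeds because every 2×2 submatrix of the generator matrix is invertible, which is exactly the MDS property.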
Repair
• Maintain redundancy
[Figure: the lost blocks B2 and C3 are repaired onto new nodes]
Repair: Replication vs. Erasure Code
Replication: download a copy of the lost block.
Erasure code: to repair A, download B and A+B, i.e. download the whole original data.
Download bandwidth = amount of data downloaded to repair one node
Erasure code
Original data size: 2M
Repair download bandwidth: 2M
Regenerating code (A. G. Dimakis et al., 2007)
Original data size: 2M
Repair download bandwidth: 1.5M
Storage-Communication tradeoff
• Min-Storage Regenerating (MSR) code
• Min-Bandwidth Regenerating (MBR) code
A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
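The two extreme points of the tradeoff can be computed directly. This is a minimal sketch using the standard closed-form expressions from the cited paper for a file of size B with parameters (k, d). The function names are illustrative, and the choice k = 2, d = 3 for the 2M example is an assumption, since the slides do not state the parameters of that example.

```python
from fractions import Fraction

def msr_point(B, k, d):
    """(alpha, gamma) at the minimum-storage point: alpha = B/k."""
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(B, k, d):
    """(alpha, gamma) at the minimum-bandwidth point: alpha = gamma."""
    denom = 2 * k * d - k * k + k
    alpha = Fraction(2 * B * d, denom)
    return alpha, alpha

# Assumed parameters for the 2M example: file size 2 (in units of M), k = 2, d = 3.
alpha, gamma = msr_point(2, 2, 3)
print(alpha, gamma)  # per-node storage and repair bandwidth, in units of M
```

With these assumed parameters the MSR repair bandwidth is 3/2, which agrees with the 1.5M figure on the regenerating-code slide, against 2M for the plain erasure code.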
State of the art (before 2014)
Table: optimal repair by code rate (columns: rate ≤ 0.5, rate > 0.5; rows: systematic node, parity node). Entries on the slide: Partially, Completely, Seldom.
Rate = size of the original file / storage
Product matrix method
• MBR for any possible parameters
• MSR for (n, k, d ≥ 2k - 2), rate < 1/2
K. V. Rashmi, N. B. Shah, and P. V. Kumar, "Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5227-5239, 2011.
Interference alignment technique
Original data size: 2M
Interference alignment: 3 equations but 4 unknowns
[Figure: repair example]
General case (n = k + r, k, d)
Systematic nodes:
  Node 0: f_0
  …
  Node k-1: f_{k-1}
Parity nodes:
  Node k: g_0 = A_{0,0} f_0 + … + A_{0,k-1} f_{k-1}
  …
  Node k+r-1: g_{r-1} = A_{r-1,0} f_0 + … + A_{r-1,k-1} f_{k-1}
where f_i is a column vector of length α and A_{i,j} is a square matrix of order α.
Rate: k/(k + r)
Most interesting case (n = k + 2, k, d = k + 1)
• Data node 1: f_1
• Data node 2: f_2
• …
• Data node k: f_k
• Parity node 1: [equation not preserved]
• Parity node 2: [equation not preserved]
where f_i is a column vector of length α and A_i is a square matrix of order α.
Optimal repair
To repair node i, download half of the data from each of the other k + 1 nodes by multiplying by a matrix S_i of order α/2 × α:
• Data node 1: S_i f_1
• Data node 2: S_i f_2
• …
• Data node k: S_i f_k
• Parity node 1: [equation not preserved]
• Parity node 2: [equation not preserved]
Sufficient conditions
(k + 1)·α/2 equations but k·α unknowns
• Solve α unknowns f_i
• Cancel (k - 1)·α unknowns f_j, j ≠ i
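The "solve and cancel" idea can be illustrated on a toy (4, 2) code with α = 2 over GF(3). Everything in this sketch is an assumption made for illustration: the parity nodes are taken to be g_1 = f_1 + f_2 and g_2 = A_1 f_1 + A_2 f_2, and the sufficient conditions are taken in the form usually stated in the literature, rank[S_i; S_i A_i] = α (solve f_i) and rank[S_i; S_i A_j] = α/2 for j ≠ i (cancel f_j). The matrices `A` and `S` below are hypothetical values chosen to satisfy these conditions, not the ones from the slides.

```python
P = 3  # toy field GF(3)

# Hypothetical coding matrices A_1, A_2 and repair vectors S_1, S_2 (alpha/2 = 1 row).
A = {1: [[1, 1], [0, 1]], 2: [[1, 0], [1, 1]]}
S = {1: [1, 0], 2: [0, 1]}

def mat_vec(M, v):
    """Matrix times column vector over GF(P)."""
    return [sum(m * x for m, x in zip(row, v)) % P for row in M]

def vec_mat(s, M):
    """Row vector times matrix over GF(P)."""
    return [sum(s[r] * M[r][c] for r in range(len(s))) % P for c in range(len(M[0]))]

def dot(s, v):
    return sum(a * b for a, b in zip(s, v)) % P

def encode(f1, f2):
    """Assumed layout: g1 = f1 + f2, g2 = A_1 f1 + A_2 f2."""
    g1 = [(a + b) % P for a, b in zip(f1, f2)]
    g2 = [(a + b) % P for a, b in zip(mat_vec(A[1], f1), mat_vec(A[2], f2))]
    return {"f1": f1, "f2": f2, "g1": g1, "g2": g2}

def repair(i, nodes):
    """Rebuild data node i from one symbol of each of the other three nodes."""
    j = 3 - i
    s = S[i]
    d_f, d_g1, d_g2 = dot(s, nodes[f"f{j}"]), dot(s, nodes["g1"]), dot(s, nodes["g2"])
    # Cancel: S_i A_j lies in Span(S_i), so S_i A_j f_j is a multiple c of S_i f_j.
    s_aj = vec_mat(s, A[j])
    c = next(c for c in range(P) if [(c * x) % P for x in s] == s_aj)
    y1 = (d_g1 - d_f) % P          # equals S_i f_i
    y2 = (d_g2 - c * d_f) % P      # equals S_i A_i f_i
    # Solve: the 2x2 system [S_i; S_i A_i] f_i = (y1, y2) is invertible.
    r1, r2 = s, vec_mat(s, A[i])
    inv = pow((r1[0] * r2[1] - r1[1] * r2[0]) % P, -1, P)
    x0 = ((y1 * r2[1] - y2 * r1[1]) * inv) % P
    x1 = ((r1[0] * y2 - r2[0] * y1) * inv) % P
    return [x0, x1]

nodes = encode([1, 2], [2, 0])
assert repair(1, nodes) == [1, 2]
assert repair(2, nodes) == [2, 0]
```

In this toy code the file has 4 symbols and a repair downloads 3 symbols, one from each helper node, instead of the 4 symbols needed to decode the whole file.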
Best known results
Code      | k     | α   | Alphabet size
Zigzag    | m + 1 | 2^m | 3
Long MDS  | 3m    | 2^m | 2m + 1
• T. Tamo, Z. Wang and J. Bruck, "Zigzag codes: MDS array codes with optimal rebuilding," IEEE Trans. Inform. Theory, vol. 59, no. 3, pp. 1597-1616, Mar. 2013.
• Z. Wang, T. Tamo and J. Bruck, "Long MDS codes for optimal repair bandwidth," Tech. Rep. Available at http://paradise.caltech.edu/etr.html
Zigzag code
[Figure: Zigzag code example]
Properties
• Optimal access property: the repair data is downloaded directly, without any computation
• Optimal update property: when one data element is updated, only 2 bits in the parity nodes are updated, which is the minimum possible
Our work
• Code construction with
  - Optimal access property
  - Optimal update property
• Optimal repair of parity nodes
Code construction
• Includes the Zigzag codes and long MDS codes: establish a general but simple framework of (k + 2, k, k + 1) MSR codes based on the invariant subspace technique, which unifies the best known cases
• New constructions: construct more MSR codes, some of which improve on Zigzag
J. Li, X. H. Tang, and U. Parampalli, "A Framework of Constructions of Minimal Storage Regenerating Codes With the Optimal Access/Update Property," IEEE Trans. Inf. Theory, 61(4): 1920-1932 (2015)
Invariant subspace
Definition: Let q be a prime power and A be an α × α matrix over F_q. Assume that U is a subspace of F_q^α with dim(U) = s < α. Then U is said to be an invariant subspace with respect to A if UA ⊆ U, where UA = {uA : u ∈ U}.
Definition: Let S be a matrix. Span(S) is defined as the vector space spanned by its rows.
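Whether Span(S) is invariant with respect to A can be tested numerically. This minimal sketch assumes the row-vector convention (uA stays in U for every u in U) and a prime field GF(p). The helper names `rank_mod_p` and `is_invariant` are illustrative.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (given as a list of rows) over GF(p), by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # normalise the pivot row
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_invariant(S, A, p):
    """True if every row of S*A lies in Span(S), i.e. Span(S)A is contained in Span(S)."""
    SA = [[sum(s[r] * A[r][c] for r in range(len(A))) % p
           for c in range(len(A[0]))] for s in S]
    # Stacking S on top of S*A must not increase the rank.
    return rank_mod_p(S + SA, p) == rank_mod_p(S, p)

print(is_invariant([[1, 0]], [[1, 0], [1, 1]], 3))  # True: (1,0)A = (1,0)
print(is_invariant([[1, 0]], [[1, 1], [0, 1]], 3))  # False: (1,0)A = (1,1)
```

The rank comparison works because Span(S)A ⊆ Span(S) holds exactly when appending the rows of SA to S adds no new direction.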
Invariant subspace
Assume that e_0 and e_1 are two arbitrary linearly independent row vectors of length α over F_q. Let [definitions of S and A not preserved]. Then Span(S) is an invariant subspace with respect to A if and only if [condition not preserved].
Invariant subspace
In detail, there are 7 cases:
1. b = c = 0 and a, d ≠ 0
2. a = d = 0 and b, c ≠ 0
3. b = 0 and a, c, d ≠ 0
4. c = 0 and a, b, d ≠ 0
5. a = 0 and b, c, d ≠ 0
6. d = 0 and a, b, c ≠ 0
7. a, b, c, d ≠ 0 and ad ≠ bc
[Slide annotation: "Equivalent", with labels e_0 and e_1]
Invariant subspace
Our methods
Let V = F_q^α, and let V_0 and V_1 be a partition of V with |V_0| = |V_1|. For simplicity, we still use V_0 (V_1) to denote the matrix formed by the rows of V_0 (V_1). Then A can be characterized by [equation not preserved].
Goal: Find k such partitions V_{i,0} and V_{i,1} to determine the coding matrices A_i
Partition
Let α = 2^m, and let e_i, 0 ≤ i < α, be a basis of F_q^α. The m partitions are [equation not preserved].
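One natural way to obtain m balanced partitions of a basis of size α = 2^m is to split the basis indices by a binary digit. The sketch below assumes this is the intended rule (the formula itself is missing from the slide) and takes digit 0 to be the least significant bit. The name `bit_partitions` is illustrative.

```python
def bit_partitions(m):
    """For alpha = 2**m, split the basis indices 0..alpha-1 by each binary digit.

    Returns a list of m pairs (V_i0, V_i1); V_i0 holds the indices whose
    i-th bit is 0 and V_i1 those whose i-th bit is 1 (assumed convention).
    """
    alpha = 2 ** m
    parts = []
    for i in range(m):
        v0 = [a for a in range(alpha) if (a >> i) & 1 == 0]
        v1 = [a for a in range(alpha) if (a >> i) & 1 == 1]
        parts.append((v0, v1))
    return parts

for i, (v0, v1) in enumerate(bit_partitions(2)):
    print(i, v0, v1)
```

Each pair splits the basis into two halves of size α/2, which matches the requirement |V_0| = |V_1| stated for the partitions.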
Our unified construction
Re-interpretation of Zigzag code: Type 2
Re-interpretation of long MDS code: Type 3, Type 1
Construction of new code 1: Type 2, Type 3
Repair for parity nodes of high-rate code
[Diagram: constructions with optimal repair for systematic nodes and their extensions to all nodes. Labels on the slide: Wang et al., Allerton 2011; Li et al., IEEE CL 2016; Sasidharan et al.'s construction; (k + r, k) Hadamard; (k + 2, k) Hadamard; Papailiopoulos et al., ISIT 2015; Papailiopoulos et al., Trans-IT 2013; (k + r, k) Zigzag; (k + 2, k) Zigzag; Tamo et al., Trans-IT 2011; Invariant subspace codes: 4 classes; Li et al., Trans-IT 2015; Li et al., Trans-IT 2016; Long MDS codes; Wang et al., Trans-IT 2016; Many …]
1. Li, Tang and Tian, Enabling All-Node-Repair in Minimum Storage Regenerating Codes, arXiv:1604.07671, April 2016. (d = n - 1)
2. Ye and Barg, Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization, arXiv:1605.08630, May 2016. (d ≤ n - 1)
3. Sasidharan, Vajha, and Kumar, An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and all-node repair, arXiv:1607.07335, July 2016. (d ≤ n - 1)
Barrier
• Systematic nodes: easy
• All nodes: difficult
Why is this happening?
A new transformation
Base MDS storage code (Node 0, …, Node k-1, Node k, …, Node k+r-1) → Transformation → New MDS storage code (Node i_0, …, Node i_{k-1}, Node i_k, …, Node i_{k+r-1})
✓ r nodes: optimal repair bandwidth (RB)
✓ k nodes: same normalized RB
✓ same field size
✓ sub-packetization increased r-fold
Procedure
Given a base MDS (storage) code:
• Step 1: Space sharing
• Step 2: Permuting
• Step 3: Pairing
Step 1: Space sharing r instances to get code C1 (example with r = 3)
          Instance 0   Instance 1   Instance 2
Node 0    f_0^(0)      f_0^(1)      f_0^(2)      stationary node
Node 1    f_1^(0)      f_1^(1)      f_1^(2)      stationary node
Node 2    f_2^(0)      f_2^(1)      f_2^(2)      stationary node
Node 3    f_3^(0)      f_3^(1)      f_3^(2)      stationary node
Node 4    g_0^(0)      g_0^(1)      g_0^(2)      variable node
Node 5    g_1^(0)      g_1^(1)      g_1^(2)      variable node
Node 6    g_2^(0)      g_2^(1)      g_2^(2)      variable node
Step 2: Permuting data in variable nodes of C1 to get C2
Rule: g_j^(i) → g_{j+i}^(i) (entry of variable node j in instance i; the index j + i is taken modulo r)
Variable nodes of C1 before permuting:
          Instance 0   Instance 1   Instance 2
Node 4    g_0^(0)      g_0^(1)      g_0^(2)
Node 5    g_1^(0)      g_1^(1)      g_1^(2)
Node 6    g_2^(0)      g_2^(1)      g_2^(2)
In some cases, the permutations can be arbitrary.
Step 3: Pairing data in variable nodes of C2 to get C3
Paired symbols: (g_{i+j}^(i), g_{i+j}^(j))
Variable nodes of C2:
          Instance 0   Instance 1   Instance 2
Node 4    g_0^(0)      g_1^(1)      g_2^(2)
Node 5    g_1^(0)      g_2^(1)      g_0^(2)
Node 6    g_2^(0)      g_0^(1)      g_1^(2)
Variable nodes of C3:
          Instance 0           Instance 1            Instance 2
Node 4    g_0^(0)              -g_1^(1) + g_1^(0)    -g_2^(2) + g_2^(0)
Node 5    g_1^(0) + g_1^(1)    g_2^(1)               -g_0^(2) + g_0^(1)
Node 6    g_2^(0) + g_2^(2)    g_0^(1) + g_0^(2)     g_1^(2)
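The permuting and pairing steps can be traced symbolically for the variable nodes. This is a minimal sketch, not the authors' implementation: it assumes the index j + i is reduced modulo r, and that each pair is replaced by a sum in one cell and a difference in the other, with the sign convention chosen here only for illustration. The name `transform_layout` is hypothetical.

```python
def transform_layout(r):
    """Return (c2, c3) for the r variable nodes.

    c2[j][i] is the pair (instance, index) of the symbol g_index^(instance)
    held by variable node j in instance i after Step 2.
    c3[j][i] is a dict {(instance, index): coefficient} giving the linear
    combination stored in that cell after Step 3.
    """
    # Step 2: variable node j, instance i holds g_{(j+i) mod r}^(i).
    c2 = [[(i, (j + i) % r) for i in range(r)] for j in range(r)]

    # Step 3: cell (node j, instance i) is paired with cell (node i, instance j);
    # both involve the same index t = (i + j) mod r.
    c3 = []
    for j in range(r):
        row = []
        for i in range(r):
            t = (i + j) % r
            if i == j:
                row.append({(i, t): 1})               # diagonal cell, left unpaired
            elif i < j:
                row.append({(i, t): 1, (j, t): 1})    # sum of the pair
            else:
                row.append({(i, t): -1, (j, t): 1})   # difference (assumed sign)
        c3.append(row)
    return c2, c3

c2, c3 = transform_layout(3)
for j in range(3):
    print("variable node", j, c2[j], c3[j])
```

Over a field of odd characteristic, a sum and a difference of the same two symbols determine both symbols, so the pairing loses no information.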