Outline Distributed storage system 1 Regenerating code 2 Our works - - PowerPoint PPT Presentation




slide-1
SLIDE 1

MDS Codes for Distributed Storage System

Speaker: Xiaohu Tang
Email: xhutang@swjtu.edu.cn

MMC, Sep 7, 2017

Joint works with J. Li, P. Udaya, and C. Tian

slide-2
SLIDE 2

Outline

1 Distributed storage system

2 Regenerating code

3 Our works

slide-3
SLIDE 3

The age of big data

Every 18 months: new storage = sum of all old storage

Jim Gray, 1998 Turing Award winner

slide-4
SLIDE 4

Big data

IDC reported that the size of the digital universe exceeded

  • 1 ZB in 2010
  • 1.8 ZB in 2011
  • 35 ZB (expected) in 2020

slide-5
SLIDE 5

Challenge

How to store big data?


slide-6
SLIDE 6

Solutions: Centralized vs. Distributed

Centralized storage

  • Specific server
  • Specific disk array
  • Bad scalability
  • Expensive

Distributed storage

  • Multiple independent devices (nodes)
  • Good scalability
  • Cheap

[Figure: files A and B split into blocks A1, A2, B1, B2, B3 and stored across nodes]

slide-7
SLIDE 7

Reliability

[Figure: files A, B and C stored as blocks A1, A2, B1, B2, B3, C1, C2, C3, C4 across nodes]

The node is vulnerable

  • 1. Unavailable temporarily
  • 2. Damaged permanently

Need redundancy

slide-8
SLIDE 8

Two mechanisms for redundancy

  • Replication

[Figure: original data and its replicas]

slide-9
SLIDE 9

Two mechanisms for redundancy

  • Erasure code

[Figure: original data (A, B) stored as (A, B, A+B) and as (A, B, A+B, A+2B)]

  • (3,2) MDS code (single parity): used in RAID 5
  • (4,2) MDS code: tolerates any 2 failures; used in RAID 6

Advantage:

  • Lower redundancy or
  • Higher reliability
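The (4,2) example above can be sketched in a few lines. This is a minimal illustration, assuming the symbols A and B live in a prime field GF(P) with P = 257 (the field and the names `encode` and `decode` are assumptions for illustration, not taken from the slides). It stores A, B, A+B, A+2B on four nodes and recovers (A, B) from any two of them.

```python
from itertools import combinations

P = 257  # assumed prime field size (hypothetical choice)

# Coefficients (x, y) such that a node stores x*A + y*B:
# node 0: A, node 1: B, node 2: A+B, node 3: A+2B
G = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a, b):
    """Return the four stored symbols for data (a, b)."""
    return [(x * a + y * b) % P for x, y in G]


def decode(i, ci, j, cj):
    """Recover (a, b) from the symbols ci, cj of two distinct nodes i, j."""
    (x1, y1), (x2, y2) = G[i], G[j]
    det = (x1 * y2 - x2 * y1) % P       # nonzero for every pair of rows of G
    inv = pow(det, -1, P)               # modular inverse (Python 3.8+)
    a = ((ci * y2 - cj * y1) * inv) % P  # Cramer's rule
    b = ((x1 * cj - x2 * ci) * inv) % P
    return a, b


stored = encode(42, 7)
for i, j in combinations(range(4), 2):
    assert decode(i, stored[i], j, stored[j]) == (42, 7)
```

Every 2×2 submatrix of `G` is invertible, which is exactly the MDS property for this small code: the storage overhead is 2× (as with 2-fold replication), yet any two node failures are tolerated.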


slide-10
SLIDE 10

Repair

  • Maintain redundancy

[Figure: the lost blocks B2 and C3 are repaired onto new nodes]

slide-11
SLIDE 11

Repair

Replication vs. erasure code

[Figure: repairing a lost copy of A under replication and under the erasure code (A, B, A+B)]

Erasure code: download B and A+B to obtain B and A, i.e. the whole original data.

Download bandwidth = amount of data downloaded to repair one node

slide-12
SLIDE 12

Repair with an erasure code

  • Original data size: 2M
  • Download bandwidth: 2M

slide-13
SLIDE 13

Repair with a regenerating code (A. G. Dimakis et al., 2007)

  • Original data size: 2M
  • Download bandwidth: 1.5M

slide-14
SLIDE 14

Storage-Communication tradeoff

  • Min-Storage Regenerating (MSR) code
  • Min-Bandwidth Regenerating (MBR) code

A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
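The two extreme points of the tradeoff have closed forms in the cited paper. As a small sketch (the function names `msr_point` and `mbr_point` are hypothetical), the code below evaluates the per-node storage α and the repair bandwidth γ for a file of size B, where any k nodes recover the file and a repair contacts d helper nodes.

```python
from fractions import Fraction


def msr_point(B, k, d):
    """Minimum-storage point: alpha = B/k, gamma = B*d / (k*(d-k+1))."""
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(B, k, d):
    """Minimum-bandwidth point: alpha = gamma = 2*B*d / (k*(2d-k+1))."""
    alpha = gamma = Fraction(2 * B * d, k * (2 * d - k + 1))
    return alpha, gamma


# File of size 2 (in units of M), n = 4, k = 2, d = 3 helpers
print(msr_point(2, 2, 3))  # storage 1 per node, repair bandwidth 3/2
print(mbr_point(2, 2, 3))  # storage and repair bandwidth both 6/5
```

With B = 2M, k = 2 and d = 3, the MSR point gives a repair bandwidth of 1.5M, matching the regenerating-code example on the earlier slide, against 2M for the plain erasure code.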


slide-15
SLIDE 15

State of the art (before 2014)

Rate = size of the original file / storage

Optimal repair      Rate ≤ 0.5     Rate > 0.5
Systematic node     Completely     Partially
Parity node                        Seldom

slide-16
SLIDE 16

Product matrix method

  • MBR for any possible parameters
  • MSR for (n, k, d ≥ 2k−2)

Rate < 1/2

K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5227–5239, 2011.

slide-17
SLIDE 17

Interference alignment technique

Repair

  • Original data size: 2M
  • Interference alignment: 3 equations but 4 unknowns

slide-18
SLIDE 18

General case (n=k+r,k,d)

Systematic nodes

  • Node 0: $f_0$
  • …
  • Node k−1: $f_{k-1}$

Parity nodes

  • Node k: $g_0 = A_{0,0} f_0 + \dots + A_{0,k-1} f_{k-1}$
  • …
  • Node k+r−1: $g_{r-1} = A_{r-1,0} f_0 + \dots + A_{r-1,k-1} f_{k-1}$

where $f_i$ is a column vector of length α and $A_{i,j}$ is a square matrix of order α.

Rate: k/(k+r)

slide-19
SLIDE 19

Most interesting case (n=k+2,k,d=k+1)

  • Data node 1: $f_1$
  • Data node 2: $f_2$
  • …
  • Data node k: $f_k$
  • Parity node 1:
  • Parity node 2:

where $f_i$ is a column vector of length α and $A_i$ is a square matrix of order α.

slide-20
SLIDE 20

Optimal repair

To repair node i, download half of the data from each of the other k+1 nodes, obtained by multiplying by a matrix $S_i$ of order α/2×α

  • Data node 1: $S_i f_1$
  • Data node 2: $S_i f_2$
  • …
  • Data node k: $S_i f_k$
  • Parity node 1:
  • Parity node 2:
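The repair step can be tried on a toy instance. The sketch below is an illustration under assumptions, not the construction from the talk: it takes k = 2, α = 2 over GF(7), assumes parity node 1 stores f1 + f2 and parity node 2 stores A1 f1 + A2 f2, and picks the matrices `A1`, `A2` and the repair vector `S1` by hand so that S1·A2 is a multiple of S1. All names are hypothetical.

```python
P = 7  # assumed small prime field GF(7)

A1 = [[1, 0], [0, 2]]   # coding matrix for data node 1 (hand-picked)
A2 = [[0, 1], [3, 2]]   # coding matrix for data node 2 (hand-picked)
S1 = [1, 1]             # 1 x 2 repair matrix for node 1 (alpha/2 = 1 row)
LAMBDA = 3              # S1 * A2 == LAMBDA * S1, so f2 can be cancelled


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) % P for row in A]


def vec_mat(s, A):
    return [sum(s[i] * A[i][j] for i in range(2)) % P for j in range(2)]


def dot(s, v):
    return sum(a * b for a, b in zip(s, v)) % P


def encode(f1, f2):
    """Return the contents (g1, g2) of the two parity nodes."""
    g1 = [(x + y) % P for x, y in zip(f1, f2)]
    g2 = [(x + y) % P for x, y in zip(mat_vec(A1, f1), mat_vec(A2, f2))]
    return g1, g2


def repair_node1(f2, g1, g2):
    """Rebuild f1 from ONE symbol per helper (3 symbols instead of 4)."""
    d_f2, d_g1, d_g2 = dot(S1, f2), dot(S1, g1), dot(S1, g2)
    u = (d_g1 - d_f2) % P            # equals S1 * f1
    v = (d_g2 - LAMBDA * d_f2) % P   # equals (S1 * A1) * f1
    r0, r1 = S1, vec_mat(S1, A1)     # two independent rows
    det = (r0[0] * r1[1] - r0[1] * r1[0]) % P
    inv = pow(det, -1, P)
    x0 = ((u * r1[1] - r0[1] * v) * inv) % P   # Cramer's rule
    x1 = ((r0[0] * v - r1[0] * u) * inv) % P
    return [x0, x1]


f1, f2 = [3, 5], [6, 2]
g1, g2 = encode(f1, f2)
assert repair_node1(f2, g1, g2) == f1
```

Each helper contributes α/2 = 1 symbol, so the repair reads 3 symbols, whereas decoding the whole file would read 4. The interference from f2 collapses because S1·A2 stays inside the span of S1.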


slide-21
SLIDE 21

Sufficient conditions

(k+1)* α/2 equations but k*α unknowns

  • Solve α unknowns fi
  • Cancel (k - 1)α unknowns fj, j ≠ i
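The conditions themselves did not survive in this transcript. One common way to state them is sketched below, under the assumption that parity node 1 stores $f_1+\dots+f_k$ and parity node 2 stores $A_1 f_1+\dots+A_k f_k$, and that $S_i$ has full row rank α/2; this is a standard formulation, not necessarily the slide's exact wording.

```latex
\begin{align*}
\text{downloaded:}\quad
  & S_i f_j \ (j \neq i), \qquad
    S_i \sum_{j=1}^{k} f_j, \qquad
    S_i \sum_{j=1}^{k} A_j f_j \\[4pt]
\text{solve } f_i:\quad
  & \operatorname{rank}\begin{pmatrix} S_i \\ S_i A_i \end{pmatrix} = \alpha \\[4pt]
\text{cancel } f_j,\ j \neq i:\quad
  & \operatorname{rank}\begin{pmatrix} S_i \\ S_i A_j \end{pmatrix} = \frac{\alpha}{2}
\end{align*}
```

The second rank condition says that the rows of $S_i A_j$ lie in the row space of $S_i$, so the interference $S_i A_j f_j$ is a function of the already downloaded $S_i f_j$ and can be subtracted.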


slide-22
SLIDE 22

Best known results

  • T. Tamo, Z. Wang, and J. Bruck, “Zigzag codes: MDS array codes with optimal rebuilding,” IEEE Trans. Inform. Theory, vol. 59, no. 3, pp. 1597–1616, Mar. 2013.
  • Z. Wang, T. Tamo, and J. Bruck, “Long MDS codes for optimal repair bandwidth,” Tech. Rep. Available at http://paradise.caltech.edu/etr.html

Code        k      α      Alphabet size
Zigzag      m+1    2^m    3
Long MDS    3m     2^m    2m+1

slide-23
SLIDE 23

Zigzag code


slide-24
SLIDE 24

Properties

  • Optimal access property: the repair data are downloaded directly, without any computation at the helper nodes
  • Optimal update property: updating one data bit changes only 2 bits in the parity nodes, which is the minimum possible

slide-25
SLIDE 25

Our work

  • Code construction with
      – Optimal access property
      – Optimal update property
  • Optimal repair of parity nodes

slide-26
SLIDE 26

Code construction

  • Includes the Zigzag codes and long MDS codes: establish a general but simple framework of (k+2,k,k+1) MSR codes based on the invariant subspace technique, which unifies the best known cases
  • New constructions: construct more MSR codes, some of which improve on Zigzag

J. Li, X. H. Tang, and U. Parampalli, “A framework of constructions of minimal storage regenerating codes with the optimal access/update property,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1920–1932, 2015.
slide-27
SLIDE 27

Invariant subspace

Definition: Let q be a prime power and A be an α×α matrix. Assume that U is a subspace of $\mathbb{F}_q^{\alpha}$ with dim(U)=s<α. Then U is said to be an invariant subspace with respect to A if $uA \in U$ for every $u \in U$.

Definition: Let S be a matrix. Span(S) is defined as the vector space spanned by its rows.
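Whether Span(S) is invariant with respect to A can be checked numerically: the rows of SA must not enlarge the row space of S. The sketch below is a minimal illustration over a prime field GF(p); `rank_mod_p` and `is_invariant` are hypothetical helper names, and the example matrices are made up for the demonstration.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over GF(p), by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                factor = m[i][c]
                m[i] = [(x - factor * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def is_invariant(S, A, p):
    """True if Span(S) (row space of S) is mapped into itself by u -> uA."""
    n = len(A)
    SA = [[sum(s[i] * A[i][j] for i in range(n)) % p for j in range(n)]
          for s in S]
    return rank_mod_p(S + SA, p) == rank_mod_p(S, p)


A = [[0, 1], [3, 2]]
print(is_invariant([[1, 1]], A, 7))  # (1,1)A = (3,3) = 3*(1,1)
print(is_invariant([[1, 0]], A, 7))  # (1,0)A = (0,1), outside Span((1,0))
```

The rank comparison works because Span(S) is invariant exactly when stacking SA under S adds no new independent rows.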

slide-28
SLIDE 28

Invariant subspace

Assume that $e_0$ and $e_1$ are two arbitrary linearly independent row vectors of length α over $\mathbb{F}_q$. Let

Then Span(S) is an invariant subspace with respect to A if and only if

slide-29
SLIDE 29

Invariant subspace

In detail, there are 7 cases:

  • 1. b=c=0 and a,d ≠0
  • 2. a=d=0 and b,c ≠0
  • 3. b=0 and a,c,d ≠0
  • 4. c=0 and a,b,d ≠0
  • 5. a=0 and b,c,d ≠0
  • 6. d=0 and a,b,c ≠0
  • 7. a,b,c,d ≠0 and ad ≠bc

Equivalent e0 e1

slide-30
SLIDE 30


Invariant subspace

slide-31
SLIDE 31

Our methods

Let V = $\mathbb{F}_q^{\alpha}$, and let $V_0$ and $V_1$ be a partition of V with $|V_0|=|V_1|$. For simplicity, we still use $V_0$ ($V_1$) to denote the matrix formed by the rows of $V_0$ ($V_1$). Then A can be characterized by

Goal: Find k such partitions $V_{i,0}$ and $V_{i,1}$ to determine the coding matrix $A_i$

slide-32
SLIDE 32

Partition

Let α = 2^m, and let $e_i$, 0≤i<α, be a basis of $\mathbb{F}_q^{\alpha}$.

The m partitions are
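The partitions themselves are missing from this transcript. A minimal sketch of one natural choice, assumed here rather than taken from the slide: index the basis vectors $e_a$ by a = 0, …, 2^m − 1 and split them according to the i-th binary digit of a. The helper name `bit_partitions` is hypothetical.

```python
def bit_partitions(m):
    """Return m partitions of the indices 0..2^m-1 into two halves.

    Partition i puts index a into the first half when bit i of a is 0
    and into the second half when bit i of a is 1.
    """
    alpha = 2 ** m
    parts = []
    for i in range(m):
        v0 = [a for a in range(alpha) if not (a >> i) & 1]
        v1 = [a for a in range(alpha) if (a >> i) & 1]
        parts.append((v0, v1))
    return parts


for i, (v0, v1) in enumerate(bit_partitions(2)):
    print(i, v0, v1)
# 0 [0, 2] [1, 3]
# 1 [0, 1] [2, 3]
```

Each of the m partitions splits the 2^m basis vectors into two halves of equal size, as the balanced-partition requirement |V0| = |V1| demands.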

slide-33
SLIDE 33

Our unified construction


slide-34
SLIDE 34

Re-interpretation of Zigzag code


Type 2

slide-35
SLIDE 35

Re-interpretation of long MDS code


Type 1 Type 3

slide-36
SLIDE 36

Construction of new code 1


Type 3 Type 2

slide-37
SLIDE 37

Repair for parity nodes of high-rate code

[Figure: landscape of high-rate MSR code constructions, grouped by optimal repair "for systematic nodes" and "for all nodes". Codes shown: (k+2,k) Zigzag, (k+2,k) Hadamard, (k+r,k) Zigzag, (k+r,k) Hadamard, long MDS codes, invariant subspace codes (4 classes), Sasidharan et al.'s construction (ISIT 2015), and many others. Works cited: Tamo et al., Trans-IT 2011; Wang et al., Allerton 2011; Papailiopoulos et al., Trans-IT 2013; Li et al., Trans-IT 2015; Li et al., IEEE CL 2016; Li et al., Trans-IT 2016; Wang et al., Trans-IT 2016.]

  • 1. Li, Tang and Tian, Enabling All-Node-Repair in Minimum Storage Regenerating Codes, arXiv:1604.07671, April 2016. (d=n-1)
  • 2. Ye and Barg, Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization, arXiv:1605.08630, May 2016. (d≤n-1)
  • 3. Sasidharan, Vajha, and Kumar, An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and all-node repair, arXiv:1607.07335, July 2016. (d≤n-1)

slide-38
SLIDE 38

Barrier

  • Optimal repair for systematic nodes: easy
  • Optimal repair for all nodes: difficult

Why is this happening?

slide-39
SLIDE 39

A new transformation

[Figure: the transformation maps a base MDS storage code to a new MDS storage code; node labels on the slide: Node 0, …, Node k−1, Node k, …, Node k+r−1 and Node i_0, …, Node i_{k−1}, Node i_k, …, Node i_{k+r−1}]

The new MDS storage code has

  ✓ r nodes: optimal RB (repair bandwidth)
  ✓ k nodes: same normalized RB
  ✓ same field size
  ✓ sub-packetization increased r-fold

slide-40
SLIDE 40

Procedure

Given a base MDS (storage) code

  • Step 1: Space sharing
  • Step 2: Permuting
  • Step 3: Pairing

slide-41
SLIDE 41

Step 1

Space sharing r instances to get code C1

                     Instance 0       Instance 1       Instance 2
Stationary nodes
  Node 0             $f_0^{(0)}$      $f_0^{(1)}$      $f_0^{(2)}$
  Node 1             $f_1^{(0)}$      $f_1^{(1)}$      $f_1^{(2)}$
  Node 2             $f_2^{(0)}$      $f_2^{(1)}$      $f_2^{(2)}$
  Node 3             $f_3^{(0)}$      $f_3^{(1)}$      $f_3^{(2)}$
Variable nodes
  Node 4             $g_0^{(0)}$      $g_0^{(1)}$      $g_0^{(2)}$
  Node 5             $g_1^{(0)}$      $g_1^{(1)}$      $g_1^{(2)}$
  Node 6             $g_2^{(0)}$      $g_2^{(1)}$      $g_2^{(2)}$

slide-42
SLIDE 42

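Step 2

Permuting data in variable nodes of C1 to get C2: $g_j^{(i)} \to g_{j+i}^{(i)}$ (subscripts taken modulo r)

C1 (variable nodes)
                     Instance 0       Instance 1       Instance 2
  Node 4             $g_0^{(0)}$      $g_0^{(1)}$      $g_0^{(2)}$
  Node 5             $g_1^{(0)}$      $g_1^{(1)}$      $g_1^{(2)}$
  Node 6             $g_2^{(0)}$      $g_2^{(1)}$      $g_2^{(2)}$

C2 (variable nodes)
                     Instance 0       Instance 1       Instance 2
  Node 4             $g_0^{(0)}$      $g_1^{(1)}$      $g_2^{(2)}$
  Node 5             $g_1^{(0)}$      $g_2^{(1)}$      $g_0^{(2)}$
  Node 6             $g_2^{(0)}$      $g_0^{(1)}$      $g_1^{(2)}$

In some cases, the permutations can be arbitrary.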

slide-43
SLIDE 43

Step 3

Pairing data in variable nodes of C2 to get C3: the symbols $\left(g_{i+j}^{(i)},\ g_{i+j}^{(j)}\right)$, i ≠ j, are paired

C2 (variable nodes)
                     Instance 0                   Instance 1                    Instance 2
  Node 4             $g_0^{(0)}$                  $g_1^{(1)}$                   $g_2^{(2)}$
  Node 5             $g_1^{(0)}$                  $g_2^{(1)}$                   $g_0^{(2)}$
  Node 6             $g_2^{(0)}$                  $g_0^{(1)}$                   $g_1^{(2)}$

C3 (variable nodes)
                     Instance 0                   Instance 1                    Instance 2
  Node 4             $g_0^{(0)}$                  $-g_1^{(1)}+g_1^{(0)}$        $-g_2^{(2)}+g_2^{(0)}$
  Node 5             $g_1^{(0)}+g_1^{(1)}$        $g_2^{(1)}$                   $-g_0^{(2)}+g_0^{(1)}$
  Node 6             $g_2^{(0)}+g_2^{(2)}$        $g_0^{(1)}+g_0^{(2)}$         $g_1^{(2)}$

slide-44
SLIDE 44

The resultant code

Structure of the MDS storage code C3

                     Instance 0                   Instance 1                    Instance 2
Stationary nodes
  Node 0             $f_0^{(0)}$                  $f_0^{(1)}$                   $f_0^{(2)}$
  Node 1             $f_1^{(0)}$                  $f_1^{(1)}$                   $f_1^{(2)}$
  Node 2             $f_2^{(0)}$                  $f_2^{(1)}$                   $f_2^{(2)}$
  Node 3             $f_3^{(0)}$                  $f_3^{(1)}$                   $f_3^{(2)}$
Variable nodes
  Node 4             $g_0^{(0)}$                  $-g_1^{(1)}+g_1^{(0)}$        $-g_2^{(2)}+g_2^{(0)}$
  Node 5             $g_1^{(0)}+g_1^{(1)}$        $g_2^{(1)}$                   $-g_0^{(2)}+g_0^{(1)}$
  Node 6             $g_2^{(0)}+g_2^{(2)}$        $g_0^{(1)}+g_0^{(2)}$         $g_1^{(2)}$

slide-45
SLIDE 45

Optimal repair of variable nodes

                     Instance 0                   Instance 1                    Instance 2
Stationary nodes
  Node 0             $f_0^{(0)}$                  $f_0^{(1)}$                   $f_0^{(2)}$
  Node 1             $f_1^{(0)}$                  $f_1^{(1)}$                   $f_1^{(2)}$
  Node 2             $f_2^{(0)}$                  $f_2^{(1)}$                   $f_2^{(2)}$
  Node 3             $f_3^{(0)}$                  $f_3^{(1)}$                   $f_3^{(2)}$
Variable nodes
  Node 4             $g_0^{(0)}$                  $-g_1^{(1)}+g_1^{(0)}$        $-g_2^{(2)}+g_2^{(0)}$
  Node 5             $g_1^{(0)}+g_1^{(1)}$        $g_2^{(1)}$                   $-g_0^{(2)}+g_0^{(1)}$
  Node 6             $g_2^{(0)}+g_2^{(2)}$        $g_0^{(1)}+g_0^{(2)}$         $g_1^{(2)}$

[Check marks on the slide indicate the downloaded symbols.]

Download column i to repair variable node i
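The three steps and the column-download repair can be simulated end to end. The sketch below is an illustration under assumptions: it uses k = 4, r = 3 as in the example, works over GF(13), takes the pairing coefficient to be −1, and replaces the base MDS code by a stand-in linear code (`parities`) whose MDS property is not checked, since repairing a variable node only needs the parities to be recomputable from the data. All function names are hypothetical.

```python
import random

P = 13        # assumed prime field; odd, so that -1 != 1
K, R = 4, 3   # 4 stationary nodes and 3 variable nodes


def parities(f):
    """Stand-in base code: g_t = sum_j (j+1)^t * f_j over GF(P)."""
    return [sum(pow(j + 1, t, P) * f[j] for j in range(K)) % P
            for t in range(R)]


def transform(F):
    """F[l] holds the data symbols of instance l.

    Returns V with V[t][l] = symbol kept by variable node t in instance l.
    """
    G = [parities(f) for f in F]              # Step 1: R independent instances
    V = [[0] * R for _ in range(R)]
    for t in range(R):
        for l in range(R):
            s = (t + l) % R                   # Step 2: node t holds g_s^(l)
            own, other = G[l][s], G[t][s]     # Step 3: pair g_s^(l), g_s^(t)
            if t == l:
                V[t][l] = own
            elif t > l:
                V[t][l] = (own + other) % P
            else:
                V[t][l] = (other - own) % P
    return V


def repair_variable_node(t, data_t, column_t):
    """Rebuild variable node t from column (instance) t of the other nodes.

    data_t: the K data symbols of instance t (from the stationary nodes).
    column_t[u]: instance-t symbol of variable node u (entry t is unused).
    """
    g_t = parities(data_t)                    # all parities of instance t
    node = [0] * R
    for l in range(R):
        s = (t + l) % R
        if l == t:
            node[l] = g_t[s]
            continue
        helper = column_t[l]                  # node l, instance t
        if l > t:                             # helper = g_s^(t) + g_s^(l)
            g_l = (helper - g_t[s]) % P
        else:                                 # helper = g_s^(l) - g_s^(t)
            g_l = (helper + g_t[s]) % P
        node[l] = (g_l + g_t[s]) % P if t > l else (g_t[s] - g_l) % P
    return node


random.seed(0)
F = [[random.randrange(P) for _ in range(K)] for _ in range(R)]
V = transform(F)
for t in range(R):
    column_t = [V[u][t] if u != t else None for u in range(R)]
    assert repair_variable_node(t, F[t], column_t) == V[t]
```

Each repair reads K + R − 1 = 6 symbols, one from every surviving node, to rebuild a node holding R = 3 symbols, whereas decoding all instances would read K·R = 12 symbols.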

slide-46
SLIDE 46

Repair of stationary nodes

Download data (to repair Node 0)

                     Instance 0                          Instance 1                           Instance 2
Stationary nodes
  Node 0 (failed)    $f_0^{(0)}$                         $f_0^{(1)}$                          $f_0^{(2)}$
  Node 1             $S_{0,1} f_1^{(0)}$                 $S_{0,1} f_1^{(1)}$                  $S_{0,1} f_1^{(2)}$
  Node 2             $S_{0,2} f_2^{(0)}$                 $S_{0,2} f_2^{(1)}$                  $S_{0,2} f_2^{(2)}$
  Node 3             $S_{0,3} f_3^{(0)}$                 $S_{0,3} f_3^{(1)}$                  $S_{0,3} f_3^{(2)}$
Variable nodes
  Node 4             $S_{0,4}\, g_0^{(0)}$               $S_{0,5}(-g_1^{(1)}+g_1^{(0)})$      $S_{0,6}(-g_2^{(2)}+g_2^{(0)})$
  Node 5             $S_{0,5}(g_1^{(0)}+g_1^{(1)})$      $S_{0,6}\, g_2^{(1)}$                $S_{0,4}(-g_0^{(2)}+g_0^{(1)})$
  Node 6             $S_{0,6}(g_2^{(0)}+g_2^{(2)})$      $S_{0,4}(g_0^{(1)}+g_0^{(2)})$       $S_{0,5}\, g_1^{(2)}$

For each instance i, the downloaded data yield

$S_{0,1} f_1^{(i)},\ S_{0,2} f_2^{(i)},\ S_{0,3} f_3^{(i)},\ S_{0,4}\, g_0^{(i)},\ S_{0,5}\, g_1^{(i)},\ S_{0,6}\, g_2^{(i)}$

from which $f_0^{(i)}$ is repaired.

In general, the downloaded data are $S_{i,j} f_j^{(l)}$ and $S_{i,k+j+l}\left(a\, g_{j+l}^{(l)} + g_{j+l}^{(j)}\right)$.

slide-47
SLIDE 47

Application I

[Figure: the transformation maps the base code to the new code on nodes 0, …, k−1 (systematic nodes, stationary) and k, …, k+r−1 (parity nodes, variable)]

  • Base code: MDS storage code with optimal RB for systematic nodes
  • New code: MDS storage code with optimal RB for all nodes

Only need to concentrate on designing MDS storage codes with optimal repair bandwidth for systematic nodes.

slide-48
SLIDE 48

Application II

[Figure: starting from an MDS code, the transformation is applied repeatedly with base codes 0, 1, …, k/r, each step handling a group of r nodes, to obtain a new MDS storage code with optimal repair bandwidth for all nodes]

The base code can even be a scalar code, such as RS codes!

slide-49
SLIDE 49

Remarks

MSR with optimal repair for all nodes

  • 1. Li, Tang and Tian, Enabling All-Node-Repair in Minimum Storage Regenerating Codes, arXiv:1604.07671, April 2016.
  • 2. Ye and Barg, Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization, arXiv:1605.08630, May 2016.
  • 3. Sasidharan, Vajha, and Kumar, An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and all-node repair, arXiv:1607.07335, July 2016.

MSR from MDS

  • 1. Sasidharan, Vajha, and Kumar, An explicit, coupled-layer construction of a high-rate MSR code with low sub-packetization level, small field size and all-node repair, arXiv:1607.07335, July 2016.
  • 2. Li, Tang and Tian, A Generic Transformation for Optimal Repair Bandwidth and Rebuilding Access in MDS Codes, Proc. of the 2017 IEEE Int. Symp. Inform. Theory, Aachen, Germany, June 2017.
slide-50
SLIDE 50


A comparison of some key parameters between the (k+r, k) MSR codes

A comparison with the recent results

slide-51
SLIDE 51

Conclusions

  • Proposed a framework of MDS storage code construction
      – with optimal repair property for systematic nodes
      – with optimal access property
      – with optimal update property
  • Proposed a generic transformation of MDS storage code
      – from a code with optimal repair property for systematic nodes to a code with optimal repair property for all nodes
      – from a scalar code to a code with optimal repair property for all nodes

slide-52
SLIDE 52
