Beehive : Erasure Codes for Fixing Multiple Failures in Distributed - - PowerPoint PPT Presentation



SLIDE 1

Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems

Jun Li, Baochun Li
University of Toronto
HotStorage ’15

SLIDE 2

Distributed Storage

  • Store a massive amount of data over a large number of commodity servers, as in HDFS
  • Servers are subject to frequent failures

SLIDE 4

Distributed Storage

  • Store redundant data to ensure data durability and availability despite failures
  • Replication: store multiple copies on different servers

[Figure: 3-way replication. Three copies each of blocks D1, D2 and D3 are spread across servers; storage overhead = 3x]

SLIDE 6

Erasure Coding

  • Use less storage space to tolerate the same number of failures
  • (k,r) Reed-Solomon (RS) code: compute r parity blocks from k data blocks

[Figure: (k=3,r=2) RS code. Data blocks D1, D2, D3 and parity blocks P1, P2 are stored on separate servers; storage overhead = 1.67x]
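
The two overhead figures quoted so far (3x for replication, 1.67x for the RS code) follow from simple ratios. The sketch below assumes storage overhead means stored blocks divided by original data blocks; the function names are illustrative and do not come from the slides.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """n-way replication stores every data block `copies` times."""
    return Fraction(copies, 1)


def rs_overhead(k: int, r: int) -> Fraction:
    """A (k, r) RS code stores k + r blocks for every k data blocks."""
    return Fraction(k + r, k)


# Both schemes below tolerate the loss of any 2 blocks.
print(f"3-way replication: {float(replication_overhead(3)):.2f}x")  # 3.00x
print(f"(k=3, r=2) RS:     {float(rs_overhead(3, 2)):.2f}x")         # 1.67x
```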

SLIDE 8

Reed-Solomon Code

  • Achieves the optimal storage overhead for the number of failures tolerated
  • Typically has a high reconstruction cost: need to obtain k blocks to reconstruct one

[Figure: (k=3,r=2) RS code. Reconstructing one block requires 3x disk read and network transfer]

SLIDE 11

Network Transfer

  • Minimum-storage regenerating (MSR) codes [Dimakis et al., Trans. IT, 2011]
  • the optimal storage overhead, like RS codes
  • minimize the network transfer during reconstruction

[Figure, top: (k=3,r=2) RS. The 128 MB block D2 is reconstructed by downloading 128 MB from each of D3, P1 and P2; total transfer = 384 MB]
[Figure, bottom: (k=3,r=2,d=4) MSR. D2 is reconstructed by downloading a small fraction of data (64 MB) from each of d = 4 helpers; total transfer = 256 MB]
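
The 384 MB and 256 MB totals can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the standard MSR repair bandwidth in which each of the d helpers sends a 1/(d-k+1) fraction of its block (consistent with the 64 MB per helper in the figure); the function names are hypothetical.

```python
def rs_repair_transfer_mb(k: int, block_mb: float) -> float:
    """RS repair: download k whole blocks to rebuild a single block."""
    return k * block_mb


def msr_repair_transfer_mb(k: int, d: int, block_mb: float) -> float:
    """MSR repair: each of d helpers sends block_mb / (d - k + 1)."""
    if d < k:
        raise ValueError("an MSR repair needs at least k helpers")
    return d * block_mb / (d - k + 1)


print(rs_repair_transfer_mb(3, 128.0))      # 384.0
print(msr_repair_transfer_mb(3, 4, 128.0))  # 256.0
```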

SLIDE 12

Disk I/O

  • MSR codes incur even more disk I/O than RS codes, since each helper needs to read all of its data to compute the small fraction it sends out.

[Figure: (k=3,r=2,d=4) MSR, reconstructing D3. Each of the 4 helpers reads its whole 128 MB block, computes a 64 MB fragment and transfers it to the newcomer]
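
To make the disk I/O comparison concrete, the sketch below counts the bytes read from disk for one reconstruction. It assumes, as the slide states, that every MSR helper reads its entire block, and that an RS repair reads exactly the k blocks it downloads; the function names are illustrative.

```python
def rs_repair_disk_read_mb(k: int, block_mb: float) -> float:
    """RS repair reads the k whole blocks that are sent to the newcomer."""
    return k * block_mb


def msr_repair_disk_read_mb(d: int, block_mb: float) -> float:
    """MSR repair: each of d helpers reads its whole block, whatever it sends."""
    return d * block_mb


# (k=3, r=2) RS versus (k=3, r=2, d=4) MSR with 128 MB blocks.
print(rs_repair_disk_read_mb(3, 128.0))   # 384.0
print(msr_repair_disk_read_mb(4, 128.0))  # 512.0
```

With these parameters MSR transfers less data over the network (256 MB against 384 MB) but reads more from disk (512 MB against 384 MB).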

SLIDE 13

Can we have erasure codes that save both network transfer and disk I/O during reconstruction?

SLIDE 14

Multiple Failures

  • Opportunities to fix multiple failures at once exist.
  • correlated failures (disk, switch, power)
  • periodic checks for failures
  • reconstruct only after a certain number of failures
  • Typically, erasure codes like RS and MSR codes fix failures separately.
  • Coalescing reconstructions immediately saves disk I/O.

[Figure: (k=3,r=3,d=4) MSR over blocks D1, D2, D3, P1, P2, P3 of 128 MB each. D3 and D1 are reconstructed separately, each downloading 64 MB × 4; total transfer = 512 MB, disk read = 1024 MB, storage overhead = 2x]

SLIDE 17

Multiple Failures

[Figure, panel 1: (k=3,r=3,d=4) MSR over blocks D1, D2, D3, P1, P2, P3 of 128 MB each. D3 and D1 are reconstructed separately, each downloading 64 MB × 4; total transfer = 512 MB, disk read = 1024 MB, storage overhead = 2x]
[Figure, panel 2: optimal network transfer [Shum et al., Trans. IT, 2013]. Each of D3 and D1 downloads 42.7 MB × 4, plus 42.7 MB × 2 between the two; total transfer = 427 MB, disk read = 512 MB, storage overhead = 2x. Code construction exists only for limited values of parameters]
[Figure, panel 3: Beehive. Each of D3 and D1 downloads 42.7 MB × 4, plus 42.7 MB × 2 between the two; total transfer = 427 MB, disk read = 512 MB, storage overhead = 2.25x]
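
The transfer and disk-read totals in the three panels can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: separate MSR repairs move block/(d-k+1) per helper, a coalesced repair moves block/(d-k+t) per helper and per fellow newcomer (which reproduces the 42.7 MB in the figure), and each helper reads its whole block once per repair operation. Function names are hypothetical.

```python
def separate_msr_repairs(k: int, d: int, t: int, block_mb: float):
    """t independent MSR repairs. Returns (total transfer, total disk read) in MB."""
    per_helper = block_mb / (d - k + 1)
    transfer = t * d * per_helper
    disk_read = t * d * block_mb  # helpers reread their whole block for every repair
    return transfer, disk_read


def coalesced_repair(k: int, d: int, t: int, block_mb: float):
    """One joint repair of t blocks. Returns (total transfer, total disk read) in MB."""
    piece = block_mb / (d - k + t)
    # Each newcomer receives d pieces from helpers and t - 1 pieces from other newcomers.
    transfer = t * (d + t - 1) * piece
    disk_read = d * block_mb  # each helper reads its block only once
    return transfer, disk_read


print(separate_msr_repairs(3, 4, 2, 128.0))  # (512.0, 1024.0)
transfer, disk_read = coalesced_repair(3, 4, 2, 128.0)
print(round(transfer, 1), disk_read)         # 426.7 512.0
```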

SLIDE 18

Contributions

  • Beehive, a new kind of erasure code that achieves the optimal network transfer of coalesced reconstructions
  • with a wide range of system parameters
  • with marginal additional storage overhead
  • C++ implementation to demonstrate the performance

SLIDE 19

System Parameters

  • k: the minimum number of blocks needed to decode the original data
  • r: the maximum number of missing blocks that can be tolerated without hurting data durability/availability
  • t: the number of failed blocks to reconstruct
  • d: the number of existing blocks to contact during reconstruction (d ≥ 2k-1)

SLIDE 26

Code Construction

  • Beehive codes are constructed by combining MSR codes and RS codes.
  • Beehive codes can be decoded as long as k blocks survive
  • With k+r blocks in total, Beehive codes store t-1 fewer segments than RS codes and MSR codes
  • storage overhead = $\dfrac{k+r}{k - \frac{t-1}{d-k+t}} \in \left(\dfrac{k+r}{k}, \dfrac{k+r}{k-1}\right)$

[Figure: each of the k+r blocks (block 1, …, block k-1, block k, block k+1, …, block k+r) holds d-k+1 segments from a (k,r,d) MSR code and t-1 segments from a (k-1,r+1) RS code. The MSR part has k data blocks and r parity blocks; the RS part has k-1 data blocks and r+1 parity blocks, where the RS parity in block j (j = k, …, k+r) is a linear combination of data blocks 1, …, k-1 with coefficients a_{j,1}, a_{j,2}, …, a_{j,k-1}]
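
The storage overhead expression can be cross-checked by counting segments directly. The sketch below assumes the layout described on this slide (d-k+1 MSR segments plus t-1 RS segments per block, with the MSR part carrying k blocks' worth of data and the RS part k-1); the function names are illustrative.

```python
from fractions import Fraction


def beehive_overhead_by_counting(k: int, r: int, d: int, t: int) -> Fraction:
    """Stored segments divided by original data segments."""
    segments_per_block = (d - k + 1) + (t - 1)
    stored = (k + r) * segments_per_block
    data = k * (d - k + 1) + (k - 1) * (t - 1)
    return Fraction(stored, data)


def beehive_overhead_closed_form(k: int, r: int, d: int, t: int) -> Fraction:
    """(k + r) / (k - (t - 1) / (d - k + t)), as stated on the slide."""
    return Fraction(k + r) / (k - Fraction(t - 1, d - k + t))


overhead = beehive_overhead_by_counting(3, 3, 4, 2)
print(overhead, float(overhead))  # 9/4 2.25
```

For k=3, r=3, d=4, t=2 both functions give 9/4, matching the 2.25x quoted for Beehive earlier in the deck and falling between (k+r)/k = 2 and (k+r)/(k-1) = 3.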

SLIDE 38

Reconstruction

[Figure: reconstruction of t newcomers (block i, block j) from d helpers (1, 2, 3, …, d)]

SLIDE 39

Evaluation

  • Implement Beehive in C++, as well as RS and MSR codes, with the Intel Storage Acceleration Library (ISA-L)
  • Run performance evaluation on Amazon EC2 (c4.2xlarge) instances
  • Encode a file of 360 MB (RS & MSR codes) or 350 MB (Beehive codes), with k = 6, r = 6
  • Compare network transfer and disk I/O

SLIDE 40

Highlights of Results

  • Network Transfer
  • Beehive saves more network traffic than MSR codes (up to 42.9%)
  • Network transfer per newcomer decreases with both d and t
  • Disk I/O
  • Beehive codes save disk read by up to 75%
  • Similar reconstruction throughput
  • RS codes achieve a higher encoding and decoding throughput due to their lower complexity
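
The trend in per-newcomer transfer can be illustrated with the same per-piece size used earlier in the deck, block/(d-k+t). This is a hedged sketch: the formula is the optimal coalesced-repair transfer assumed from the worked example, k = 6 comes from the evaluation setup, and the d and t values are purely illustrative rather than taken from the experiments or checked against the construction's constraints.

```python
def per_newcomer_transfer(k: int, d: int, t: int) -> float:
    """Data downloaded by one newcomer, in units of one block size."""
    # d pieces from helpers plus t - 1 pieces from the other newcomers.
    return (d + t - 1) / (d - k + t)


K = 6  # from the evaluation setup; d and t below are illustrative only
for d in (7, 8, 9):
    row = [round(per_newcomer_transfer(K, d, t), 2) for t in (1, 2, 3, 4)]
    print(f"d={d}: t=1..4 -> {row}")
```

Since (d+t-1)/(d-k+t) = 1 + (k-1)/(d-k+t), the value shrinks whenever d or t grows, and with t = 1 it reduces to the single-failure MSR cost d/(d-k+1).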

SLIDE 41

Conclusions

  • We present Beehive codes, erasure codes that achieve the optimal network transfer to reconstruct multiple blocks in batches
  • The construction of Beehive codes can be applied with a wide range of values of system parameters
  • With a C++ implementation, we demonstrate that Beehive can save both disk I/O and network transfer during reconstruction

SLIDE 42

Thanks!