
Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems



  1. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems. Jun Li, Baochun Li, University of Toronto. HotStorage ’15

  2. Distributed Storage ‣ Store a massive amount of data across a large number of commodity servers (e.g., HDFS) ‣ Servers are subject to frequent failures

  3. Distributed Storage ‣ Store redundant data to ensure data durability and availability despite failures ‣ Replication: store multiple copies on different servers [Figure: 3-way replication, with three copies each of blocks D1, D2 and D3 on different servers; storage overhead = 3x]

  4. Erasure Coding ‣ Use less storage space to tolerate the same number of failures ‣ (k,r) Reed-Solomon (RS) code ‣ compute r parity blocks from k data blocks [Figure: (k=3,r=2) RS code with data blocks D1, D2, D3 and parity blocks P1, P2; storage overhead = 1.67x]
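As a quick check of the overhead figures on the replication and erasure-coding slides, the sketch below compares n-way replication with a (k,r) RS code. Both tolerate the same number of lost blocks here (copies - 1 for replication, r for RS). The function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def replication(copies):
    """n-way replication: returns (storage overhead, block losses tolerated)."""
    return Fraction(copies), copies - 1

def reed_solomon(k, r):
    """(k, r) RS code: k data + r parity blocks; any k of the k + r blocks decode the data."""
    return Fraction(k + r, k), r

for name, (overhead, tolerated) in [
    ("3-way replication", replication(3)),
    ("(k=3, r=2) RS", reed_solomon(3, 2)),
]:
    print(f"{name}: overhead = {float(overhead):.2f}x, tolerates {tolerated} failures")
```

Both lines report 2 tolerated failures, at 3.00x versus 1.67x storage.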

  5. Reed-Solomon Code ‣ Achieves the optimal storage overhead for tolerating a given number of failures ‣ Typically incurs a high reconstruction cost ‣ needs to obtain k blocks to reconstruct one [Figure: (k=3,r=2) RS code; reconstructing a lost P1 requires k = 3 blocks, i.e., 3x disk read and network transfer]
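To make the "k blocks to reconstruct one" cost concrete, here is a toy systematic RS-style code over the prime field GF(257), built from polynomial evaluation and Lagrange interpolation. It is a sketch for illustration only: production RS codes usually work over GF(2^8) with table-driven arithmetic, and all names below are hypothetical.

```python
P = 257  # toy prime field; symbols are integers in [0, 256]

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through (xi, yi), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total

def encode(data_blocks, r):
    """Systematic (k, r) code: block i (0-based) is the evaluation at x = i + 1,
    so data blocks sit at x = 1..k and parity blocks at x = k+1..k+r."""
    k = len(data_blocks)
    stripe = [list(b) for b in data_blocks]
    for x in range(k + 1, k + r + 1):
        stripe.append([
            lagrange_eval([(i + 1, data_blocks[i][s]) for i in range(k)], x)
            for s in range(len(data_blocks[0]))
        ])
    return stripe

def reconstruct(stripe, lost, k):
    """Rebuild block `lost` from the first k surviving blocks (None marks a lost block).
    Returns the rebuilt block and how many blocks had to be read."""
    survivors = [i for i, b in enumerate(stripe) if b is not None and i != lost][:k]
    if len(survivors) < k:
        raise ValueError("fewer than k blocks survive; data is lost")
    rebuilt = [
        lagrange_eval([(i + 1, stripe[i][s]) for i in survivors], lost + 1)
        for s in range(len(stripe[survivors[0]]))
    ]
    return rebuilt, len(survivors)

stripe = encode([[10, 20], [30, 40], [50, 60]], r=2)  # D1, D2, D3 -> D1..D3, P1, P2
damaged = list(stripe)
damaged[3] = None                                     # lose P1
rebuilt, blocks_read = reconstruct(damaged, lost=3, k=3)
print(rebuilt, blocks_read)  # [70, 80] 3
```

Whichever block is lost, `reconstruct` has to read k = 3 whole blocks to rebuild one, which is the 3x disk read and network transfer on the slide.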

  6. Network Transfer ‣ Minimum-storage regenerating (MSR) codes [Dimakis et al., Trans. IT, 2011] ‣ the same optimal storage overhead as RS codes ‣ minimize the network transfer during reconstruction by downloading a small fraction of data from each of d helpers [Figure: reconstructing D3 with 128 MB blocks. (k=3,r=2) RS: 128 MB from each of three blocks, total transfer = 384 MB. (k=3,r=2,d=4) MSR: 64 MB from each of D1, D2, P1 and P2, total transfer = 256 MB]
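The transfer totals in this figure follow from a simple count: an RS newcomer downloads k whole blocks, while an MSR newcomer downloads 1/(d-k+1) of a block from each of d helpers. A minimal sketch of that arithmetic (function names are hypothetical):

```python
from fractions import Fraction

def rs_repair_transfer(k, block_mb):
    """RS: download k whole blocks to rebuild one."""
    return k * block_mb

def msr_repair_transfer(k, d, block_mb):
    """MSR: each of d helpers sends block_mb / (d - k + 1)."""
    return d * Fraction(block_mb, d - k + 1)

print(rs_repair_transfer(3, 128))      # 384
print(msr_repair_transfer(3, 4, 128))  # 256
```

With d = k the MSR formula degenerates to k whole blocks, the same as RS; contacting more helpers (larger d) lowers the total.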

  7. Disk I/O ‣ MSR codes incur even more disk I/O than RS codes, since each helper must read all of its data to compute the small fraction it sends out [Figure: (k=3,r=2,d=4) MSR reconstruction of D3; each of the four helpers reads its entire 128 MB block, computes, and transfers 64 MB]
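The helper's read/compute/transfer steps can be sketched as follows, assuming the block is split into d-k+1 segments and the helper returns one segment-sized linear combination. The coefficients are hypothetical placeholders (a real MSR code derives them from its construction), and arithmetic is over the toy prime field GF(257).

```python
P = 257  # toy prime field

def helper_response(block_segments, coeffs):
    """Read every segment of the helper's block and return one segment-sized
    linear combination plus the number of symbols read from disk."""
    if len(block_segments) != len(coeffs):
        raise ValueError("need one coefficient per segment")
    seg_len = len(block_segments[0])
    out = [0] * seg_len
    symbols_read = 0
    for seg, c in zip(block_segments, coeffs):
        symbols_read += len(seg)  # the whole segment is read, although only a mix is sent
        for s in range(seg_len):
            out[s] = (out[s] + c * seg[s]) % P
    return out, symbols_read

# (k=3, d=4): each block has d - k + 1 = 2 segments.
sent, read = helper_response([[1, 2, 3, 4], [5, 6, 7, 8]], coeffs=[1, 2])
print(sent, read)  # [11, 14, 17, 20] 8
```

The helper sends 4 symbols but reads 8; scaled to a 128 MB block, that is 64 MB on the wire and 128 MB from disk.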

  8. Can we have erasure codes that save both network transfer and disk I/O during reconstruction?

  9. Multiple Failures ‣ Opportunities to fix multiple failures together exist: ‣ correlated failures (disk, switch, power) ‣ periodic checks for failures ‣ deferring reconstruction until a certain number of failures have accumulated ‣ Typically, erasure codes like RS and MSR codes fix failures separately. ‣ Coalescing reconstructions can immediately save disk I/O. [Figure: fixing two failed blocks (D3 and P3) from four helpers (D1, D2, P1, P2) with 128 MB blocks. (k=3,r=3,d=4) MSR, fixed separately: 64 MB × 4 per failure; total transfer = 512 MB; disk read = 1024 MB; storage overhead = 2x. Coalesced with optimal network transfer [Shum et al., Trans. IT, 2013]: 42.7 MB × 4 from the helpers to each newcomer plus 42.7 MB × 2 between the newcomers; total transfer = 427 MB; disk read = 512 MB; storage overhead = 2x; code constructions exist only for limited values of parameters. Beehive: total transfer = 427 MB; disk read = 512 MB; storage overhead = 2.25x]
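The totals in this figure can be reproduced with a short calculation. It assumes 128 MB blocks and d helpers that each read their whole block once per repair round, and, for coalesced repair, that each of the t newcomers receives one piece of size block/(d-k+t) from every helper and from every other newcomer. The function names are hypothetical.

```python
from fractions import Fraction

def separate_msr(k, d, t, block_mb):
    """Fix t failures one at a time with an MSR code: t rounds of single-block repair."""
    piece = Fraction(block_mb, d - k + 1)
    transfer = t * d * piece
    disk_read = t * d * block_mb  # every round, each helper reads its whole block
    return transfer, disk_read

def coalesced_optimal(k, d, t, block_mb):
    """Fix t failures together at the optimal-transfer point."""
    piece = Fraction(block_mb, d - k + t)
    transfer = t * (d + t - 1) * piece  # per newcomer: d helper pieces + (t - 1) newcomer pieces
    disk_read = d * block_mb            # each helper reads its block once
    return transfer, disk_read

for name, f in [("separate MSR", separate_msr), ("coalesced", coalesced_optimal)]:
    transfer, disk_read = f(k=3, d=4, t=2, block_mb=128)
    print(f"{name}: transfer = {float(transfer):.0f} MB, disk read = {disk_read} MB")
```

The piece size 128/3 MB is the 42.7 MB in the figure, and coalescing halves the disk read (512 MB instead of 1024 MB).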

  10. Contributions ‣ Beehive, a new family of erasure codes that achieves the optimal network transfer for coalesced reconstructions ‣ with a wide range of system parameters ‣ with marginal additional storage overhead ‣ a C++ implementation to demonstrate its performance

  11. System Parameters ‣ k: the minimum number of blocks needed to decode the original data ‣ r: the maximum number of missing blocks that can be tolerated without hurting data durability/availability ‣ t: the number of failed blocks to reconstruct ‣ d: the number of existing blocks contacted during reconstruction (d ≥ 2k-1)

  12. Code Construction

  13. Code Construction ‣ Beehive codes are constructed by combining a (k,r,d) MSR code and a (k-1,r+1) RS code ‣ each of the k+r blocks holds d-k+1 segments of the MSR code followed by t-1 segments of the RS code ‣ in the MSR part, blocks 1 to k are data blocks and blocks k+1 to k+r are parity blocks; in the RS part, blocks 1 to k-1 are data blocks and blocks k to k+r are the r+1 parity blocks, where the RS segment of block i is $a_{i,1} s_1 + a_{i,2} s_2 + \dots + a_{i,k-1} s_{k-1}$ and $s_j$ is the corresponding RS segment of data block j ‣ Beehive codes can be decoded as long as any k blocks survive ‣ With k+r blocks in total, Beehive codes store t-1 fewer segments of original data than RS codes and MSR codes with the same block size ‣ storage overhead = $\frac{k+r}{k - \frac{t-1}{d-k+t}} \in \left(\frac{k+r}{k}, \frac{k+r}{k-1}\right)$
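The storage-overhead formula follows from counting segments. Each block holds (d-k+1)+(t-1) = d-k+t segments, and the code stores k(d-k+1) MSR data segments plus (k-1)(t-1) RS data segments. The sketch below checks this count against the closed form using exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def beehive_overhead(k, r, d, t):
    """Storage overhead by segment counting."""
    seg_per_block = (d - k + 1) + (t - 1)                # MSR part + RS part
    data_segments = k * (d - k + 1) + (k - 1) * (t - 1)  # original data held by the code
    stored_segments = (k + r) * seg_per_block
    return Fraction(stored_segments, data_segments)

def closed_form(k, r, d, t):
    """(k + r) / (k - (t - 1) / (d - k + t))."""
    return Fraction(k + r) / (k - Fraction(t - 1, d - k + t))

k, r, d, t = 3, 3, 4, 2
overhead = beehive_overhead(k, r, d, t)
assert overhead == closed_form(k, r, d, t)
assert Fraction(k + r, k) < overhead < Fraction(k + r, k - 1)
print(float(overhead))  # 2.25
```

With t = 1 the RS part is empty and the overhead reduces to (k+r)/k, that of the underlying MSR code. For (k=3,r=3,d=4,t=2) it gives the 2.25x shown for Beehive on the Multiple Failures slide.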

  14. Reconstruction [Figure (animated over several builds): d helpers (1, 2, 3, …, d) and t newcomers rebuilding the failed blocks, e.g., block i and block j; each newcomer combines the data it receives from the d helpers, and the newcomers for blocks i and j then exchange data with each other]
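Below is a sketch of the traffic pattern behind coalesced reconstruction. As in the Multiple Failures example, it assumes that every helper sends each newcomer one segment-sized piece and that every newcomer then sends one piece to each other newcomer. It only enumerates transfers and sizes, not the coded content, and the node names are hypothetical.

```python
from fractions import Fraction

def coalesced_schedule(k, d, t, block_mb):
    """List (sender, receiver, size_mb) transfers for repairing t blocks with d helpers."""
    piece = Fraction(block_mb, d - k + t)  # one segment of a d-k+t segment block
    transfers = []
    # Phase 1: each helper sends one piece to each newcomer
    # (assumed: the helper reads its block only once for all t pieces).
    for h in range(1, d + 1):
        for n in range(1, t + 1):
            transfers.append((f"helper{h}", f"newcomer{n}", piece))
    # Phase 2: each newcomer sends one piece to every other newcomer.
    for src in range(1, t + 1):
        for dst in range(1, t + 1):
            if src != dst:
                transfers.append((f"newcomer{src}", f"newcomer{dst}", piece))
    return transfers

schedule = coalesced_schedule(k=3, d=4, t=2, block_mb=128)
received = sum(size for _, dst, size in schedule if dst == "newcomer1")
total = sum(size for _, _, size in schedule)
print(len(schedule), round(float(total), 1))
```

Each newcomer receives d + t - 1 = 5 pieces (about 213 MB), and the ten transfers add up to the 427 MB quoted for coalesced repair.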
