SLIDE 1

Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads

Osama Khan and Randal Burns, Johns Hopkins University James Plank and William Pierce, University of Tennessee Cheng Huang, Microsoft Research

USENIX FAST 2012

SLIDE 2

What is the problem?

  • Data Explosion
  • Much of that data will be stored in the cloud
  • Replication too expensive → erasure coding to the rescue
  • As pointed out previously [Zhang ’10 and others]

SLIDE 3

What is the problem?

  • Humongous scale + failure rates = Frequent recovery needed
  • Also, rolling software updates result in downtime [Brewer ’01]
  • Two operations become prominent:
    • Disk reconstruction
    • Degraded reads
  • Existing erasure codes are not designed with recovery I/O optimization in mind
  • Need to optimize existing codes for these operations
  • Need new codes which are intrinsically designed for these operations

SLIDE 4

Minimizing Recovery I/O

  • Algorithm minimizes the amount of data needed for recovery
    • Applicable to any XOR-based erasure code
  • Existing erasure codes and configurations are not suitable for the cloud
    • Large file system blocks required to extract good recovery performance
  • Rotated Reed-Solomon Codes
    • A new class of Reed-Solomon codes that optimizes degraded read performance
    • Better choice than standard Reed-Solomon codes for the cloud

SLIDE 5

Outline

  • Erasure Coded Storage Systems
  • Algorithm for minimizing number of symbols
  • Rotated Reed-Solomon Codes
  • Analysis & Evaluation
  • Conclusions

SLIDE 6

Erasure Coded Storage Systems


[Figure: Wait until block is full → Sealed → Erasure coded → Distributed to nodes]
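The append-then-seal pipeline can be sketched as below. This is a minimal illustration, not the system's code: `BlockWriter`, `block_size`, and the byte-level splitting of records are hypothetical. Records fill an open block, and only a full block is sealed and handed on for erasure coding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockWriter:
    """Hypothetical append-only writer that seals fixed-size blocks."""
    block_size: int
    sealed: List[bytes] = field(default_factory=list)
    open_block: bytearray = field(default_factory=bytearray)

    def __post_init__(self) -> None:
        if self.block_size <= 0:
            raise ValueError("block_size must be positive")

    def append(self, record: bytes) -> None:
        """Copy record into the open block, sealing each block as it fills."""
        view = memoryview(record)
        while len(view) > 0:
            room = self.block_size - len(self.open_block)
            self.open_block += view[:room]
            view = view[room:]
            if len(self.open_block) == self.block_size:
                self._seal()

    def _seal(self) -> None:
        # A sealed block is immutable: it is erasure coded once and its
        # fragments are distributed to storage nodes (not modeled here).
        self.sealed.append(bytes(self.open_block))
        self.open_block = bytearray()
```

Because a sealed block never changes, it only has to be encoded once; later appends go to a new open block.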

SLIDE 7

Erasure Coded Storage Systems


[Figure: erasure-coded stripe with k = 6 data disks, m = 3 coding disks, and r = 4 symbols per disk]
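The stripe geometry can be made concrete with a short sketch. It assumes that k counts data disks, m counts coding disks, and r is the number of symbols (rows) each disk holds per stripe; the function name `stripe_layout` and the row-major placement of data symbols d_{i,j} (disk i, row j) are hypothetical.

```python
def stripe_layout(block_size: int, k: int, m: int, r: int):
    """Place a sealed block's bytes into k data strips of r symbols each.

    Returns (symbol_size, number of disks, {(disk i, row j): (start, end)})
    for every data symbol d_{i,j}. The m coding disks also hold r symbols
    each, computed from the data. Row-major placement is an assumption.
    """
    if block_size % (k * r):
        raise ValueError("block_size must be a multiple of k * r")
    symbol_size = block_size // (k * r)
    layout = {}
    for j in range(r):            # rows
        for i in range(k):        # data disks
            start = (j * k + i) * symbol_size
            layout[(i, j)] = (start, start + symbol_size)
    return symbol_size, k + m, layout


MiB = 2 ** 20
size, disks, layout = stripe_layout(96 * MiB, k=6, m=3, r=4)
print(size // MiB, disks)   # 4 9
```

Under these assumptions the symbol size is block_size / (k·r), so the sealed block size directly bounds how large each symbol can be.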

SLIDE 8

Outline

  • Erasure Coded Storage Systems
  • Algorithm for minimizing number of symbols
  • Rotated Reed-Solomon Codes
  • Analysis & Evaluation
  • Conclusions

SLIDE 9

Decoding Equations


[Figure: generator-matrix rows]

{R0, R2, R4} is a decoding equation, and it can be represented by the bit string 10101000.
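The idea can be checked on a toy example. This sketch assumes that a set of symbols is a decoding equation when the corresponding generator-matrix rows XOR to zero, so any one of them can be rebuilt from the others. The 8-symbol matrix `ROWS` is hypothetical, chosen so that {R0, R2, R4} is such a set.

```python
from functools import reduce
from operator import xor

# Hypothetical toy generator matrix over GF(2). Each entry is one row: bit
# positions mark which of the 4 data bits the stored symbol XORs together.
ROWS = [0b1000, 0b0100, 0b0010, 0b0001,  # R0..R3: the data bits themselves
        0b1010, 0b0101, 0b1100, 0b0011]  # R4..R7: R0^R2, R1^R3, R0^R1, R2^R3


def is_decoding_equation(symbols) -> bool:
    """True if the generator rows of these symbols XOR to zero."""
    return len(symbols) > 0 and reduce(xor, (ROWS[s] for s in symbols), 0) == 0


def to_bitstring(symbols) -> str:
    """Bit string with symbol 0 leftmost; 1 marks membership in the equation."""
    return "".join("1" if i in symbols else "0" for i in range(len(ROWS)))


print(is_decoding_equation({0, 2, 4}), to_bitstring({0, 2, 4}))  # True 10101000
print(is_decoding_equation({0, 1, 2}))                             # False
```

If R4 is lost it equals R0 XOR R2, which is exactly what the zero-XOR condition expresses.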

SLIDE 10

Algorithm to minimize recovery I/O

  • Finds a decoding equation for each failed bit while minimizing the total number of symbols accessed
  • Makes use of data sharing [Xiang ’10]
  • Given a code generator matrix and a list of failed symbols, the algorithm outputs decoding equations to recover each failed symbol
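To see why data sharing matters, here is a brute-force version of the objective on a hypothetical 8-symbol XOR code. It is not the presented algorithm: it simply tries every combination of one equation per failed symbol. It assumes an equation is valid for a failed symbol when it contains that symbol and no other failed symbol.

```python
from functools import reduce
from itertools import combinations, product
from operator import xor

# Hypothetical toy code: R0..R3 are data bits, R4..R7 XOR pairs of them.
ROWS = [0b1000, 0b0100, 0b0010, 0b0001, 0b1010, 0b0101, 0b1100, 0b0011]
N = len(ROWS)

# Every set of symbols whose generator rows XOR to zero.
EQUATIONS = [frozenset(sub)
             for size in range(2, N + 1)
             for sub in combinations(range(N), size)
             if reduce(xor, (ROWS[s] for s in sub), 0) == 0]


def valid_equations(f, failed):
    """E_f: equations containing failed symbol f and no other failed symbol."""
    others = failed - {f}
    return [e for e in EQUATIONS if f in e and not (e & others)]


def min_symbols_exhaustive(failed):
    """Smallest number of surviving symbols read, trying every combination
    of one equation per failed symbol (shared symbols are read once)."""
    failed = frozenset(failed)
    choices = [valid_equations(f, failed) for f in sorted(failed)]
    return min(len(frozenset().union(*combo) - failed)
               for combo in product(*choices))


print(min_symbols_exhaustive({0}))     # 2
print(min_symbols_exhaustive({0, 1}))  # 3
```

Recovering symbols 0 and 1 independently would read 2 + 2 = 4 symbols, but sharing needs only 3: R0 = R2 ^ R4 and R1 = R2 ^ R4 ^ R6 reuse R2 and R4. The product over equation sets grows exponentially with the number of failures, which is why a smarter search is needed.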

SLIDE 11

Algorithm Details

  • Enumerate all valid decoding equations for each failed symbol
  • Directed graph formulation of the problem makes it convenient to solve
    • Nodes are bit strings
    • Edges denote equations
    • Child’s bit string = parent’s bit string OR’ed with the equation corresponding to the incoming edge


[Figure: parent node 11000100 is linked to child node 11001101 by edge e_{i,j} = 01001001 with weight = 2. There is an edge for each equation in E_i, and each node's bit string is a cumulative record of symbols needed for recovery.]
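The edge rule can be checked against the values on this slide. In this sketch the edge weight is taken to be the number of symbols an equation adds to the cumulative record, which matches weight = 2 for parent 11000100 and e_{i,j} = 01001001; the function names are hypothetical.

```python
def child_node(parent: str, equation: str) -> str:
    """Child bit string = parent bit string OR'ed with the edge's equation."""
    if len(parent) != len(equation):
        raise ValueError("bit strings must have equal length")
    return "".join("1" if "1" in (p, e) else "0" for p, e in zip(parent, equation))


def edge_weight(parent: str, equation: str) -> int:
    """Symbols the equation adds to the cumulative record (assumed weight)."""
    return child_node(parent, equation).count("1") - parent.count("1")


print(child_node("11000100", "01001001"))   # 11001101
print(edge_weight("11000100", "01001001"))  # 2
```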

SLIDE 12

Example


[Figure: recovery options for R0; recovery options for R1]
SLIDE 13

Example - Graph


[Figure: search graph beginning at the starting node. Level 0: equations from E0; level 1: equations from E1. Grayed-out nodes/edges denote pruning.]
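The level graph can be searched with a shortest-path algorithm. The sketch below uses Dijkstra's algorithm, chosen here for illustration, on a hypothetical 8-symbol XOR code: level i picks one equation from E_i, a node records the surviving symbols read so far, an edge costs the number of newly added symbols, and a node already settled at equal or lower cost is skipped, a simple form of pruning. It assumes an equation for one failed symbol may not contain another failed symbol.

```python
import heapq
from functools import reduce
from itertools import combinations
from operator import xor

# Hypothetical toy code: R0..R3 are data bits, R4..R7 XOR pairs of them.
ROWS = [0b1000, 0b0100, 0b0010, 0b0001, 0b1010, 0b0101, 0b1100, 0b0011]
N = len(ROWS)


def equation_masks(f, failed):
    """E_f as bitmasks: zero-XOR symbol sets containing f, no other failed symbol."""
    masks = []
    for size in range(2, N + 1):
        for sub in combinations(range(N), size):
            if (f in sub and not (set(sub) & (failed - {f}))
                    and reduce(xor, (ROWS[s] for s in sub), 0) == 0):
                masks.append(sum(1 << s for s in sub))
    return masks


def min_recovery(failed):
    """Dijkstra over the level graph. Node = (level, mask of surviving symbols
    read so far); level i chooses one equation from E_i for the i-th failed
    symbol; an edge costs the number of symbols it newly adds."""
    order = sorted(failed)
    failed_mask = sum(1 << f for f in order)
    levels = [equation_masks(f, set(order)) for f in order]
    heap = [(0, 0, 0)]                       # (cost, level, mask)
    settled = set()
    while heap:
        cost, level, mask = heapq.heappop(heap)
        if (level, mask) in settled:         # prune: already settled more cheaply
            continue
        settled.add((level, mask))
        if level == len(levels):             # one equation chosen per failed symbol
            return cost, mask
        for eq in levels[level]:
            child = mask | (eq & ~failed_mask)   # failed symbols are never read
            weight = bin(child).count("1") - bin(mask).count("1")
            heapq.heappush(heap, (cost + weight, level + 1, child))
    return None


print(min_recovery({0, 1})[0])  # 3
```

Because all edge weights are non-negative, the first node popped at the last level has the minimum number of symbols; for failures {0, 1} that is 3, one fewer than recovering each symbol on its own.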

SLIDE 14

Algorithm Summary

  • Minimizes the number of symbols needed to recover from an arbitrary number of failures
  • Solutions to all common failure combinations may be computed offline a priori and stored for future use
  • Works for any XOR-based code
  • Generalizes previous results (EVENODD [Wang ’10], RDP [Xiang ’10])
  • Other codes turned out to perform better than EVENODD and RDP

SLIDE 15

Outline

  • Erasure Coded Storage Systems
  • Algorithm for minimizing number of symbols
  • Rotated Reed-Solomon Codes
  • Analysis & Evaluation
  • Conclusions

SLIDE 16

Rotated Reed-Solomon Codes

  • Vast majority of failure scenarios are single disk failures (99.75% [Schroeder ’07])
  • 90% of failures are transient and do not involve data loss [Ford ’10]
  • Google waits 15 minutes before reconstructing a disk
  • A degraded read of missing data requires recovery using the erasure code
  • New class of codes that optimizes degraded read performance in the case of a single disk failure
    • MDS (for certain values of k, m and r)
    • Modification to standard Reed-Solomon codes

SLIDE 17

Standard Reed-Solomon Codes

  • A sample Reed-Solomon code
  • Coding symbols can be calculated by


[Figure: sample Reed-Solomon code with k = 6, m = 3, r = 1]
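Standard Reed-Solomon coding computes each coding symbol as a linear combination of the data symbols in the same row, c_j = Σ_{i=0}^{k-1} a_{j,i} d_i, over a Galois field. The sketch below works in GF(2^8) with the polynomial 0x11d and a Cauchy coefficient matrix; the field, the polynomial, and the coefficients a_{j,i} = 1/(x_j + y_i) are assumptions for illustration, not necessarily those of the sample code on the slide.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) with the primitive polynomial x^8+x^4+x^3+x^2+1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """a^254 = a^-1 for nonzero a, since the multiplicative group has order 255."""
    result, power, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        e >>= 1
    return result


def cauchy_coefficients(k: int, m: int):
    """Hypothetical choice a_{j,i} = 1 / (x_j + y_i) with x_j = j, y_i = m + i.
    In GF(2^8) addition is XOR; a Cauchy matrix makes the code MDS."""
    return [[gf_inv(j ^ (m + i)) for i in range(k)] for j in range(m)]


def rs_encode(data, coeffs):
    """Standard RS with r = 1: c_j = sum_i a_{j,i} * d_i over GF(2^8).
    data holds k symbols, each a bytes object of equal length."""
    coding = []
    for row in coeffs:
        c = bytearray(len(data[0]))
        for a, d in zip(row, data):
            for n, byte in enumerate(d):
                c[n] ^= gf_mul(a, byte)
        coding.append(bytes(c))
    return coding


data = [bytes([i + 1] * 4) for i in range(6)]        # k = 6 data symbols, 4 bytes each
coding = rs_encode(data, cauchy_coefficients(k=6, m=3))
print(len(coding), [len(c) for c in coding])          # 3 [4, 4, 4]
```

Because encoding is linear, encoding the XOR of two data sets gives the XOR of their coding symbols, which is the property decoding relies on.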

SLIDE 18

Rotated Reed-Solomon Codes

  • Coding symbols calculated by


[Figure: Rotated Reed-Solomon code with k = 6, m = 3, r = 3]
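The rotation can be described by which row of each data disk feeds a given coding symbol. This sketch assumes the rule, as we read the FAST 2012 construction, that for coding disk j and row b the data disks i < (k/m)·j contribute d_{i,(b+1) mod r} and the remaining disks contribute d_{i,b}, with the same Galois-field coefficients as standard Reed-Solomon. Treat the rule as an assumption to verify against the paper; `rotated_dependencies` is a hypothetical name.

```python
def rotated_dependencies(j: int, b: int, k: int, m: int, r: int):
    """(disk, row) of every data symbol combined into coding symbol c_{j,b}.

    Assumed rule: data disks i < (k/m)*j are read from row (b+1) mod r,
    the others from row b. For j = 0 this is plain Reed-Solomon.
    """
    if k % m:
        raise ValueError("this sketch assumes m divides k")
    boundary = (k // m) * j
    return [(i, (b + 1) % r if i < boundary else b) for i in range(k)]


for j in range(3):
    print(j, rotated_dependencies(j, 0, k=6, m=3, r=3))
```

For k = 6, m = 3, r = 3, coding disk 0 is unrotated, coding disk 1 takes data disks 0 and 1 from the next row, and coding disk 2 takes data disks 0 through 3 from the next row.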

SLIDE 19

Reconstruction example with Rotated RS Codes


[Figure: disk 0 fails. Rotated Reed-Solomon: 16 symbols read; P-Drive: 24 symbols read. Legend: data symbol read, data symbol not read, coding symbol read, coding symbol not read.]
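The two counts can be reproduced by brute force. The sketch assumes the rotation rule in which data disks i < (k/m)·j feed c_{j,b} from row (b+1) mod r, assumes k = 6, m = 3, r = 4 for this figure, and restricts each lost symbol to an equation built from a single coding symbol. P-Drive is modeled as using only coding disk 0; the function names are hypothetical.

```python
from itertools import product


def equation(j, b, k, m, r):
    """Symbols in the decoding equation of coding symbol c_{j,b} (assumed
    rotation: data disks i < (k/m)*j are taken from row (b+1) mod r)."""
    boundary = (k // m) * j
    data = {("d", i, (b + 1) % r if i < boundary else b) for i in range(k)}
    return data | {("c", j, b)}


def reconstruction_reads(failed_disk, k, m, r, coding_disks):
    """Fewest distinct symbols read to rebuild all r symbols of failed_disk,
    using one coding symbol from coding_disks per lost data symbol."""
    lost = {("d", failed_disk, row) for row in range(r)}
    eqs = [equation(j, b, k, m, r) for j in coding_disks for b in range(r)]
    options = [[e for e in eqs if sym in e] for sym in sorted(lost)]
    return min(len(set().union(*choice) - lost) for choice in product(*options))


print(reconstruction_reads(0, 6, 3, 4, coding_disks=range(3)))  # 16 (Rotated RS)
print(reconstruction_reads(0, 6, 3, 4, coding_disks=[0]))       # 24 (P-Drive only)
```

Rotation lets pairs of lost symbols share the same row of data from disks 2 to 5, which is where the saving over the 6 × 4 = 24 symbols of P-Drive comes from.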

SLIDE 20

Degraded Read example with Rotated RS Codes

  • Read request of 4 symbols starting from d_{5,0}
  • Penalty = number of symbols read in addition to the read request


[Figure: disk 5 fails; data disks and coding disks shown. Rotated Reed-Solomon: penalty = 2 symbols; P-Drive: penalty = 5 symbols. Legend: data symbol read, data symbol not read, coding symbol read, coding symbol not read.]
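The penalties can be reproduced by brute force under a few assumptions: data symbols are laid out row-major (logical symbol n on disk n mod k, row n div k), coding symbols follow the rotation rule in which data disks i < (k/m)·j feed c_{j,b} from row (b+1) mod r, k = 6, m = 3, r = 3, and the penalty is the number of distinct symbols read minus the request size (the failed symbol itself cannot be read). The function names are hypothetical.

```python
from itertools import product


def equation(j, b, k, m, r):
    """Symbols in the decoding equation of coding symbol c_{j,b} (assumed
    rotation: data disks i < (k/m)*j are taken from row (b+1) mod r)."""
    boundary = (k // m) * j
    data = {("d", i, (b + 1) % r if i < boundary else b) for i in range(k)}
    return data | {("c", j, b)}


def degraded_read_penalty(start, length, failed_disk, k, m, r, coding_disks):
    """Symbols read beyond the request size. Hypothetical row-major layout:
    logical data symbol n sits on disk n % k, row n // k (one stripe only)."""
    if start + length > k * r:
        raise ValueError("this sketch handles reads within a single stripe")
    request = [("d", n % k, n // k) for n in range(start, start + length)]
    missing = {s for s in request if s[1] == failed_disk}
    readable = set(request) - missing
    eqs = [equation(j, b, k, m, r) for j in coding_disks for b in range(r)]
    options = [[e for e in eqs if s in e] for s in sorted(missing)]
    reads = min(len(readable.union(*choice) - missing) for choice in product(*options))
    return reads - length


# Read 4 symbols starting at d_{5,0} (logical symbol 5) while disk 5 is down.
print(degraded_read_penalty(5, 4, 5, 6, 3, 3, coding_disks=range(3)))  # 2
print(degraded_read_penalty(5, 4, 5, 6, 3, 3, coding_disks=[0]))       # 5
```

Rotated RS recovers d_{5,0} from c_{2,0}, whose equation already includes d_{0,1}, d_{1,1} and d_{2,1} from the request, so only d_{3,1}, d_{4,0} and c_{2,0} are extra; P-Drive must read all of row 0 plus its parity symbol.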

SLIDE 21

Outline

  • Erasure Coded Storage Systems
  • Algorithm for minimizing number of symbols
  • Rotated Reed-Solomon Codes
  • Analysis & Evaluation
  • Conclusions

SLIDE 22

Analysis of Reconstruction

SLIDE 23

Analysis of Degraded Reads

SLIDE 24

Evaluation of Disk Reconstruction (m = 2)

SLIDE 25

Evaluation of Disk Reconstruction (m = 3)

SLIDE 26

The Need for Large Symbols

SLIDE 27

Outline

  • Erasure Coded Storage Systems
  • Algorithm for minimizing number of symbols
  • Rotated Reed-Solomon Codes
  • Analysis & Evaluation
  • Conclusions

SLIDE 28

Conclusions

  • Traditional RAID-based configurations do not give good recovery performance with cloud-based erasure-coded storage systems
  • Large sealed blocks recommended (at least around 100 MB, preferably > 500 MB)
  • Minimizing the number of symbols needed for recovery does result in lower I/O cost
  • Generally, optimally sparse and minimum-density codes perform best for disk reconstruction
  • Rotated Reed-Solomon Codes are a better alternative to standard Reed-Solomon for cloud storage

SLIDE 29

Thank you!
