Research on Efficient Erasure-Coding-Based Cluster Storage Systems
Patrick P. C. Lee
The Chinese University of Hong Kong
NCIS’14
Joint work with Runhui Li, Jian Lin, Yuchong Hu
Motivation
Clustered storage systems are widely deployed
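[Figure: erasure coding example with (n, k) = (4, 2). A file is divided into chunks A, B, C, D and encoded into A, B, C, D, A+C, B+D, A+D, B+C+D, which are stored across the nodes; the file A, B, C, D is later decoded from the stored chunks.]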
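The (4, 2) example above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes "+" means bytewise XOR, that the file length is divisible by 4, and that the chunks are placed as Node 1 = (A, B), Node 2 = (C, D), Node 3 = (A+C, B+D), Node 4 = (A+D, B+C+D). The function names `xor`, `encode`, and `decode` are hypothetical.

```python
from itertools import combinations

def xor(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length chunks (addition over GF(2))."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def encode(data: bytes) -> dict:
    """Split data into A, B, C, D and place the coded chunks on four nodes."""
    assert len(data) % 4 == 0, "sketch assumes a length divisible by 4"
    q = len(data) // 4
    a, b, c, d = (data[i * q:(i + 1) * q] for i in range(4))
    return {
        1: (a, b),
        2: (c, d),
        3: (xor(a, c), xor(b, d)),
        4: (xor(a, d), xor(b, c, d)),
    }

def decode(avail: dict) -> bytes:
    """Rebuild A||B||C||D from the chunks of any two nodes."""
    ids = frozenset(avail)
    if ids == {1, 2}:
        (a, b), (c, d) = avail[1], avail[2]
    elif ids == {1, 3}:
        (a, b), (p, q) = avail[1], avail[3]
        c, d = xor(p, a), xor(q, b)
    elif ids == {1, 4}:
        (a, b), (r, s) = avail[1], avail[4]
        d = xor(r, a)
        c = xor(s, b, d)
    elif ids == {2, 3}:
        (c, d), (p, q) = avail[2], avail[3]
        a, b = xor(p, c), xor(q, d)
    elif ids == {2, 4}:
        (c, d), (r, s) = avail[2], avail[4]
        a = xor(r, d)
        b = xor(s, c, d)
    elif ids == {3, 4}:
        (p, q), (r, s) = avail[3], avail[4]
        c = xor(q, s)  # (B+D) + (B+C+D) = C
        a = xor(p, c)
        d = xor(r, a)
        b = xor(q, d)
    else:
        raise ValueError("need the chunks of exactly two distinct nodes")
    return a + b + c + d

# Usage: every pair of nodes should be enough to rebuild the file.
data = b"ABCDEFGH"
nodes = encode(data)
recovered = [decode({i: nodes[i], j: nodes[j]}) for i, j in combinations(nodes, 2)]
```

Under these assumptions, each of the six node pairs yields the original bytes, which is the k = 2 out of n = 4 property the figure illustrates.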
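[Figure: conventional repair in the (4, 2) example, for a file of size M split into A, B, C, D. Node 1 to Node 4 store (A, B), (C, D), (A+C, B+D), (A+D, B+C+D). The repaired node downloads C, D, A+C, B+D and rebuilds A and B.]
Recovery traffic = M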
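[Figure: repair with regenerating codes [Dimakis et al.; ToIT’10], for a file of size M split into A, B, C, D. Node 1 to Node 4 store (A, B), (C, D), (A+C, B+D), (A+D, B+C+D). The repaired node downloads C, A+C, and A+B+C and rebuilds A and B.]
Recovery traffic = 0.75M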
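The two repair figures can be compared with a short Python sketch. It is only illustrative: it assumes "+" is bytewise XOR, that Node 1 is the failed node, and that Node 4 computes A+B+C locally by XORing its two stored chunks before sending. The function names are hypothetical.

```python
def xor(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(p ^ q for p, q in zip(x, y))

def place(a: bytes, b: bytes, c: bytes, d: bytes) -> dict:
    """Chunk placement of the (4, 2) example on Node 1 to Node 4."""
    return {
        1: (a, b),
        2: (c, d),
        3: (xor(a, c), xor(b, d)),
        4: (xor(a, d), xor(xor(b, c), d)),
    }

def conventional_repair(nodes: dict):
    """Download every chunk of Node 2 and Node 3, then rebuild (A, B)."""
    c, d = nodes[2]
    a_c, b_d = nodes[3]
    sent = [c, d, a_c, b_d]
    return (xor(a_c, c), xor(b_d, d)), sum(len(x) for x in sent)

def regenerating_repair(nodes: dict):
    """Each surviving node sends a single chunk, then rebuild (A, B)."""
    c = nodes[2][0]
    a_c = nodes[3][0]
    a_b_c = xor(nodes[4][0], nodes[4][1])  # (A+D) + (B+C+D) = A+B+C
    sent = [c, a_c, a_b_c]
    return (xor(a_c, c), xor(a_b_c, a_c)), sum(len(x) for x in sent)

# Usage: a file of size M = 16 bytes, so each chunk is M/4 = 4 bytes.
a, b, c, d = b"aaaa", b"bbbb", b"cccc", b"dddd"
nodes = place(a, b, c, d)
conv_chunks, conv_traffic = conventional_repair(nodes)
regen_chunks, regen_traffic = regenerating_repair(nodes)
```

With M = 16 bytes, the conventional path transfers four chunks (M) and the regenerating path transfers three (0.75M), matching the traffic figures on the slides.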
CORE
[Figure: Node 0 to Node 5, where Node i stores Si,0, Si,1, Si,2.]
$s_{0,0}, s_{0,1}, s_{0,2} = \mathrm{Rec}_0(e_{1,0}, e_{2,0}, e_{3,0}, e_{4,0}, e_{5,0})$
$e_{0,1} = \mathrm{Enc}_{0,1}(s_{0,0}, s_{0,1}, s_{0,2}) = \mathrm{Enc}_{0,1}(\mathrm{Rec}_0(e_{1,0}, e_{2,0}, e_{3,0}, e_{4,0}, e_{5,0}))$
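The composition Enc(Rec(·)) in the equations above can be illustrated on a toy linear code. The sketch below is not CORE's actual code construction: it assumes a hypothetical XOR code with e1 = s0, e2 = s1, e3 = s0 + s1, and shows that when both Rec and Enc are linear, their composition collapses into a single matrix applied directly to the surviving symbols.

```python
def matmul_gf2(x, y):
    """Multiply two 0/1 matrices over GF(2)."""
    return [[sum(x[i][k] & y[k][j] for k in range(len(y))) % 2
             for j in range(len(y[0]))]
            for i in range(len(x))]

def apply_gf2(m, symbols):
    """Apply a 0/1 matrix to integer symbols, using XOR as addition."""
    out = []
    for row in m:
        acc = 0
        for coeff, sym in zip(row, symbols):
            if coeff:
                acc ^= sym
        out.append(acc)
    return out

# Hypothetical toy code: e1 = s0, e2 = s1, e3 = s0 + s1; e1 is lost.
REC = [[1, 1],   # s0 = e2 + e3
       [1, 0]]   # s1 = e2
ENC = [[1, 0]]   # lost symbol e1 = s0
COMBINED = matmul_gf2(ENC, REC)  # Enc(Rec(.)) as one matrix

s0, s1 = 0x5A, 0xC3
survivors = [s1, s0 ^ s1]                              # (e2, e3)
two_step = apply_gf2(ENC, apply_gf2(REC, survivors))   # Rec, then Enc
one_step = apply_gf2(COMBINED, survivors)              # single combined step
```

Both paths return the lost symbol, so the intermediate reconstruction of the original symbols is not needed in this toy setting.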
[Figure: two panels, "Good Failure Pattern" and "Bad Failure Pattern". Y-axis: Bandwidth Ratio. Series: (12,6), (16,8), (20,10).]
[Figure: system architecture with a Namenode, Datanodes storing blocks, and a RaidNode. The encoders use the CORE Encoder/Decoder through JNI.]
1Gbps Ethernet
[Figure: a Namenode and three Datanodes.]
first loaded in memory
decoding time
[Figure: recovery throughput (MB/s) for (12, 6), (16, 8), and (20, 10). Series: CORE and RS, each with t = 1, t = 2, and t = 3.]
WordCount Example:
[Figure: the input words B, C, A, C, A, B are spread across Slave 0, Slave 1, and Slave 2. Map tasks emit pairs such as <A,1>; after the shuffle, reduce tasks output <A,2>, <B,2>, <C,2>.]
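The WordCount flow can be mimicked in plain Python. This is a single-process sketch, not Hadoop code; the assignment of words to slaves (B C, A C, A B) is an assumption read off the figure order, and the function names are hypothetical.

```python
from collections import defaultdict

def map_task(words):
    """Emit a <word, 1> pair for every input word."""
    return [(word, 1) for word in words]

def shuffle(map_outputs):
    """Group the emitted values by key, as the shuffle phase does."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

# Assumed input split per slave.
splits = {
    "Slave 0": ["B", "C"],
    "Slave 1": ["A", "C"],
    "Slave 2": ["A", "B"],
}
map_outputs = [map_task(words) for words in splits.values()]
counts = dict(reduce_task(k, v) for k, v in shuffle(map_outputs).items())
```

Here `counts` holds one entry per word with a count of 2, mirroring the <A,2>, <B,2>, <C,2> output in the figure.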
while a heartbeat from slave s arrives do
    for job in jobQueue do
        if job has a local task on s then
            assign the local task
        else if job has a remote task then
            assign the remote task
        else if job has a degraded task then
            assign the degraded task
        endif
    endfor
endwhile
Remote task: processing a block stored in another rack
Degraded task: processing an unavailable block in the system
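The scheduling loop above can be turned into a small runnable sketch. It is a simplification under stated assumptions, not Hadoop's scheduler: each heartbeat assigns at most one task per job, a task counts as remote whenever its block is held by a different slave (rack topology is not modeled), and a block with no holder is treated as unavailable. `Task`, `Job`, `kind`, and `on_heartbeat` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    block: str
    holder: Optional[str]  # slave holding the block; None if unavailable

@dataclass
class Job:
    pending: List[Task] = field(default_factory=list)

def kind(task: Task, slave: str) -> str:
    """Classify a task relative to the slave that sent the heartbeat."""
    if task.holder is None:
        return "degraded"
    return "local" if task.holder == slave else "remote"

def on_heartbeat(slave: str, job_queue: List[Job]):
    """Assign one task per job, preferring local, then remote, then degraded."""
    assigned = []
    for job in job_queue:
        for wanted in ("local", "remote", "degraded"):
            match = next((t for t in job.pending if kind(t, slave) == wanted), None)
            if match is not None:
                job.pending.remove(match)
                assigned.append((match.block, wanted))
                break
    return assigned

# Usage: B0,0 is unavailable, so it is only scheduled after the other tasks.
job = Job([Task("B0,0", None), Task("B1,1", "S2"), Task("B0,1", "S1")])
order = [on_heartbeat("S1", [job]) for _ in range(4)]
```

In this run the degraded task for B0,0 is assigned last, which is the behavior that pushes degraded reads to the end of the map phase.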
[Figure: scheduling example. Topology: a core switch, two ToR switches, and slaves S0 to S4 holding blocks Bi,j and Pi,j. Timeline (time in s, ticks at 10, 30, 40) for slaves S1 to S4: the slaves process their stored blocks, then download P0,0, P1,0, P2,0, and P3,0 to process B0,0, B1,0, B2,0, and B3,0, after which the map phase finishes.]
resources
[Figure: scheduling example. Topology: a core switch, two ToR switches, and slaves S0 to S4 holding blocks Bi,j and Pi,j. Timeline (time in s, ticks at 10, 30) for slaves S1 to S4: the downloads of P0,0, P1,0, P2,0, and P3,0 are interleaved with the processing of the stored blocks, after which the map phase finishes.]
while a heartbeat from slave s arrives do
    if $n/N \ge n_e/N_e$ and job has a degraded task then
        assign the degraded task
    endif
    assign other map slots as in locality-first scheduling
endwhile
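A runnable sketch of this loop follows. The slide does not define the symbols, so the sketch assumes n and N are the launched and total map tasks and n_e and N_e are the launched and total degraded tasks; it also assumes a single job, a fixed number of free map slots per heartbeat, and that the caller maintains the counters. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Task = Tuple[str, Optional[str]]  # (block, holder); holder None = unavailable

def kind(task: Task, slave: str) -> str:
    """Classify a task relative to the slave that sent the heartbeat."""
    if task[1] is None:
        return "degraded"
    return "local" if task[1] == slave else "remote"

def on_heartbeat(slave: str, pending: List[Task],
                 n: int, N: int, n_e: int, N_e: int, slots: int = 2):
    """Launch a degraded task first when n/N >= n_e/N_e, then fill the
    remaining slots in locality-first order. Mutates `pending`."""
    assigned = []
    degraded = [t for t in pending if t[1] is None]
    # Cross-multiplied form of n/N >= n_e/N_e (N and N_e assumed positive).
    if degraded and slots > 0 and n * N_e >= n_e * N:
        task = degraded[0]
        pending.remove(task)
        assigned.append((task[0], "degraded"))
    while len(assigned) < slots:
        match = None
        for wanted in ("local", "remote", "degraded"):
            match = next((t for t in pending if kind(t, slave) == wanted), None)
            if match is not None:
                break
        if match is None:
            break  # nothing left to assign
        pending.remove(match)
        assigned.append((match[0], wanted))
    return assigned

# Usage: with no tasks launched yet, the degraded task goes first.
pending = [("B0,0", None), ("B0,1", "S1"), ("B1,1", "S2")]
first = on_heartbeat("S1", pending, n=0, N=3, n_e=0, N_e=1)
second = on_heartbeat("S2", pending, n=2, N=3, n_e=1, N_e=1)
```

Compared with the locality-first loop, the degraded task here is launched in the first heartbeat instead of being left until the other tasks are done.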
[Figure: scheduling example with tasks numbered (1) to (12). Topology: a core switch, two ToR switches, and slaves holding blocks Bi,j and Pi,j. Timeline (ticks at 10, 30, 50) for slaves S1 to S3, with tasks including Download P0,0, Process B0,0, Download P1,0, Process B1,0, Download P2,0, and Process B2,0 alongside the processing of the stored blocks.]
failures and latent sector errors [FAST’14]
coded storage [FAST’14]
[HotStorage’14]
in erasure-coded storage in failure mode
theoretical analysis
http://www.cse.cuhk.edu.hk/~pclee