Dynamo
A Tale of Two Erasure Codes in HDFS
Mingyuan Xia*, Mohit Saxena+ , Mario Blaum+, and David A. Pease+
*McGill University, +IBM Research Almaden
FAST’15
何军权 2015-04-30
3-way replication, 3x, 2003
RS, 1.4x, 2011
RS, 1.5x, 2012
LRC, 1.33x, 2012
LRC, 1.66x, 2013
Product Code (PC) and Local Reconstruction Code (LRC)
Multiple disk reads, network transfers and compute cycles to
100K blocks lost per day
50 machine-unavailability events per day
Reconstruction traffic: 180TB per day
Recovery cost: the total number of blocks required to reconstruct a data block after a failure
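As a rough illustration of this definition, the sketch below counts the blocks read to rebuild one lost data block under replication, Reed-Solomon (RS), and LRC. The function names are hypothetical, and the LRC formula assumes the data blocks are split evenly into local groups with one local parity per group.

```python
def recovery_cost_replication() -> int:
    """With replication, one surviving copy is enough."""
    return 1


def recovery_cost_rs(k: int) -> int:
    """RS(k, m): any k surviving blocks are read to rebuild one lost block."""
    return k


def recovery_cost_lrc_data(k: int, l: int) -> int:
    """LRC with k data blocks and l local parities (assumed: equal-sized groups).

    A lost data block is rebuilt from the other blocks in its local group
    plus that group's local parity, i.e. k / l blocks in total.
    """
    if k % l != 0:
        raise ValueError("sketch assumes k is divisible by l")
    return k // l


if __name__ == "__main__":
    print("RS(6,3):", recovery_cost_rs(6))
    print("RS(10,4):", recovery_cost_rs(10))
    print("LRC(12,2,2):", recovery_cost_lrc_data(12, 2))
    print("LRC(12,6,2):", recovery_cost_lrc_data(12, 6))
```

Under these assumptions, smaller local groups lower the recovery cost at the price of more parity blocks.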
[Figure: comparison of GFS 3-way Repl, FB HDFS RS, GFS v2 RS, Azure LRC, and FB HDFS LRC]
P(freq > 10) ~= 1%
P(freq <= 10) ~= 99%
File size, last mTime, read count, and coding state
Encode/Decode, Upcode/Downcode
Encode the frequently accessed blocks to reduce read latency
Provide low overall recovery cost
Encode the less frequently accessed blocks to get low storage overhead
Maintain a low and bounded storage overhead
[Figure: file state transitions, with labels "Recently created", "Write cold", COND, and COND']
COND: Read Hot and Bounded
COND': Read Cold or Not Bounded
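One possible reading of the COND and COND' transitions is sketched below. The state names, the `next_state` function, and the assumption that a recently created file stays replicated until it becomes write cold are all illustrative choices, not details confirmed by the slide.

```python
from enum import Enum


class State(Enum):
    REPLICATED = "recently created (assumed: replicated)"
    FAST = "fast code"
    COMPACT = "compact code"


def cond(read_hot: bool, bounded: bool) -> bool:
    """COND: Read Hot and Bounded. COND' is simply its negation."""
    return read_hot and bounded


def next_state(state: State, write_cold: bool, read_hot: bool, bounded: bool) -> State:
    """Pick the next coding state for a file (hypothetical transition rules)."""
    if state is State.REPLICATED and not write_cold:
        # Still being written: leave the file as it is.
        return state
    # Write-cold files use the fast code under COND, else the compact code.
    return State.FAST if cond(read_hot, bounded) else State.COMPACT


if __name__ == "__main__":
    s = State.REPLICATED
    s = next_state(s, write_cold=True, read_hot=True, bounded=True)
    print(s)  # hot file, overhead within bound
    s = next_state(s, write_cold=True, read_hot=False, bounded=True)
    print(s)  # file went read cold
```

Because COND' is the exact negation of COND, every write-cold file lands in exactly one of the two coded states.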
Three PC(2x5) stripes:

b0  b1  b2  b3  b4  hb1
b5  b6  b7  b8  b9  hb2
Pb0 Pb1 Pb2 Pb3 Pb4 Phb

a0  a1  a2  a3  a4  ha1
a5  a6  a7  a8  a9  ha2
Pa0 Pa1 Pa2 Pa3 Pa4 Pha

c0  c1  c2  c3  c4  hc1
c5  c6  c7  c8  c9  hc2
Pc0 Pc1 Pc2 Pc3 Pc4 Phc

One PC(6x5) stripe:

a0  a1  a2  a3  a4  ha1
a5  a6  a7  a8  a9  ha2
b0  b1  b2  b3  b4  hb1
b5  b6  b7  b8  b9  hb2
c0  c1  c2  c3  c4  hc1
c5  c6  c7  c8  c9  hc2
P0  P1  P2  P3  P4  Ph
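The block layout above suggests how the parities P0 to P4 of the larger stripe could be derived without rereading data blocks. The sketch below assumes each vertical parity is a plain XOR of its column, in which case XOR-ing the matching vertical parities of the three small stripes gives the parity of the merged column. The helper names are hypothetical, and this is not necessarily the exact procedure used by the authors.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def vertical_parities(rows: list[list[bytes]]) -> list[bytes]:
    """One XOR parity per column of a stripe given as a list of rows."""
    return [xor_blocks(*column) for column in zip(*rows)]


def upcode(parity_sets: list[list[bytes]]) -> list[bytes]:
    """Merge the vertical parities of several small stripes column by column."""
    return [xor_blocks(*parities) for parities in zip(*parity_sets)]


if __name__ == "__main__":
    # Three toy 2x5 stripes with 4-byte blocks.
    stripes = [
        [[bytes([s + 5 * r + c] * 4) for c in range(5)] for r in range(2)]
        for s in (0, 40, 80)
    ]
    merged = upcode([vertical_parities(s) for s in stripes])
    direct = vertical_parities(stripes[0] + stripes[1] + stripes[2])
    print(merged == direct)
```

The same XOR identity would apply to the corner parity (Pha, Phb, Phc into Ph) if it is also a column XOR, while the horizontal parities are carried over unchanged in the layout shown.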
CC: Cloudera Customer; FB: Facebook
The storage overhead of HACFS-LRC and HACFS-PC is bounded to 1.4x and 1.5x
HACFS-PC takes about 10-35 minutes less than Production
HACFS-LRC is worse than RS(6,3) in GFS v2
To reconstruct global parities, HACFS-LRC needs to read 12
Colossus FS:RS(6,3)-1.5x
HDFS-Raid: RS(10,4)-1.4x
Azure: LRC(12,2,2)-1.33x
HACFS-PC:
PC(2x5)-1.8x
PC(6x5)-1.4x
HACFS-LRC:
LRC(12,6,2)-1.67x
LRC(12,2,2)-1.33x
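The overhead figures quoted for these codes can be checked with a short calculation. The sketch below assumes RS(k,m) means k data and m parity blocks, LRC(k,l,g) means k data, l local parity, and g global parity blocks, and PC(r x c) means an r-by-c grid of data blocks with one extra parity row and one extra parity column. The function names are hypothetical.

```python
def overhead_rs(k: int, m: int) -> float:
    """Stored blocks per data block for RS(k, m)."""
    return (k + m) / k


def overhead_lrc(k: int, l: int, g: int) -> float:
    """Stored blocks per data block for LRC(k, l, g)."""
    return (k + l + g) / k


def overhead_pc(rows: int, cols: int) -> float:
    """Stored blocks per data block for a rows x cols product code
    with one parity row and one parity column (assumed layout)."""
    return (rows + 1) * (cols + 1) / (rows * cols)


if __name__ == "__main__":
    print(f"RS(6,3):     {overhead_rs(6, 3):.2f}x")
    print(f"RS(10,4):    {overhead_rs(10, 4):.2f}x")
    print(f"LRC(12,2,2): {overhead_lrc(12, 2, 2):.2f}x")
    print(f"LRC(12,6,2): {overhead_lrc(12, 6, 2):.2f}x")
    print(f"PC(2x5):     {overhead_pc(2, 5):.2f}x")
    print(f"PC(6x5):     {overhead_pc(6, 5):.2f}x")
```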
lost block type   HACFS-PC            HACFS-LRC             Colossus FS   HDFS-RAID   Azure
data block        fast: 2, comp: 5    fast: 2, comp: 6      6             10          6
global parity     fast: 5, comp: 6    fast: 12, comp: 12    6             10          12
CAS - ICT - Storage System Group