An Adaptive Erasure-Coded Storage Scheme - PowerPoint PPT Presentation

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm. Zizhong Wang, Haixia Wang, Airan Shao, and Dongsheng Wang, Tsinghua University


SLIDE 1

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm

Zizhong Wang, Haixia Wang, Airan Shao, and Dongsheng Wang, Tsinghua University

SLIDE 2

Really Big Data - Present and Future

1 ZB = 1,180,591,620,717,411,303,424 B 175 ZB = 206,603,533,625,546,978,099,200 B

https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf

SLIDE 3

Distributed Storage Systems

  • How to guarantee reliability and availability?
  • N-way replication
  • GFS (3-way)
  • N× storage cost to tolerate any (N-1) faults
  • Too expensive, especially when data amount grows fast
  • Simple, still the default setting in HDFS, Ceph
  • Erasure coding
  • HDFS (since 3.0.0), Azure, Ceph
  • A (k,m) code can tolerate any m faults at a (1+m/k)× storage cost
  • Can save much storage space
SLIDE 4

An Example of Erasure Coding

  • 3-way replication vs. a (2,2) code, original data: b, c
  • 3-way replication: NODE 1: (b, c)  NODE 2: (b, c)  NODE 3: (b, c)
  • a (2,2) code: NODE 1: b  NODE 2: c  NODE 3: b + c  NODE 4: b + 2c
  • Both can tolerate any 2 faults, but 3-way replication costs 3× storage space while the (2,2) code costs only 2×
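The (2,2) layout above can be sketched in a few lines. This is a toy model over plain integers to keep the linear algebra visible; a real implementation would use Galois-field arithmetic (e.g. GF(2^8)), and the coefficients below are illustrative, not those of any particular system.

```python
# Toy (2,2) erasure code: node i stores coeffs[i] . (b, c).
# Any 2 of the 4 nodes suffice to recover (b, c), because every
# 2x2 submatrix of the coefficient matrix is invertible.

COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]  # node1=b, node2=c, node3=b+c, node4=b+2c

def encode(b, c):
    return [p * b + q * c for (p, q) in COEFFS]

def decode(survivors):
    """survivors: dict {node_index: stored_value} with at least 2 entries."""
    (i, x), (j, y) = list(survivors.items())[:2]
    (p1, q1), (p2, q2) = COEFFS[i], COEFFS[j]
    det = p1 * q2 - p2 * q1           # nonzero for every pair of nodes
    b = (x * q2 - y * q1) // det      # Cramer's rule
    c = (p1 * y - p2 * x) // det
    return b, c

chunks = encode(7, 9)                              # [7, 9, 16, 25]
assert decode({2: chunks[2], 3: chunks[3]}) == (7, 9)   # nodes 1 and 2 both failed
```

Losing any two nodes still leaves a solvable 2x2 system, which is exactly the "tolerate any 2 faults at 2× storage" claim on the slide.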

SLIDE 5

Erasure Coding - What Do We Care About?

  • Storage cost
  • In a (k,m) code: (1+m/k)×
  • Fault tolerance ability
  • In a (k,m) code: m
  • Recovery cost
  • Discuss later
  • Write performance
  • Correlated with storage cost
  • Shameless plug: in asynchronous settings, CRaft can be used ([FAST ’20] Wang et al.)
  • Update performance
SLIDE 6

Major Concern: Recovery Cost

  • 3-way replication: NODE 1: (b, c)  NODE 2: (b, c)  NODE 3: (b, c) - a lost chunk is recovered by reading 1 chunk
  • a (2,2) code: NODE 1: b  NODE 2: c  NODE 3: b + c  NODE 4: b + 2c - a lost chunk is recovered by reading 2 chunks
  • Conclusion: a (k,m) code pays k times the recovery cost

SLIDE 7

Degraded Read

  • More than 90% of data center errors are temporary ([OSDI ’10] Ford et al.)
  • No data are permanently lost
  • Solved by degraded reads
  • Read from other nodes and then decode
  • Our goal: reduce degraded read cost
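A degraded read on the toy (2,2) layout from the earlier example looks like this; the node assignment (node 1: b, node 2: c, node 3: b + c) is the same illustrative model, with integers standing in for Galois-field symbols.

```python
# Degraded read: the node holding b is temporarily unreachable, so the
# read is served by fetching c and b+c from two live nodes and decoding,
# instead of waiting for the temporary error to clear.

def degraded_read_b(c, b_plus_c):
    # 2 chunk reads + 1 subtraction replace 1 direct read: that extra
    # traffic is the degraded-read cost the scheme tries to reduce.
    return b_plus_c - c

b, c = 7, 9
assert degraded_read_b(c, b + c) == b
```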
SLIDE 8

Trade-Offs

Degraded read cost vs. fault tolerance ability vs. storage cost

  • Different code families
  • MDS/non-MDS, locality, …
  • Different parameters
  • small k + small m/k: low degraded read cost and storage cost, but low fault tolerance ability
  • small k + big m: low degraded read cost, high fault tolerance ability, but high storage cost
  • small m/k + big m: low storage cost, high fault tolerance ability, but high degraded read cost
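The parameter trade-off can be condensed into a toy cost model: storage cost 1 + m/k and fault tolerance m come from the earlier slide, and degraded-read cost is modeled as k chunk reads per the recovery-cost slide. The specific (k, m) choices are illustrative only.

```python
# Toy three-way trade-off for a hypothetical (k, m) MDS code.

def tradeoff(k, m):
    return {
        "storage": 1 + m / k,   # storage cost multiplier
        "faults": m,            # tolerated simultaneous faults
        "degraded_read": k,     # chunks read to serve one degraded read
    }

small_k_small_ratio = tradeoff(4, 2)   # cheap reads and storage, but only 2 faults
small_k_big_m       = tradeoff(4, 4)   # cheap reads, 4 faults, but 2.0x storage
big_k_big_m         = tradeoff(12, 4)  # ~1.33x storage, 4 faults, but 12-read repairs
```

No single parameter choice wins on all three axes, which is what motivates using two different codes for hot and cold data.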

SLIDE 9

Data Access Skew

  • Data access frequency follows a Zipf distribution
  • About 80% of data accesses go to 10% of the data volume ([VLDB ’12] Chen et al.)
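The skew is easy to reproduce numerically. This sketch computes the access share of the hottest 10% of objects under a Zipf(s) popularity law; the choices n = 10,000 and s = 1.2 are illustrative, not values taken from the slide or the cited paper.

```python
# Share of accesses absorbed by the hottest `frac` of n objects when
# object at popularity rank i is accessed with weight 1 / i**s.

def top_share(n, s, frac=0.10):
    weights = [1.0 / (i ** s) for i in range(1, n + 1)]
    top = int(n * frac)
    return sum(weights[:top]) / sum(weights)

share = top_share(n=10_000, s=1.2, frac=0.10)
assert share > 0.8   # the hottest 10% of objects draw the large majority of accesses
```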

SLIDE 10

Divide and Conquer

  • Premise: guaranteed fault tolerance ability
  • Hot data – degraded read cost is most important
  • Cold data – storage cost is most important
  • Data with different properties should be stored by different codes
  • A fast code for hot data
  • Low degraded read cost and high enough fault tolerance ability
  • High storage cost is acceptable
  • A compact code for cold data
  • Low storage cost and high enough fault tolerance ability
  • High degraded read cost is acceptable
SLIDE 11

Code-Switching Problem

  • According to temporal locality, hot data will become cold
  • Cold data may become hot in some cases
  • Problem: code-switching from one code to another
  • To compute the new parities g3(b) and g4(b), all data chunks b should be collected first
  • Bandwidth-consuming

Stripe b1 b2 b3 b4 b5 b6 with parities g1(b), g2(b) → stripe b1 b2 b3 b4 b5 b6 with parities g1(b), g2(b), g3(b), g4(b)
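A rough sketch of why the naive switch is bandwidth-hungry: a brand-new global parity is a linear combination of all k data chunks, so computing it means gathering the whole stripe first. The parity coefficients and the one-chunk-per-transfer cost model below are made up for illustration.

```python
# Naive code-switch: gather all k data chunks, compute each new parity,
# ship the new parities out. Traffic is counted in chunk transfers.

def new_parity(data, coeffs):
    # g(b) = sum_i coeffs[i] * b_i  -- needs every data chunk present
    return sum(c * d for c, d in zip(coeffs, data))

def naive_switch_traffic(k, num_new_parities):
    return k + num_new_parities   # k chunks gathered + parities shipped

data = [3, 1, 4, 1, 5, 9]                         # b1..b6 as toy values
g3 = new_parity(data, coeffs=[1, 2, 3, 4, 5, 6])  # illustrative coefficients
assert naive_switch_traffic(k=6, num_new_parities=2) == 8
```

Eight chunk transfers just to re-encode one stripe is the overhead the paper's code-switching algorithm is designed to avoid.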

SLIDE 12

Alleviate the Problem

  • HACFS ([FAST ’15] Xia et al.)
  • Use two codes in the same code family with different parameters
  • Alleviate the code-switching problem by using the similarity within one code family

  • Cannot take advantage of the trade-off in different code families
  • Cannot get rid of the code family’s inherent defects
  • e.g., impossible to use an MDS compact code
  • Our Scheme
  • We present an efficient code-switching algorithm
SLIDE 13

Our Scheme

  • We choose Local Reconstruction Code (LRC) as the fast code and Hitchhiker (HH) as the compact code

  • (k,m-1,m)-LRC and (k,m)-HH
  • Reasons
  • 1. LRC has good fast code properties
  • Good locality
  • 2. HH has good compact code properties
  • MDS
  • 3. Common: both have been implemented in HDFS or Ceph
  • 4. They are similar: both are based on RS codes, and both group data chunks
SLIDE 14

LRC

  • Fast code
  • An example of (6,2,3)-LRC (shown over two substripes b and c):

Data: b1 b2 b3 b4 b5 b6, c1 c2 c3 c4 c5 c6
Local parities: b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6
Global parities: g1(b), g2(b), g3(b), g1(c), g2(c), g3(c)
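The locality that makes LRC a good fast code can be sketched minimally: toy integer chunks, XOR local parities per group of three, global parities omitted. This illustrates the layout above, not a production LRC implementation.

```python
# (6,2,3)-LRC locality sketch: two groups of three data chunks, one XOR
# local parity per group. A single lost data chunk is repaired from its
# group alone.
from functools import reduce

def xor(chunks):
    return reduce(lambda x, y: x ^ y, chunks)

data = [3, 1, 4, 1, 5, 9]                    # b1..b6 as toy byte values
local = [xor(data[0:3]), xor(data[3:6])]     # one local parity per group

# b2 is lost: local repair reads only 3 chunks (group mates + the
# group's local parity) instead of all k = 6 data chunks.
repaired_b2 = xor([data[0], data[2], local[0]])
assert repaired_b2 == data[1]
```

That 3-read repair, versus the k-read repair of a plain (k,m) RS code, is why LRC gets the low degraded-read cost the hot data needs.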

SLIDE 15

HH

  • Compact code
  • An example of (6,3)-HH:

Data: b1 b2 b3 b4 b5 b6, c1 c2 c3 c4 c5 c6
Parities: g1(b), g2(b), g3(b), g1(c), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6

SLIDE 16

Scheme I: LRC → HH

Before (LRC): b1..b6, c1..c6 with local parities b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6 and global parities g1(b), g2(b), g3(b), g1(c), g2(c), g3(c)
After (HH): b1..b6, c1..c6 with parities g1(b), g2(b), g3(b), g1(c), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6

SLIDE 17

Scheme I: HH → LRC

Before (HH): b1..b6, c1..c6 with parities g1(b), g2(b), g3(b), g1(c), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6
After (LRC): b1..b6, c1..c6 with local parities b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6 and global parities g1(b), g2(b), g3(b), g1(c), g2(c), g3(c)

SLIDE 18

A New Scheme

  • When HH uses the XOR sum of the data chunks as its first parity chunk, a global parity chunk of LRC can be saved
  • (k,m-1,m-1)-LRC and (k,m)-HH

(6,2,2)-LRC: b1..b6, c1..c6 with local parities b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6 and global parities g2(b), g3(b), g2(c), g3(c)
(6,3)-HH: b1..b6, c1..c6 with parities b1⊕b2⊕b3⊕b4⊕b5⊕b6, c1⊕c2⊕c3⊕c4⊕c5⊕c6, g2(b), g3(b), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6
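The saving can be checked directly: when HH's first parity is the XOR sum of all data chunks, LRC's two XOR local parities already imply it, so no extra global parity (and no data transfer) is needed to produce it. Toy integer chunks again stand in for real symbols.

```python
# Why Scheme II can drop one LRC global parity: XOR-ing the two local
# parities of a substripe reproduces HH's XOR-sum parity exactly.
from functools import reduce

def xor(chunks):
    return reduce(lambda x, y: x ^ y, chunks)

data = [3, 1, 4, 1, 5, 9]                    # b1..b6
local = [xor(data[0:3]), xor(data[3:6])]     # LRC local parities
hh_first_parity = xor(data)                  # b1 ⊕ ... ⊕ b6

assert xor(local) == hh_first_parity         # computable from parities alone
```

Because XOR is associative, this identity holds for any group split, which is what lets the (k,m-1,m-1)-LRC pair with the (k,m)-HH at no loss.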

SLIDE 19

Scheme II: LRC → HH

Before ((6,2,2)-LRC): b1..b6, c1..c6 with local parities b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6 and global parities g2(b), g3(b), g2(c), g3(c)
After ((6,3)-HH): b1..b6, c1..c6 with parities b1⊕b2⊕b3⊕b4⊕b5⊕b6 (the XOR of the two local b-parities), c1⊕c2⊕c3⊕c4⊕c5⊕c6 (the XOR of the two local c-parities), g2(b), g3(b), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6

SLIDE 20

Scheme II: HH → LRC

Before ((6,3)-HH): b1..b6, c1..c6 with parities b1⊕b2⊕b3⊕b4⊕b5⊕b6, c1⊕c2⊕c3⊕c4⊕c5⊕c6, g2(b), g3(b), g2(c)⊕b1⊕b2⊕b3, g3(c)⊕b4⊕b5⊕b6
After ((6,2,2)-LRC): b1..b6, c1..c6 with local parities b1⊕b2⊕b3, b4⊕b5⊕b6, c1⊕c2⊕c3, c4⊕c5⊕c6 and global parities g2(b), g3(b), g2(c), g3(c)

SLIDE 21

Performance Analysis

SLIDE 22

Code-Switching Efficiency

  • Ratio I: the amount of data transferred during code-switching to the amount of data transferred during encoding

SLIDE 23

Code-Switching Efficiency

  • Ratio II: the total amount of data transferred by encoding into the hot-data form and then switching to the cold-data form, to the amount of data transferred when encoding directly into the cold-data form

SLIDE 24

Experiment Setup

  • (k,m)=(12,4)
  • (12,3,4)-LRC and (12,4)-HH (Scheme I)
  • (12,3,3)-LRC and (12,4)-HH (Scheme II)
  • Storage overhead set to 1.4×
  • Schemes implemented upon Ceph
  • Workloads generated randomly, with data access frequency following a Zipf distribution

SLIDE 25

Recovery Cost

SLIDE 26

Code-Switching Time

SLIDE 27

Future Work

  • More detailed evaluations
  • Actual traces
  • Implemented in Ceph
  • More parameter choices
  • Combining our scheme with HACFS-LRC
  • More code family choices
  • MSR and MBR?
SLIDE 28

An Adaptive Erasure-Coded Storage Scheme with an Efficient Code-Switching Algorithm

Zizhong Wang, Haixia Wang, Airan Shao, and Dongsheng Wang, Tsinghua University

Thank you!

wds@tsinghua.edu.cn wangzizhong13@tsinghua.org.cn