
A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran


  1. A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
     K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran

  2. Need for Redundant Storage in Data Centers
     • Frequent unavailability events in data centers
       – unreliable components
       – software glitches, maintenance shutdowns, power failures, etc.
     • Redundancy necessary for reliability and availability

  3. Popular Approach for Redundant Storage: Replication
     • Distributed file systems used in data centers store multiple copies of data on different machines
     • Machines typically chosen on different racks
       – to tolerate rack failures
     E.g., Hadoop Distributed File System (HDFS) stores 3 replicas by default

  4. HDFS
     [Figure: a file is divided into blocks (a to j), redundancy is introduced by storing multiple copies of each block, and the copies are stored distributed across the network (machines under top-of-rack (TOR) switches connected through an AS/router).]

  5. Massive Data Sizes: Need Alternative to Replication
     • Small to moderately sized data: disk storage is inexpensive
       – replication viable
     • No longer true for massive scales of operation
       – e.g., Facebook data warehouse cluster stores multiple tens of Petabytes (PBs)
     “Erasure codes” are an alternative

  6. Erasure Codes in Data Centers
     • Facebook data warehouse cluster
       – uses Reed-Solomon (RS) codes instead of 3-replication on a portion of the data
       – savings of multiple Petabytes of storage space

  7. Erasure Codes

                          Replication                  Reed-Solomon (RS) code
     block 1 (data)       a                            a
     block 2 (data)       b                            b
     block 3 (parity)     a                            a+b
     block 4 (parity)     b                            a+2b
     Overhead             2x                           2x
     Fault tolerance      tolerates any one failure    tolerates any two failures

     In general, erasure codes provide orders of magnitude higher reliability at much smaller storage overheads
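The two-failure claim for the toy code on this slide can be checked directly. The sketch below is only an illustration: it uses plain integer arithmetic and hypothetical helper names (`CODE`, `encode`, `decode`), whereas a production Reed-Solomon implementation works over a finite field on bytes.

```python
from fractions import Fraction
from itertools import combinations

# Each stored block is c_a*a + c_b*b. The coefficients follow the slide's
# toy example: a, b, a+b, a+2b. (Illustrative integer arithmetic only.)
CODE = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four stored blocks for data blocks a and b."""
    return [ca * a + cb * b for ca, cb in CODE]

def decode(survivors):
    """Recover (a, b) from exactly two surviving blocks given as {index: value}."""
    (i, x), (j, y) = sorted(survivors.items())
    (p, q), (r, s) = CODE[i], CODE[j]
    det = p * s - q * r  # non-zero for every pair of rows in CODE
    # Cramer's rule for the 2x2 system p*a + q*b = x, r*a + s*b = y.
    a = Fraction(s * x - q * y, det)
    b = Fraction(p * y - r * x, det)
    return a, b

blocks = encode(7, 5)
# Erase every possible pair of blocks and decode from the remaining two.
for lost in combinations(range(4), 2):
    survivors = {i: v for i, v in enumerate(blocks) if i not in lost}
    assert decode(survivors) == (7, 5)
```

Under replication with the same four blocks, losing both copies of `a` loses the data, which is why that scheme only tolerates any one failure.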

  8. Outline
     • Erasure Codes in Data Centers
       – HDFS
     • Impact on the data center network
       – Problem description
     • Our system: “Hitchhiker”
     • Implementation and evaluation
       – Facebook data warehouse cluster
     • Literature

  9. Outline
     • Erasure Codes in Data Centers
       – HDFS
     • Impact on the data center network
       – Problem description
     • Our solution: “Hitchhiker”
     • Implementation and evaluation
       – Facebook data warehouse cluster
     • Literature

  10. Erasure Codes in Data Centers: HDFS-RAID
      [Figure: blocks a to j stored as three full copies (overhead: 3x) versus blocks a to j plus parity blocks P1 to P4 under a (10, 4) Reed-Solomon code (overhead: 1.4x).]
      Borthakur, “HDFS and Erasure Codes (HDFS-RAID)”
      Fan, Tantisiriroj, Xiao and Gibson, “DiskReduce: RAID for Data-Intensive Scalable Computing”, PDSW 09

  11. Erasure Codes in Data Centers: HDFS-RAID
      [Figure: blocks a to j stored as three full copies versus blocks a to j plus parity blocks P1 to P4.]
      • Three copies: overhead 3x
        – Cannot tolerate many 3-failures
      • (10, 4) Reed-Solomon code: overhead 1.4x
        – Any 10 blocks sufficient
        – Can tolerate any 4-failures
      Borthakur, “HDFS and Erasure Codes (HDFS-RAID)”
      Fan, Tantisiriroj, Xiao and Gibson, “DiskReduce: RAID for Data-Intensive Scalable Computing”, PDSW 09
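The 3x and 1.4x overhead figures follow from counting stored blocks per data block. A minimal sketch, with hypothetical function names chosen for illustration:

```python
def rs_storage_overhead(data_blocks, parity_blocks):
    """Stored blocks per data block for a (data_blocks, parity_blocks) code."""
    return (data_blocks + parity_blocks) / data_blocks

def replication_storage_overhead(copies):
    """Stored blocks per data block when every block is kept `copies` times."""
    return float(copies)

print(rs_storage_overhead(10, 4))        # (10 + 4) / 10
print(replication_storage_overhead(3))   # three full copies
```

The same function gives 2x for the four-block toy code (two data blocks, two parity blocks), matching the overhead of keeping two copies.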

  12. Outline
      • Erasure Codes in Data Centers
        – HDFS
      • Impact on the data center network
        – Problem description
      • Our system: “Hitchhiker”
      • Implementation and evaluation
        – Facebook data warehouse cluster
      • Literature

  13. Impact on Data Center Network
      Reconstruction operations (between the storage layer and the network layer):
      • Degraded reads
        – requesting currently unavailable data
        – on-the-fly reconstruction
      • Recovery
        – periodically replace unavailable blocks
        – to ensure desired level of reliability

  14. Impact on Data Center Network
      RS codes significantly increase network usage during reconstruction

  15. Impact on Data Center Network
      [Figure: reconstructing one lost block in the four-block example. Replication: network transfer & disk IO = 1x. Reed-Solomon code: network transfer & disk IO = 2x.]
      Network transfer & disk IO = (#data-blocks) x (size of data to be reconstructed)
      In (10, 4) RS, it is 10x
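The formula on this slide can be turned into a quick calculation. The sketch below assumes a hypothetical 256 MB block size and models replication as reading a single surviving copy; neither detail comes from the slides.

```python
def reconstruction_traffic(blocks_read, lost_bytes):
    """Bytes read and transferred to rebuild `lost_bytes` of data.

    Follows the slide's formula:
    (#data-blocks) x (size of data to be reconstructed).
    """
    return blocks_read * lost_bytes

BLOCK_SIZE = 256 * 1024 * 1024  # hypothetical block size in bytes

replication = reconstruction_traffic(1, BLOCK_SIZE)   # read one surviving copy
toy_rs = reconstruction_traffic(2, BLOCK_SIZE)        # 2 data blocks in the toy code
rs_10_4 = reconstruction_traffic(10, BLOCK_SIZE)      # 10 data blocks in (10, 4) RS

print(toy_rs // replication, rs_10_4 // replication)
```

The ratio does not depend on the block size chosen, only on the number of data blocks in the code.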

  16. Impact on Data Center Network
      [Figure: machines 1 to 4, each under its own top-of-rack (TOR) switch below a router, hold blocks a, b, a+b and a+2b; reconstruction pulls blocks across the switches.]
      Burdens the already oversubscribed top-of-rack and higher level switches

  17. Impact on Data Center Network: Facebook Data Warehouse Cluster
      • Multiple PB of Reed-Solomon encoded data
      • Median of 180 TB transferred across racks per day for RS reconstruction ≈ 5 times that under 3-replication
      Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage: A Study on the Facebook Warehouse Cluster”, Usenix HotStorage Workshop 2013

  18. RS codes: The Good and The Bad
      • Maximum possible fault-tolerance for given storage overhead
        – storage-capacity optimal
        – (“maximum distance separable” in coding theory parlance)
      • Flexibility in choice of parameters
        – supports any number of data and parity blocks
      • Not designed to handle reconstruction operations efficiently
        – negative impact on the network

  19. Goal (RS codes: The Good and The Bad)
      Maintain:
      • Maximum possible fault-tolerance for given storage overhead
        – storage-capacity optimal
        – (“maximum distance separable” in coding theory parlance)
      • Flexibility in choice of parameters
        – supports any number of data and parity blocks
      Improve:
      • Not designed to handle reconstruction operations efficiently
        – negative impact on the network
