UpRight Cluster Services (PowerPoint presentation)

  1. UpRight Cluster Services. Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, Lorenzo Alvisi, Mike Dahlin, Taylor Riche. The University of Texas at Austin.

  2. Failures are not fail-stop

  3. To the rescue: Byzantine Fault Tolerance (BFT). Tolerates arbitrary failures (up to f); always safe; eventually live if the network behaves well; good performance with failures.

  4. This talk: BFT in real systems (ZooKeeper, Hadoop Distributed File System). What does it take? Revising much of what we think we know: failure model, BFT implementation, API.

  6. For better or for worse, the Byzantine model is the most general: all you need are replicas... 3f + 1.

  7. Up Right

  8. Up: u = maximum number of failures under which liveness* is ensured

  9. Right: r = maximum number of malicious failures under which safety is preserved

  12. Up Right (Lamport 2003; Dutta et al 2005; Abd-El-Malek et al 2005): replicas required for agreement = 2u + r + 1.

      | Replicas required for agreement | u=0 | u=1 | u=2 | u=3 |
      |---|---|---|---|---|
      | r=0 (crash tolerant) | 1 | 3 | 5 | 7 |
      | r=1 (“Pay per B” FT) | 2 | 4 | 6 | 8 |
      | r=2 (“Pay per B” FT) | 3 | 5 | 7 | 9 |
      | r=3 (BFT) | 4 | 6 | 8 | 10 |

  13. One Library to Rule Them All: only pay for the fault tolerance you need, with one fault-tolerant library.

  14. Revising what we think we know: failure model, BFT implementation, API

  15. Separating Order from Execution: request → Order → ordered request → Execution. (Separating agreement from execution for Byzantine fault tolerant services [SOSP 2003])

  16. Big MAC Attack (diagram of clients). Making BFT systems tolerate Byzantine failures [NSDI 2009]

  18. Big MAC Attack (diagram: many clients, including a faulty client, and a faulty primary)

  19. A More Perfect Separation: request → Authentication → valid request → Order → ordered request → Execution

  20. A More Perfect Union: Authentication, Order, Execution

  21. Speculating: Zyzzyva (Kotla et al 2007) (diagram: Command, Order, Execution, Voter)

  23. Misunderspeculation. Speculation is a good idea; speculative execution is a bad idea: if the guess is wrong, it costs lots of work, and it's not about execution, anyway. UpRight (diagram: Authentication, speculative ordering, Command, Execution): ordering is speculative, and execution nodes never roll back.

  24. Revisiting conventional wisdom: failure model, BFT implementation, API

  25. API. Old World Order: the Library calls execute on the App, and the App returns result. UpRight World Order: the Library calls execute, takeCP, and loadCP on the App, and the App returns result and returnCP.

  26. Case Study: Make HDFS UpRight. The NameNode maps files to blocks and blocks to DataNodes; DataNodes store blocks (diagram: Users, NameNode, Data Nodes).

  27. What was required? Making execution deterministic: ~150 lines of code. Making checkpoints deterministic and complete: ~1500 lines of code. That's it.

  28. Do DataNodes Need the UpRight Treatment? (diagram: users, an UpRight NameNode with its primary, and DataNodes each marked UpRight)

  29. Modified DataNode (diagram: a DataNode stores a block and sends the block's hash to the UpRight NameNode)

  30. Modified DataNode (diagram: each DataNode holding the block sends its hash to the UpRight NameNode)

  31. Modified DataNode (diagram: a user gets the block's hash from the UpRight NameNode and the block itself from a DataNode)

  32. HDFS LOC Changes (HDFS: ~37k LOC total)

      | Component | Change | LOC |
      |---|---|---|
      | NameNode | Execution | ~150 |
      | NameNode | Checkpoints | ~1500 |
      | DataNode | Protocol | ~900 |

  33. HDFS Evaluation: Amazon S3 small instances; 50 clients, each writing/reading a 1 GB file; 50 DataNodes.

      | HDFS configuration | Authentication / Order / NameNodes | DataNode replication factor |
      |---|---|---|
      | Original HDFS | - / - / 1 | 3 |
      | CFT HDFS (u=1, r=0) | 3 / 3 / 3 | 3 |
      | BFT HDFS (u=1, r=1) | 4 / 4 / 3 | 3 |

  34. HDFS Throughput [bar chart: throughput in MB/s (axis 0 to 1,000) for Write and Read; series: HDFS, CFT HDFS, BFT HDFS]

  35. HDFS Computational Costs [stacked bar chart: Mcycles/GB (axis 0 to 1,200) for HDFS, CFT_HDFS, and BFT_HDFS under Write and Read; components: UpRight Core, Data Node, Name Node]

  36. This talk: BFT in real systems (ZooKeeper, Hadoop Distributed File System). What it took: UpRight, BFT implementation, API.

  37. What the future holds. The plural of “anecdote” is not “data”. Quantify the risks: how frequently do Byzantine failures occur? How much damage can they do? Quantify the benefits: what fraction of these failures does BFT mask?

  38. Matrix signatures (Aiyer et al 2008): separate order from authentication (diagram: clients sending requests to Order)

  43. Matrix signatures (Aiyer et al 2008). The primary orders a request if it carries sufficiently many valid MACs. Validity (the request is from the client): n ≥ r + 1. Transitive validity (convince others): n ≥ 2r + 1. Liveness (the request will go through): n ≥ 2r + u + 1.
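
Slide 12's formula can be turned into a few lines of code that regenerate its table. This is a minimal Python sketch; the function names `agreement_replicas` and `replica_table` are hypothetical, and the code only encodes the stated bound 2u + r + 1, not any part of the UpRight library.

```python
def agreement_replicas(u: int, r: int) -> int:
    """Replicas needed for agreement, 2u + r + 1, as stated on slide 12."""
    if u < 0 or r < 0:
        raise ValueError("u and r must be non-negative")
    return 2 * u + r + 1


def replica_table(max_u: int = 3, max_r: int = 3) -> list[list[int]]:
    """Rows are r = 0..max_r and columns are u = 0..max_u, as in the slide's table."""
    return [[agreement_replicas(u, r) for u in range(max_u + 1)]
            for r in range(max_r + 1)]


if __name__ == "__main__":
    for r, row in enumerate(replica_table()):
        print(f"r={r}:", row)
```

Setting u = r = f gives 3f + 1, the replica count from slide 6, which is why the r = 3, u = 3 corner of the table reads 10.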
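
Slides 16 and 18 name the Big MAC attack but show only a diagram. As an assumption for this sketch, we take the description common in the BFT literature: a client authenticates each request with a vector of MACs, one per replica, and a faulty client makes only some entries valid. Because each replica can check only the entry computed with its own key, the primary may accept a request that the other replicas reject. The keys, replica count, and function names below are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical pairwise keys shared by one client and replicas 0..3 (replica 0 is the primary).
KEYS = {i: f"client-replica-{i}".encode() for i in range(4)}


def make_authenticator(request: bytes, corrupt: frozenset[int] = frozenset()) -> dict[int, bytes]:
    """One HMAC-SHA256 entry per replica; a faulty client zeroes the entries listed in `corrupt`."""
    auth = {}
    for replica_id, key in KEYS.items():
        tag = hmac.new(key, request, hashlib.sha256).digest()
        auth[replica_id] = bytes(len(tag)) if replica_id in corrupt else tag
    return auth


def replica_accepts(replica_id: int, request: bytes, auth: dict[int, bytes]) -> bool:
    """A replica can verify only its own entry of the authenticator."""
    expected = hmac.new(KEYS[replica_id], request, hashlib.sha256).digest()
    return hmac.compare_digest(expected, auth[replica_id])


if __name__ == "__main__":
    req = b"create /foo"
    faulty = make_authenticator(req, corrupt=frozenset({1, 2, 3}))
    # Only the primary's entry is valid: {0: True, 1: False, 2: False, 3: False}
    print({i: replica_accepts(i, req, faulty) for i in KEYS})
```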
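
Slide 19's separation of authentication, ordering, and execution can be sketched as a single-process pipeline of three stages. This is an illustration under heavy assumptions: the `Request` type, the HMAC-SHA256 client keys, and the stage functions are hypothetical, and `order` simply numbers requests where a real system would run a replicated agreement protocol among order nodes.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Request:
    client_id: int
    payload: bytes
    mac: bytes  # HMAC-SHA256 of payload under the client's key (assumed scheme)


def sign(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def authenticate(requests: Iterable[Request], client_keys: dict[int, bytes]) -> Iterator[Request]:
    """Authentication stage: forward only requests from known clients with a valid MAC."""
    for req in requests:
        key = client_keys.get(req.client_id)
        if key is not None and hmac.compare_digest(sign(key, req.payload), req.mac):
            yield req


def order(valid: Iterable[Request]) -> Iterator[tuple[int, Request]]:
    """Order stage: assign consecutive sequence numbers (stand-in for agreement)."""
    yield from enumerate(valid)


def execute(ordered: Iterable[tuple[int, Request]]) -> list[tuple[int, int, bytes]]:
    """Execution stage: apply requests in sequence order to a toy append-only log."""
    return [(seq, req.client_id, req.payload) for seq, req in ordered]


if __name__ == "__main__":
    keys = {7: b"k7"}
    reqs = [
        Request(7, b"mkdir /a", sign(b"k7", b"mkdir /a")),
        Request(7, b"rm /a", bytes(32)),  # forged MAC: dropped by authentication
        Request(7, b"mkdir /b", sign(b"k7", b"mkdir /b")),
    ]
    print(execute(order(authenticate(reqs, keys))))
```

Keeping the stages as separate functions mirrors the slide's point that each stage hands a narrower, already-checked input to the next: order never sees an unauthenticated request, and execution sees only sequence-numbered ones.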
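
The call pattern on slide 25 can be sketched as a toy application. Only the five operation names come from the slide (written here in Python style as `execute`, `take_cp`, `load_cp`, `result`, `return_cp` for execute/takeCP/loadCP/result/returnCP); every signature, the `Library` base class, and the counter application are hypothetical and are not the UpRight library's actual interface.

```python
import json
from abc import ABC, abstractmethod


class Library(ABC):
    """Calls the application makes back into the replication library (hypothetical)."""

    @abstractmethod
    def result(self, seq: int, reply: bytes) -> None:
        """Deliver the reply for request number `seq`."""

    @abstractmethod
    def return_cp(self, seq: int, checkpoint: bytes) -> None:
        """Hand back the checkpoint requested by take_cp."""


class CounterApp:
    """Toy application driven by the library through execute, take_cp and load_cp."""

    def __init__(self, library: Library) -> None:
        self.library = library
        self.counters: dict[str, int] = {}

    def execute(self, seq: int, request: bytes) -> None:
        key = request.decode()
        self.counters[key] = self.counters.get(key, 0) + 1
        self.library.result(seq, str(self.counters[key]).encode())

    def take_cp(self, seq: int) -> None:
        # sort_keys keeps the checkpoint bytes independent of insertion order.
        self.library.return_cp(seq, json.dumps(self.counters, sort_keys=True).encode())

    def load_cp(self, checkpoint: bytes) -> None:
        self.counters = json.loads(checkpoint.decode())


class RecordingLibrary(Library):
    """Test double that just records what the application hands back."""

    def __init__(self) -> None:
        self.replies: list[tuple[int, bytes]] = []
        self.checkpoints: list[tuple[int, bytes]] = []

    def result(self, seq: int, reply: bytes) -> None:
        self.replies.append((seq, reply))

    def return_cp(self, seq: int, checkpoint: bytes) -> None:
        self.checkpoints.append((seq, checkpoint))


if __name__ == "__main__":
    lib = RecordingLibrary()
    app = CounterApp(lib)
    for seq, req in enumerate([b"a", b"a", b"b"]):
        app.execute(seq, req)
    app.take_cp(2)
    replacement = CounterApp(lib)
    replacement.load_cp(lib.checkpoints[-1][1])
    print(lib.replies, replacement.counters)
```

Compared with the old world order, the extra checkpoint calls give the library a way to capture and restore application state, for instance to bring a lagging replica up to date.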
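
Slide 27 lists making checkpoints deterministic as the largest part of the HDFS port. The slide does not say which sources of nondeterminism were involved; the sketch below shows just one common kind, map iteration order, under the assumption that replicas compare checkpoints by hashing their serialized bytes. The helper names and the file-to-block map are hypothetical.

```python
import hashlib
import json


def naive_checkpoint(state: dict) -> bytes:
    """Serialize in dict insertion order, which depends on the order updates arrived."""
    return json.dumps(state).encode()


def deterministic_checkpoint(state: dict) -> bytes:
    """Canonical serialization: sorted keys and fixed separators."""
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode()


def digest(checkpoint: bytes) -> str:
    return hashlib.sha256(checkpoint).hexdigest()


# Two replicas with the same logical state (hypothetical file-to-block map),
# built by inserting entries in different orders.
replica_a = {}
replica_a["/b"] = ["blk_2"]
replica_a["/a"] = ["blk_1"]
replica_b = {"/a": ["blk_1"], "/b": ["blk_2"]}

if __name__ == "__main__":
    print(replica_a == replica_b)  # True: same contents
    print(digest(naive_checkpoint(replica_a)) == digest(naive_checkpoint(replica_b)))  # False
    print(digest(deterministic_checkpoint(replica_a)) == digest(deterministic_checkpoint(replica_b)))  # True
```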
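
Slides 29 to 31 show DataNodes sending block hashes to the UpRight NameNode and users checking blocks against them. A minimal sketch of the client-side check follows, assuming SHA-256 as the hash and modelling the NameNode's metadata and a DataNode as in-memory dictionaries; `NameNodeMetadata` and `read_block` are hypothetical names, not HDFS APIs.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class NameNodeMetadata:
    """Stand-in for the replicated NameNode's view: one expected hash per block ID."""

    def __init__(self) -> None:
        self.block_hashes: dict[str, str] = {}

    def record_block(self, block_id: str, block_hash: str) -> None:
        # Called when a DataNode reports the hash of a block it stored.
        self.block_hashes[block_id] = block_hash

    def expected_hash(self, block_id: str) -> str:
        return self.block_hashes[block_id]


def read_block(block_id: str, namenode: NameNodeMetadata, datanode: dict[str, bytes]) -> bytes:
    """Client read: fetch the block from one DataNode, verify it against the NameNode's hash."""
    data = datanode[block_id]
    if sha256_hex(data) != namenode.expected_hash(block_id):
        raise OSError(f"block {block_id} does not match the hash recorded at the NameNode")
    return data


if __name__ == "__main__":
    nn = NameNodeMetadata()
    nn.record_block("blk_1", sha256_hex(b"hello"))
    print(read_block("blk_1", nn, {"blk_1": b"hello"}))  # b'hello'
    try:
        read_block("blk_1", nn, {"blk_1": b"HELLO"})
    except OSError as err:
        print("rejected:", err)
```

Under this arrangement, and barring a hash collision, a faulty DataNode can make a read fail but cannot make a client accept wrong data, as long as the hash it checks against comes from the replicated NameNode.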
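
Slide 43's three conditions are simple threshold checks on n. The sketch below encodes them as listed on the slide and nothing more; the names are hypothetical, and it takes n as a single given count rather than modelling how the primary gathers or verifies MACs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixThresholds:
    validity: int             # request is from the client
    transitive_validity: int  # enough to convince others
    liveness: int             # request will go through


def thresholds(u: int, r: int) -> MatrixThresholds:
    """Minimum n for each property, exactly as listed on slide 43."""
    return MatrixThresholds(validity=r + 1,
                            transitive_validity=2 * r + 1,
                            liveness=2 * r + u + 1)


def properties_met(n: int, u: int, r: int) -> dict[str, bool]:
    t = thresholds(u, r)
    return {
        "validity": n >= t.validity,
        "transitive_validity": n >= t.transitive_validity,
        "liveness": n >= t.liveness,
    }


if __name__ == "__main__":
    print(thresholds(1, 1))         # MatrixThresholds(validity=2, transitive_validity=3, liveness=4)
    print(properties_met(3, 1, 1))  # liveness is the only condition not yet met
```

For u = 1 and r = 1, the liveness condition is the binding one: n must be at least 4.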
