
Byzantine Fault Tolerance: Consensus Strikes Back



  1. Byzantine Fault Tolerance Consensus Strikes Back

  2. Announcements

  3. Lab 2 • Hopefully everyone has started by now, maybe even finished large portions. • If not ... you should worry. • Please don't change the protobufs. • My testing strategy is going to be to write a few clients and check linearizability. • Changing the interface doesn't let me do that. • Feel free to change whatever is not the interface.

  4. BFT

  5. A Note on Terminology • Byzantine Empire? • Continuation of the Roman Empire, ~400-1450 AD • Commonly used as an example of bad bureaucracy, infighting... • Historical records don't entirely agree with this.

  6. What is the Problem? [Figure: five nodes, numbered 0 to 4]

  7. What is the Problem? [Figure: five nodes, numbered 0 to 4]

  8. Concrete Problems [Figure: nodes 0 to 4, with messages AppendEntries(..., [(index=4)]), AppendEntries(..., [], leaderCommit=4), and Success]

  9. Concrete Problems [Figure: nodes 0, 1, and 2, with two RequestVote(term=2) messages and two VoteGranted(term=2) messages]

  10. Concrete Problems

  11. Failure Models • Until now we have considered fail-stop processes. • When failed: stop sending messages and take no steps. • Byzantine faults: when failed do "arbitrary things." • These arbitrary things could even be coordinated with other failed nodes.

  12. However, we assume the participants are known a priori.

  13. On the internet nobody knows what maps to a user, nor to a machine, ...

  14. Not Considering this Problem • Live in a centralized environment. • All servers/nodes are launched by some centralized entity. • For example Kubernetes or a human with physical access. • Several ways to solve the decentralized problem. • But largely separable from the discussion at hand.

  15. Is This Still Useful? • Yes... • Used by Boeing in the 777 to ensure safety. • Used in SpaceX Falcon -- "... to meet requirements for approaching the ISS" • Generally useful, but cost prohibitive.

  16. Failure Models • Until now we have considered fail-stop processes. • When failed: stop sending messages and take no steps. • Byzantine faults: when failed do "arbitrary things." • These arbitrary things could even be coordinated with other failed nodes.

  17. What Can we Do?

  18. What Do We Care about Addressing [Figure: nodes 0 to 4, each holding its own State]

  19. What Do We Care about Addressing [Figure: nodes 0 to 4, each holding its own State] Can't really peer into the state of a remote node, cannot do much.

  20. What Do We Care about Addressing [Figure: nodes 0 to 4] Failed nodes can only interfere by sending messages.

  21. What Do We Care about Addressing [Figure: nodes 0 to 4] Make sure messages sent by all nodes are "correct" before acting.

  22. Why challenging? Don't know failed nodes a priori.

  23. When are Messages Correct? • Every correct node receives the same messages (and acts correctly). • Same might not necessarily mean "correct". • But always accept any message from a correct participant. • Every message is "consistent" with the protocol. • Attach some kind of proof that you were supposed to send this message.

  24. When are Messages Correct? • Every correct node receives the same messages (and acts correctly). • Every message is "consistent" with the protocol.

  25. Agreeing on Correct Messages

  26. Problem we Want to Solve [Figure: nodes 0 to 4; message AppendEntries(..., [(index=4)])]

  27. Problem we Want to Solve [Figure: nodes 0 to 4; reply Success]

  28. Problem we Want to Solve [Figure: nodes 0 to 4; message AppendEntries(..., [], leaderCommit = 4)]

  29. Problem we Want to Solve [Figure: nodes 0 to 4; message AppendEntries(..., [(index=4)])]

  30. Problem we Want to Solve [Figure: nodes 0 to 4; message AppendEntries(..., [], leaderCommit = 4)]

  31. Problem we Want to Solve • Cannot observe messages between individuals. • Hard to judge whether behavior is correct. • New idea: send messages to everyone. • Everyone knows where the state machine should be.

  32. Sending to Everyone [Figure: nodes 0 to 4; message 0->1: AppendEntries(..., [(index=4)])]

  33. Sending to Everyone [Figure: nodes 0 to 4; Success replies]

  34. Sending to Everyone is Insufficient [Figure: nodes 0 to 4; node 0 sends both 0->1: AppendEntries(..., [(c1, index=4)]) and 0->1: AppendEntries(..., [(c0, index=4)])]

  35. Sending to Everyone is Insufficient [Figure: nodes 0 to 4; labels "1 thinks slot 4 is c1", "1 thinks slot 4 is c0", and "Slot 4 is c0"; Success replies]

  36. Sending to Everyone is Not Sufficient • Faulty node can send differing messages to "everyone". • Run some protocol to detect this problem.
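One way to picture the detection step is to have every receiver report what it was told and then compare the reports. The sketch below is a minimal illustration only: the function name `find_equivocation` and the `reports` dictionary are hypothetical, not part of the lecture or of any library. Note that a conflict only shows that someone lied; it does not say whether the original sender or one of the reporters is at fault.

```python
def find_equivocation(reports):
    """Return the set of conflicting values, or an empty set if all reports agree.

    reports: hypothetical dict mapping a receiver id to the value that
    receiver says it got from the sender.
    """
    values = set(reports.values())
    return values if len(values) > 1 else set()


# Nodes 1 to 4 report what node 0 told them about slot 4.
reports = {1: "c0", 2: "c1", 3: "c0", 4: "c1"}
conflict = find_equivocation(reports)
if conflict:
    print("conflicting values reported:", sorted(conflict))
```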

  37. Sending to Everyone [Figure: nodes 0 to 4; node 0 sends AppendEntries(..., [(c1, index=4)]) and AppendEntries(..., [(c0, index=4)]). Nodes 1 to 4 each keep a table with one row per node; row 0 holds the value received directly from node 0: "c0, 4", "c1, 4", "c0, 4", "c1, 4"]

  38. Sending to Everyone [Figure: same tables; row 1 is now filled in with "0->1: c0, 4" as reported by node 1]

  39. Sending to Everyone [Figure: same tables with every row filled in; row 1: "c0, 4", row 2: "c1, 4", row 3: "c0, 4", row 4: "c1, 4"] Choose majority, breaking ties deterministically.

  40. Sending to Everyone [Figure: node 0 sends "c0, 4" to every node; rows 1, 3, and 4 report "c0, 4", while row 2 reports "???"] Choose majority, breaking ties deterministically.
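The rule "choose majority, breaking ties deterministically" can be sketched as follows. The slides do not say which tie-break to use, so picking the smallest value in sorted order is an assumption made here; any rule works as long as every correct node applies the same one. The function name `choose_majority` is hypothetical.

```python
from collections import Counter


def choose_majority(values):
    """Pick the most frequent value; on a tie, pick the smallest one.

    The "smallest value" tie-break is an assumption: the only requirement
    is that all correct nodes break ties the same way.
    """
    if not values:
        raise ValueError("need at least one reported value")
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


# A two-against-two split, as in the table with reports c0, c1, c0, c1.
decision = choose_majority(["c0", "c1", "c0", "c1"])
print(decision)
```

Because every correct node sees the same multiset of reports and applies the same rule, all of them land on the same value even when the vote is split.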

  41. Not Possible for 1 failure with 3 participants [Figure: two scenarios side by side, each with nodes 0, 1, and 2 and messages 0->1: x=1 and 0->1: x=2]

  42. Not Possible for 1 failure with 3 participants [Figure: the same two scenarios, with the relayed messages 0->1: x=2 and 0->1: x=1 added]

  43. Not Possible for 1 failure with 3 participants [Figure: the same two scenarios] Cannot distinguish between these two cases. Cannot meet the two requirements stated at the beginning.

  44. Limitations • More generally, cannot solve for m failures with fewer than 3m+1 participants. • Proof by reduction to the case with 3.
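The bound is easy to turn into arithmetic. The helper names below are hypothetical; they only restate the 3m+1 limit from the slide.

```python
def min_participants(m):
    """Smallest cluster size that can tolerate m Byzantine failures."""
    return 3 * m + 1


def max_faults(n):
    """Largest number of Byzantine failures a cluster of n nodes can tolerate."""
    return (n - 1) // 3


for n in (3, 4, 7):
    print(n, "participants tolerate", max_faults(n), "failure(s)")
```

With 3 participants the answer is 0, which matches the impossibility example, and the 7-node example is the smallest cluster that can tolerate 2 failures.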

  45. Sending to Everyone [Figure: seven nodes, 0 to 6; nodes 1 to 6 each keep a table with one row per node, filled with a mix of "0->1: c0, 4" and "0->1: c1, 4" reports] • However, note that doing this once is not sufficient for more than 1 fault.

  46. Sending to Everyone [Figure: the same seven-node tables, with node 2's row showing "???"] • However, note that doing this once is not sufficient for more than 1 fault. • For example, can force any decision in this case.

  47. Solution: Recursively call again.
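The recursion can be sketched as a small simulation. The slides do not name the algorithm; the code below follows the classic oral-messages scheme OM(m) of Lamport, Shostak, and Pease, which is an assumption about what "recursively call again" refers to. The fault model in `send` (a faulty node tells odd-numbered receivers "c1" and even-numbered receivers "c0") and all function names are hypothetical, chosen only to make the example concrete.

```python
from collections import Counter


def majority(values):
    """Most frequent value, ties broken by picking the smallest (an assumption)."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def send(sender, receiver, value, faulty):
    """Deliver a value; a faulty sender ignores it and equivocates by receiver parity."""
    if sender in faulty:
        return "c1" if receiver % 2 else "c0"
    return value


def om(commander, lieutenants, value, m, faulty):
    """Return {lieutenant: value it decides the commander sent} after m recursive rounds."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {l: send(commander, l, value, faulty) for l in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant relays what it received to the others,
    # acting as commander of a smaller instance with one fewer round.
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(j, others, received[j], m - 1, faulty)
    # Step 3: each lieutenant takes the majority of its own value and the relayed ones.
    decided = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decided[i] = majority(votes)
    return decided


# Four nodes, one fault: node 0 equivocates, yet nodes 1 to 3 end up agreeing.
print(om(0, [1, 2, 3], "c0", 1, faulty={0}))
```

The entries computed for faulty lieutenants are meaningless and should be ignored; only the decisions of correct nodes matter. One round of relaying is enough for 4 nodes and 1 fault, while 7 nodes and 2 faults need `m=2`.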

  48. When are Messages Correct? • Every correct node receives the same messages (and acts correctly). • Every message is "consistent" with the protocol.

  49. Proving Consistency with the Protocol

  50. What Does this Even Mean? [Figure: nodes 0 to 4; message AppendEntries(..., [(index=4)])]

  51. What Does this Even Mean? [Figure: nodes 0 to 4; reply Success]

  52. What Does this Even Mean? [Figure: nodes 0 to 4; message AppendEntries(..., [], leaderCommit = 4), with proof that a majority have accepted entries until 4]

  53. Problem • How to generate proofs? • Many possibilities, but just going to include messages here. • How to prevent failed nodes from misrepresenting messages?
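A proof built by "just including messages" could be checked roughly as below: the receiver counts distinct senders whose Success covers the claimed commit index. The `Success` record, its fields, and `proof_is_valid` are hypothetical names for illustration and are not the lab's protobuf interface. The check does nothing to stop a failed node from fabricating the included messages, which is exactly the misrepresentation problem raised in the last bullet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Success:
    sender: int  # id of the node that acknowledged
    index: int   # highest log index that node says it accepted


def proof_is_valid(leader_commit, acks, cluster_size):
    """True if a majority of distinct, known nodes acknowledged up to leader_commit."""
    voters = {
        a.sender
        for a in acks
        if 0 <= a.sender < cluster_size and a.index >= leader_commit
    }
    return len(voters) > cluster_size // 2


# leaderCommit = 4 backed by Success from nodes 0, 1, 2, and 3 in a 5-node cluster.
acks = [Success(sender=s, index=4) for s in (0, 1, 2, 3)]
print(proof_is_valid(4, acks, cluster_size=5))
```

Counting distinct senders rather than messages keeps a single node from padding the proof by repeating its own acknowledgement.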

  54. Misrepresenting Messages [Figure: nodes 0 to 4; message AppendEntries(..., [], leaderCommit = 4), Success from 0, 1, 2, 3]

  55. Misrepresenting Messages [Figure: nodes 0 to 4; message AppendEntries(..., [], leaderCommit = 4), Success from 0, 1, 2, 3]

  56. Warning: Cryptography

  57. Digests/Hashes [Figure: arbitrary-length input -> h -> fixed-length output] • Deterministic: h(x) should always be the same value. • Not invertible: given h(x), cannot find x. • Output of h(x) looks like the output of a random function. • Infeasible to find collisions.
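The first two properties are easy to observe with a standard hash function. The slides do not name one, so using SHA-256 from Python's `hashlib` is an assumption made for this sketch, and `digest` is a hypothetical helper name.

```python
import hashlib


def digest(message: bytes) -> str:
    """Hex-encoded SHA-256 digest of the message (64 hex characters)."""
    return hashlib.sha256(message).hexdigest()


short = digest(b"c0, 4")
long = digest(b"c0, 4" * 10_000)

# Deterministic: hashing the same input twice gives the same value.
print(short == digest(b"c0, 4"))
# Fixed-length output regardless of input length.
print(len(short), len(long))
```

Non-invertibility and collision resistance cannot be demonstrated by running code; they are security assumptions about the chosen function.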
