Byzantine Fault Tolerance


  1. Byzantine Fault Tolerance. CS 425: Distributed Systems, Fall 2011. Material derived from slides by I. Gupta and N. Vaidya.

  2. Reading List • L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM TOPLAS, 1982. • M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999.

  3. Byzantine Generals Problem. A sender wants to send a message to the n−1 other peers. • Fault-free nodes must agree • Sender fault-free ⇒ agree on its message • Up to f nodes may fail
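Stated a little more formally (a standard formalization, not on the original slide), writing d_i for the value fault-free peer i decides:

```latex
% Agreement: all fault-free peers decide the same value.
\forall\, \text{fault-free } i, j: \; d_i = d_j
% Validity: if the sender is fault-free with message v, that is the decision.
\text{sender fault-free} \;\Rightarrow\; d_i = v \ \text{for every fault-free } i
```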


  5. Byzantine Generals Algorithm. [Diagram: source S, holding value v, sends v to peers 1, 2, and 3; peer 3 is faulty.]

  6. Byzantine Generals Algorithm. [Diagram: v from S arrives at all three peers.]

  7. Byzantine Generals Algorithm. [Diagram: faulty peer 3 relays arbitrary values (“?”) to peers 1 and 2.]

  8. Byzantine Generals Algorithm. [Diagram: good peers 1 and 2 relay v to each other.]

  9. Byzantine Generals Algorithm. [Diagram: peers 1 and 2 have each collected the vector [v, v, ?].]

  10. Byzantine Generals Algorithm. [Diagram: majority vote over [v, v, ?].] Majority vote results in correct result at good peers.
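A minimal sketch of the decision step at a good peer in this one-round exchange (names are ours, not from the slides; the full OM(f) algorithm of the Lamport, Shostak, Pease paper recurses for f > 1):

```python
from collections import Counter

def decide(received):
    """Majority vote over the values a good peer has collected:
    one straight from the source plus one relayed copy from every
    other peer. Returns the majority value, or None if there is
    no strict majority."""
    value, count = Counter(received).most_common(1)[0]
    return value if count > len(received) / 2 else None

# Fault-free source, faulty peer 3 (slides 5-10): peers 1 and 2
# each hold [v, v, ?] and the vote recovers v at both.
print(decide(["v", "v", "?"]))   # -> v
```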

  11. Byzantine Generals Algorithm. [Diagram: faulty source S sends inconsistent values v, x, and w to peers 1, 2, and 3.]

  12. Byzantine Generals Algorithm. [Diagram: the peer that received w relays it to the other two.]

  13. Byzantine Generals Algorithm. [Diagram: the peers likewise relay v and x to each other.]

  14. Byzantine Generals Algorithm. [Diagram: every peer has collected the same vector [v, w, x].]

  15. Byzantine Generals Algorithm. [Diagram: majority vote over [v, w, x].] Vote result identical at good peers.
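Running the same vote on the faulty-source case: every good peer has collected the identical vector [v, w, x], so every good peer computes the identical outcome. No majority may exist, but agreement among fault-free peers, the only requirement when the sender itself is faulty, still holds. Continuing the sketch above:

```python
# Faulty source (slides 11-15): all good peers hold the same
# vector, so they reach the same result even without a majority.
print(decide(["v", "w", "x"]))   # -> None, identically everywhere
```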

  16. Known Results • Need 3f + 1 nodes to tolerate f failures • Need Ω(n²) messages in general

  17. Ω(n²) Message Complexity • Each message carries at least 1 bit • Ω(n²) bits of “communication complexity” to agree on even a 1-bit value
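To make the bound concrete, a small helper (names are ours) relating the number of nodes n to the tolerated failures f and to the quorum size 2f + 1 that the PBFT slides below rely on:

```python
def bft_parameters(n):
    """Given n nodes, return the largest f with n >= 3f + 1 and the
    corresponding quorum size 2f + 1 used by PBFT."""
    f = (n - 1) // 3
    return {"n": n, "f": f, "quorum": 2 * f + 1}

print(bft_parameters(4))   # {'n': 4, 'f': 1, 'quorum': 3}
print(bft_parameters(7))   # {'n': 7, 'f': 2, 'quorum': 5}
```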

  18. Practical Byzantine Fault Tolerance • Computer systems provide crucial services • Computer systems fail – Crash-stop failure – Crash-recovery failure – Byzantine failure • Examples: natural disaster, malicious attack, hardware failure, software bug, etc. • Need highly available service ⇒ Replicate to increase availability

  19. Challenges. [Diagram: two clients concurrently send Request A and Request B to the service.]

  20. Requirements • All replicas must handle the same requests despite failures. • All replicas must handle requests in identical order despite failures.

  21. Challenges. [Diagram: the concurrent requests must be ordered consistently, e.g. 1: Request A, 2: Request B.]

  22. State Machine Replication. [Diagram: all four replicas hold the identical ordered log: 1: Request A, 2: Request B.] How to assign sequence numbers to requests?
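A minimal sketch of why identical order is what matters, assuming sequence numbers have already been assigned (hypothetical names; agreeing on those numbers is exactly the problem the next slides address):

```python
class Replica:
    """State-machine replication in miniature: a replica applies
    requests strictly in sequence-number order, buffering anything
    that arrives early. Replicas fed the same (seqno, request)
    pairs in any delivery order end in the same state."""

    def __init__(self):
        self.state = []        # stands in for application state
        self.next_seqno = 1
        self.pending = {}      # seqno -> request, not yet applicable

    def on_request(self, seqno, request):
        self.pending[seqno] = request
        # Apply every consecutive request now available.
        while self.next_seqno in self.pending:
            self.state.append(self.pending.pop(self.next_seqno))
            self.next_seqno += 1

r1, r2 = Replica(), Replica()
r1.on_request(1, "A"); r1.on_request(2, "B")
r2.on_request(2, "B"); r2.on_request(1, "A")   # delivered out of order
assert r1.state == r2.state == ["A", "B"]
```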

  23. Primary-Backup Mechanism. [Diagram: in view 0, the primary orders the requests (1: Request A, 2: Request B) and the backups follow.] What if the primary is faulty? • Agreeing on sequence numbers • Agreeing on changing the primary (view change)

  24. Normal Case Operation • Three-phase algorithm: – PRE-PREPARE picks the order of requests – PREPARE ensures order within views – COMMIT ensures order across views • Replicas remember messages in a log • Messages are authenticated – {.}σk denotes a message signed by k
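The slide's notation, modeled loosely in code (our modeling only; a real implementation would attach actual signatures or MACs rather than a bare signer id):

```python
import hashlib

def D(m: bytes) -> str:
    """Request digest D(m) from the slides; SHA-256 is our choice."""
    return hashlib.sha256(m).hexdigest()

def pre_prepare(v, n, m, signer=0):
    """{PRE-PREPARE, v, n, m} signed by the primary (replica 0):
    view v, sequence number n, client request m."""
    return {"type": "PRE-PREPARE", "v": v, "n": n, "m": m, "sigma": signer}

msg = pre_prepare(v=0, n=1, m=b"client request")
print(msg["type"], msg["n"], D(msg["m"])[:8])
```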

  25. Pre-Prepare Phase. Request: m. [Diagram: the primary (replica 0) sends {PRE-PREPARE, v, n, m}σ0 to replicas 1, 2, and 3; replica 3 has failed.]

  26. Prepare Phase. [Diagram: replicas 1 and 2 accept the PRE-PREPARE; replica 3 has failed.]

  27. Prepare Phase. [Diagram: replica 1 multicasts {PREPARE, v, n, D(m), 1}σ1 to the other replicas.]

  28. Prepare Phase. [Diagram: each replica collects the PRE-PREPARE plus 2f matching PREPARE messages.]

  29. Commit Phase. [Diagram: after PRE-PREPARE and PREPARE, replica 2 multicasts {COMMIT, v, n, D(m)}σ2.]

  30. Commit Phase (2). [Diagram: each replica collects 2f+1 matching COMMIT messages, then executes the request and replies to the client.]
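The bookkeeping a replica does for one (v, n, D(m)) slot across the three phases, as a sketch (structure and names are ours; the thresholds are the ones on the slides):

```python
class NormalCase:
    """Tracks one (v, n, D(m)) slot at a replica. 'Prepared' needs
    the PRE-PREPARE plus 2f matching PREPAREs from distinct
    replicas; 'committed-local' additionally needs 2f+1 matching
    COMMITs, after which the request is executed."""

    def __init__(self, f):
        self.f = f
        self.pre_prepared = False
        self.prepares = set()   # ids of replicas with matching PREPARE
        self.commits = set()    # ids of replicas with matching COMMIT

    def on_pre_prepare(self):
        self.pre_prepared = True

    def on_prepare(self, replica_id):
        self.prepares.add(replica_id)

    def on_commit(self, replica_id):
        self.commits.add(replica_id)

    def prepared(self):
        return self.pre_prepared and len(self.prepares) >= 2 * self.f

    def committed_local(self):
        return self.prepared() and len(self.commits) >= 2 * self.f + 1
```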

  31. View Change • Provides liveness when the primary fails – Timeouts trigger view changes – Select new primary (= view number mod 3f+1) • Brief protocol – Replicas send VIEW-CHANGE messages along with the requests they have prepared so far – The new primary collects 2f+1 VIEW-CHANGE messages – It reconstructs information about requests committed in previous views
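The rotation rule from the slide, with n = 3f + 1 replicas:

```python
def new_primary(view, n_replicas):
    """Primary of view v is replica v mod n, where n = 3f + 1."""
    return view % n_replicas

# With f = 1, n = 4: views 0..4 map to primaries 0, 1, 2, 3, 0.
assert [new_primary(v, 4) for v in range(5)] == [0, 1, 2, 3, 0]
```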

  32. View Change Safety • Goal: no two different committed requests with the same sequence number across views. [Diagram: the quorum behind a Committed Certificate (m, v, n) and the quorum behind a view change intersect, so at least one correct replica holding the Prepared Certificate (m, v, n) is in both.]
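The arithmetic behind the quorum picture: any two quorums of size 2f + 1 drawn from n = 3f + 1 replicas intersect in at least f + 1 replicas, and since at most f replicas are faulty, at least one correct replica lies in both quorums and carries the Prepared Certificate into the new view:

```latex
|Q_1 \cap Q_2| \;\ge\; (2f+1) + (2f+1) - (3f+1) \;=\; f+1 \;>\; f
```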

  33. Related Work
  • Fail-stop fault tolerance:
    – VS Replication (PODC 1988)
    – Paxos (TR, 1989)
  • Byzantine fault tolerance:
    – Byzantine agreement: Rampart (TPDS 1995), SecureRing (HICSS 1998), PBFT (OSDI '99), BASE (TOCS '03)
    – Byzantine quorums: Malkhi-Reiter (JDC 1998), Phalanx (SRDS 1998), Fleet (ToKDI '00)
    – Hybrid quorum: Q/U (SOSP '05), HQ Replication (OSDI '06)
