Byzantine Fault Tolerant Systems: Stefan Heinz, Advanced Topics in Distributed Computing


SLIDE 1

Stefan Heinz, WS 07/08

Byzantine Fault Tolerant Systems

Stefan Heinz

Advanced Topics in Distributed Computing

Ph. D. Petr Kuznetsov

WS 07/08

SLIDE 2

Farsite

Federated, Available, and Reliable Storage for an Incompletely Trusted Environment

Adya et al., 2002

SLIDE 3

Outline

  • Motivation / Introduction
  • System Overview
  • System Architecture
  • File System Features
  • Summary / Conclusion

SLIDE 5

Farsite Motivation / Introduction

[Figure labels: server-based file system, Farsite]

What are the disadvantages of this architecture?

Key techniques: BFT, replication, cryptography

SLIDE 6

Farsite Motivation / Introduction

Want to achieve the benefits of a central file server:

  • A shared namespace
  • Location-transparent access
  • Reliable data storage

and the benefits of local desktop file systems:

  • Low cost
  • Privacy from nosy sysadmins
  • Resistance to geographically localized faults

SLIDE 7

Farsite Motivation / Introduction

Key design objectives

  • Emulation of a local NTFS file system
  • Scalability
  • Provide the benefits of BFT
  • Minimal administrative effort

SLIDE 9

Farsite System Overview – Design Assumptions

  • High-bandwidth, low-latency network
  • A majority of machines are up for the majority of the time
  • Uncorrelated machine downtimes
  • Independent permanent machine failures
  • Each machine performs correctly for its immediate user

SLIDE 10

Farsite System Overview – Namespace Roots

  • Hierarchical directory namespace
  • Farsite supports multiple roots (like names of file servers)
  • Each root is managed by a set of machines, which form a BFT group

[Figure: namespace roots rootA and rootB with subdirectories SubDirA and SubDirB]

SLIDE 11

Farsite System Overview – Trust and Certification

[Figure: certification authority with namespace certificate, user certificate, and machine certificate]

Each user’s private key is encrypted with a symmetric key derived from the user’s password and then stored in a globally readable directory

Certificate revocation lists are used

SLIDE 13

Farsite System Architecture – Basic System

Every machine may perform three roles:

  • client
  • member of a directory group
  • file host

A directory group collectively manages file information using a BFT protocol

The BFT protocol guarantees data consistency as long as fewer than a third of the machines misbehave
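
As a rough illustration of the "fewer than a third" bound, the helper below computes how many misbehaving members a directory group of a given size can tolerate. The function names are hypothetical and are not part of Farsite; this is only a sketch of the arithmetic.

```python
def max_faulty(group_size: int) -> int:
    """Largest f with f < group_size / 3, i.e. group_size >= 3f + 1."""
    if group_size < 1:
        raise ValueError("a directory group needs at least one machine")
    return (group_size - 1) // 3


def is_consistent(group_size: int, misbehaving: int) -> bool:
    """True if fewer than a third of the machines misbehave."""
    return 3 * misbehaving < group_size


for n in (4, 7, 10):
    print(n, "machines tolerate", max_faulty(n), "misbehaving")
```

A group of four machines therefore tolerates one misbehaving member, while a group of three tolerates none.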

SLIDE 14

Farsite System Architecture – Basic System

[Figure: directory group (metadata, file data) and clients]

SLIDE 15

Farsite System Architecture – Enhancements

Scalability

[Figure: directory group (metadata, hashes) with BFT replication; file hosts & clients (file data) with raw replication]

How many machines may die until data is lost?

SLIDE 16

Farsite System Architecture – Enhancements

Performance

  • Local caching and file leases are used
  • Updates are not pushed immediately to the directory group, because most file writes are deleted or overwritten shortly after they occur

SLIDE 17

Farsite System Architecture – Enhancements

Security

  • Clients encrypt written file data with the public keys of all authorized readers
  • The directory group cryptographically validates requests from users before accepting updates

Reliability

  • When a machine is unavailable for an extended period of time, its functions migrate to one or more other machines
  • Data is lost permanently only if too many machines fail within too small a time window to permit regeneration

SLIDE 19

Farsite File System Features - Security

Convergent Encryption

  • Generate a one-way hash of each block
  • Encrypt the blocks using the hashes as keys
  • Use a randomly generated file key to encrypt the hashes, and encrypt this key with the public keys of authorized readers

Block encryption allows for:

  • writing individual blocks
  • reading individual blocks without the need to load the entire file

Benefits: ciphertexts are comparable, since identical plaintext blocks encrypt to identical ciphertexts, e.g. to identify duplicated files
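
The convergent encryption steps can be sketched as follows. This is a minimal illustration under stated assumptions, not Farsite's implementation: `keystream_xor` is a toy stand-in for a real symmetric cipher (it is not secure), SHA-256 is an assumed choice of one-way hash, all function names are hypothetical, and the final step of encrypting the file key with the readers' public keys is omitted.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustrative only:
    a stand-in for a real symmetric cipher, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def wrap_key(file_key: bytes, index: int) -> bytes:
    """Derive a per-block wrapping key from the random file key."""
    return hashlib.sha256(file_key + index.to_bytes(4, "big")).digest()


def convergent_encrypt(blocks: list, file_key: bytes):
    """Encrypt each block under its own hash; wrap the hashes with file_key."""
    block_keys = [hashlib.sha256(b).digest() for b in blocks]
    cipher_blocks = [keystream_xor(k, b) for k, b in zip(block_keys, blocks)]
    wrapped_keys = [keystream_xor(wrap_key(file_key, i), k)
                    for i, k in enumerate(block_keys)]
    return cipher_blocks, wrapped_keys


def decrypt_block(index: int, cipher_blocks: list, wrapped_keys: list,
                  file_key: bytes) -> bytes:
    """Read one block without loading the rest of the file."""
    block_key = keystream_xor(wrap_key(file_key, index), wrapped_keys[index])
    return keystream_xor(block_key, cipher_blocks[index])


# Two users store the same file under different random file keys.
blocks = [b"hello world", b"second block"]
c1, w1 = convergent_encrypt(blocks, b"A" * 32)
c2, w2 = convergent_encrypt(blocks, b"B" * 32)
print(c1 == c2)   # ciphertext blocks coincide, so duplicates are detectable
print(w1 == w2)   # wrapped block keys differ per file key
```

Because each block key depends only on the block's content, the ciphertexts of identical blocks match even though neither user can read the other's wrapped keys.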

SLIDE 20

Farsite File System Features - Scalability

Delegation of parts of the namespace

Hint-based pathname translation

  • Clients cache pathnames and their mappings to directory groups

Delayed directory-change notification

  • Clients register for a notification when a user lists a directory
  • The directory group packages the information, signs it, and sends it to the registered clients
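
Hint-based pathname translation can be pictured as a client-side cache lookup by longest matching path prefix. The class below is only a sketch: the name `PathHintCache`, the use of plain strings as directory group identifiers, and the longest-prefix rule are assumptions made for illustration; a real client would also have to cope with stale hints.

```python
class PathHintCache:
    """Client-side cache mapping pathname prefixes to directory groups."""

    def __init__(self, root_group: str) -> None:
        # The namespace root is always known, so every lookup resolves.
        self.hints = {"/": root_group}

    def remember(self, prefix: str, group: str) -> None:
        """Cache the directory group responsible for a pathname prefix."""
        self.hints[prefix.rstrip("/") or "/"] = group

    def lookup(self, path: str) -> str:
        """Return the group hinted for the longest cached prefix of path."""
        parts = [p for p in path.split("/") if p]
        for end in range(len(parts), -1, -1):
            prefix = "/" + "/".join(parts[:end])
            if prefix in self.hints:
                return self.hints[prefix]
        raise KeyError(path)


cache = PathHintCache("root-group")
cache.remember("/projects/farsite", "group-B")
print(cache.lookup("/projects/farsite/src/main.c"))
print(cache.lookup("/projects/other/readme.txt"))
```

The prefix match works on whole path components, so a hint for `/projects/farsite` does not apply to `/projects/farsite2`.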

SLIDE 22

Farsite Summary / Conclusion

Farsite is a scalable, decentralized network file system that uses insecure and unreliable machines as the basis for a virtual file server that is secure and reliable

To achieve this, it uses many known techniques: replication, BFT, cryptography, certificates, leases, caching

It also introduces new techniques:

  • convergent encryption
  • timed Byzantine operations

[Figure: directory group (metadata, hashes) with BFT replication; file hosts (file data) with raw replication]

SLIDE 23

Farsite Summary / Conclusion

Performance Measurements

"For our performance evaluation, we configured a five-machine Farsite system [...]. Four machines served as file hosts and as members of a directory group, and one machine served as a client."

Performance conclusion: Farsite performs significantly better than remote file access via CIFS

SLIDE 24

Zyzzyva

Speculative Byzantine Fault Tolerance

Kotla et al., 2007

SLIDE 25

Outline

  • Motivation / Introduction
  • Protocol
  • Agreement Protocol
  • View Changes
  • Correctness
  • Summary / Conclusion

SLIDE 27

Zyzzyva: Speculative BFT Motivation / Introduction

Goal of BFT protocols: Transform a high-performance service into a high-performance and reliable service

SLIDE 28

Zyzzyva: Speculative BFT Motivation / Introduction

Why another BFT protocol?

  • Many different BFT protocols exist, and they perform differently in different situations, e.g. under different workloads
  • Such complexity is a barrier to the adoption of BFT techniques: one has to choose the right technique for a workload, and the workload must then not deviate from expectations
  • Goal: outperform other BFT protocols

[Figure: BFT? Yes: Zyzzyva]

SLIDE 29

Zyzzyva: Speculative BFT Motivation / Introduction

  • One replica is selected as the primary
  • The primary proposes an order on client requests to the other replicas
  • Unlike in other protocols, the replicas speculatively execute requests without first running an expensive agreement protocol
  • Replica states may diverge, but clients help to detect and correct inconsistencies
  • The replies of the replicas carry sufficient history information for clients to determine whether the replies and history are stable and guaranteed to be eventually committed

SLIDE 30

Zyzzyva: Speculative BFT Motivation / Introduction

Traditional BFT state machine replication

[Figure: message diagram with client, primary, and replicas 1 to 3: request, agreement, execution, reply]

Cost: Agreement protocol overhead

SLIDE 31

Zyzzyva: Speculative BFT Motivation / Introduction

Zyzzyva: Speculative BFT Replication

[Figure: message diagram with client, primary, and replicas 1 to 3: request, speculative execution, reply]

Cost: No explicit replica agreement

SLIDE 33

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Clients should act only upon replies that correspond to stable requests executed in a total order that is guaranteed to eventually commit at all correct servers

Afterwards, the request has the same sequence number at all correct replicas and the same history of preceding requests as observed by the client

SLIDE 34

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client sends request to the primary

$m = \langle \text{REQUEST}, o, t, c \rangle_c$

[Figure: client and primary]

SLIDE 35

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Primary receives the request, assigns a sequence number, and forwards the ordered request to the replicas

$\langle OR, m \rangle$, where $OR = \langle \text{ORDER-REQ}, v, n, h_n, d, ND \rangle_p$, $d = H(m)$, and $h_n = H(h_{n-1}, d)$

[Figure: client and primary]
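
The digest d = H(m) and the chained history h_n = H(h_{n-1}, d) from the ordered request can be sketched as below. Several things here are assumptions for illustration only: SHA-256 as the hash H, an empty byte string as the initial history, the length-prefixed encoding of hash inputs, and the function name `order_request`. Signatures and the ND field are omitted.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    """Hash function H; SHA-256 is an assumption made for this sketch."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))  # length-prefix each part
        digest.update(part)
    return digest.digest()


def order_request(view: int, seq: int, prev_history: bytes, m: bytes) -> dict:
    """Build the ORDER-REQ fields for request m (signature and ND omitted)."""
    d = H(m)                    # d = H(m)
    h_n = H(prev_history, d)    # h_n = H(h_{n-1}, d)
    return {"v": view, "n": seq, "h_n": h_n, "d": d}


history = b""  # assumed initial history h_0
for n, request in enumerate([b"write x=1", b"write x=2"], start=1):
    ordered = order_request(view=0, seq=n, prev_history=history, m=request)
    history = ordered["h_n"]
    print(n, history.hex()[:16])
```

Because each h_n folds in h_{n-1}, two replicas hold the same h_n only if they were given the same requests in the same order.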

SLIDE 36

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Replica receives the ordered request, speculatively executes it, and responds to the client

$\langle \langle \text{SPEC-RESPONSE}, v, n, h_n, H(r), c, t \rangle_i, i, r, OR \rangle$

[Figure: client and primary]

SLIDE 37

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

What cases can occur?

[Figure: client and primary]

SLIDE 38

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

  • Client receives 3f+1 matching responses and completes the request

[Figure: client and primary]

SLIDE 39

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

  • Client receives between 2f+1 and 3f matching responses, assembles a commit certificate, and transmits it to the replicas

$\langle \text{COMMIT}, c, CC \rangle_c$

[Figure: client and primary]

SLIDE 40

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

  • Client sends the commit certificate
  • Replicas acknowledge with a LOCAL-COMMIT message
  • Client receives 2f+1 LOCAL-COMMIT messages and completes the request

[Figure: client and primary]
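
The client's case analysis over the responses it has gathered can be summarized in a small decision function. This is a sketch under assumptions, not the protocol's reference logic: the function name, the returned labels, and the representation of each reply as a hashable tuple are hypothetical. The last branch corresponds to the protocol's retransmission case, and detecting inconsistent ordering by the primary is not modeled.

```python
from collections import Counter


def next_client_step(spec_responses: list, local_commits: int, f: int) -> str:
    """Decide what the client does with the replies gathered so far.

    spec_responses holds one hashable summary per replica reply, e.g. a
    (view, seq, history_digest, reply_digest) tuple; equal tuples "match".
    """
    matching = max(Counter(spec_responses).values(), default=0)
    if matching == 3 * f + 1:
        return "complete"                  # all 3f+1 replicas agree
    if local_commits >= 2 * f + 1:
        return "complete"                  # commit certificate acknowledged
    if matching >= 2 * f + 1:
        return "send-commit-certificate"   # between 2f+1 and 3f matching
    return "retransmit-to-all-replicas"    # fewer than 2f+1 matching


reply = (0, 7, "h7", "r")
print(next_client_step([reply] * 4, 0, f=1))
print(next_client_step([reply] * 3, 0, f=1))
print(next_client_step([reply] * 3, 3, f=1))
print(next_client_step([reply, (0, 7, "other", "r")], 0, f=1))
```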

SLIDE 41

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

  • Client receives fewer than 2f+1 matching SPEC-RESPONSE or LOCAL-COMMIT messages and resends its request to all replicas, which forward the request to the primary

$\langle \text{CONFIRM-REQ}, v, m, i \rangle_i$

[Figure: client and primary]

SLIDE 42

Zyzzyva: Speculative BFT Protocol - Agreement Protocol

Client gathers speculative responses

  • Client receives responses indicating inconsistent ordering by the primary and sends a proof of misbehavior (POM) to the replicas
  • Replicas initiate a view change to oust the faulty primary and forward the POM message to all other replicas

$\langle \text{POM}, v, POM \rangle_c$

[Figure: client and primary]

SLIDE 43

Zyzzyva: Speculative BFT View Changes

[Figure: client, primary, and replicas; f faulty replicas, one of them the primary]

$\langle \text{I-HATE-THE-PRIMARY}, v \rangle_i$

If a replica receives f+1 votes of no confidence in the primary, then it commits to a view change (becomes silent) and multicasts to all replicas a proof that f+1 replicas have no confidence
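
The f+1 threshold can be sketched as a per-view tally of distinct voters. The class and method names are hypothetical. The threshold is f+1 because at most f replicas are faulty, so f+1 distinct votes include at least one from a correct replica.

```python
class ViewChangeVotes:
    """Tracks I-HATE-THE-PRIMARY votes seen by one replica for one view."""

    def __init__(self, f: int) -> None:
        self.f = f
        self.voters = set()

    def record(self, replica_id: int) -> bool:
        """Record a vote; return True once f+1 distinct replicas have voted."""
        self.voters.add(replica_id)
        return len(self.voters) >= self.f + 1


votes = ViewChangeVotes(f=1)
print(votes.record(2))   # one vote is not enough
print(votes.record(2))   # a repeated vote from the same replica is not counted twice
print(votes.record(3))   # second distinct voter reaches f+1
```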

SLIDE 44

Zyzzyva: Speculative BFT View Changes

The view change completes when the new primary, using 2f+1 VIEW-CHANGE messages from distinct replicas, computes the history of requests that all correct replicas must adopt to enter the new view

The new primary sends a NEW-VIEW message to all replicas, including this history and a proof of validity

SLIDE 46

Zyzzyva: Speculative BFT Correctness

Zyzzyva ensures the following conditions

Safety: If a request with sequence number $n$ and history $h_n$ completes, then any request that completes with a higher sequence number $n' \geq n$ has a history $h_{n'}$ that includes $h_n$ as a prefix

Liveness: Any request issued by a correct client eventually completes
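
The safety condition is a prefix property over the histories of completed requests, which can be checked mechanically. In this sketch histories are explicit lists of requests rather than the hash digests the protocol exchanges, and the function names are hypothetical.

```python
def is_prefix(shorter: list, longer: list) -> bool:
    """True if the list shorter is a prefix of the list longer."""
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter


def safe(completed: dict) -> bool:
    """completed maps a sequence number n to the history of the request
    that completed with n (the list of requests up to and including n)."""
    ordered = sorted(completed)
    # Checking neighbours suffices because the prefix relation is transitive.
    return all(is_prefix(completed[a], completed[b])
               for a, b in zip(ordered, ordered[1:]))


print(safe({1: ["a"], 2: ["a", "b"], 3: ["a", "b", "c"]}))
print(safe({1: ["a"], 2: ["x", "b"]}))
```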

SLIDE 47

Zyzzyva: Speculative BFT Correctness - Safety

Within a view

A request completes when a client receives 3f+1 matching SPEC-RESPONSE messages in phase 1 or 2f+1 matching LOCAL-COMMIT messages in phase 2

If a request completes with sequence number n, then no other request can complete with the same sequence number, because correct replicas

  • send only one SPEC-RESPONSE message for a given sequence number
  • send only one LOCAL-COMMIT after seeing 2f+1 matching SPEC-RESPONSE messages

SLIDE 48

Zyzzyva: Speculative BFT Correctness - Safety

Within a view

For any two requests r and r' that complete with sequence numbers n and n', there are at least 2f+1 replicas that ordered each request

Because there are only 3f+1 replicas in total, at least one correct replica ordered both

Therefore, if $n < n'$, the history $h_n$ is a prefix of the history $h_{n'}$
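
The counting step behind "at least one correct replica ordered both" can be checked by brute force for small f. Two sets of 2f+1 replicas drawn from 3f+1 share at least (2f+1) + (2f+1) - (3f+1) = f+1 members, and since at most f replicas are faulty, one of the shared members is correct. The function name is hypothetical.

```python
from itertools import combinations


def min_quorum_overlap(f: int) -> int:
    """Smallest overlap between any two sets of 2f+1 out of 3f+1 replicas."""
    replicas = range(3 * f + 1)
    quorums = [set(q) for q in combinations(replicas, 2 * f + 1)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2):
    overlap = min_quorum_overlap(f)
    assert overlap == f + 1   # (2f+1) + (2f+1) - (3f+1) = f + 1
    print(f"f={f}: any two quorums share at least {overlap} replicas")
```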

SLIDE 49

Zyzzyva: Speculative BFT Correctness - Safety

Across views

  • If a request r completes with 2f+1 matching LOCAL-COMMITs, then at least f+1 correct replicas have received a commit certificate (CC) for r and will send it to the new primary in their VIEW-CHANGE messages, so r will be included in the "new" history
  • If a request r completes with 3f+1 SPEC-RESPONSEs, then every correct replica will include the ORDER-REQ for r in its VIEW-CHANGE message, so r will be supported by at least f+1 replicas in the set of 2f+1 VIEW-CHANGEs collected by the new primary

SLIDE 50

Zyzzyva: Speculative BFT Correctness - Liveness

Primary and client correct => request completes

Client receives 3f+1 matching responses => request completes

Client receives fewer than 3f+1 matching responses:

  • Client receives at least 2f+1, since at most f of the 3f+1 replicas are faulty
  • Client sends a COMMIT to all replicas; all correct replicas send a LOCAL-COMMIT to the client
  • Client gets at least 2f+1 LOCAL-COMMITs, therefore the request completes

SLIDE 51

Zyzzyva: Speculative BFT Correctness - Liveness

Request from correct client does not complete in current view => view change occurs

  • If the request doesn't complete, the client sends it to all replicas
  • Every correct replica contacts the primary (ORDER-REQ)
  • Every correct replica that doesn't get an answer initiates a view change by sending I-HATE-THE-PRIMARY

SLIDE 52

Zyzzyva: Speculative BFT Correctness - Liveness

Request from correct client does not complete in current view => view change occurs

  • If a correct replica receives f+1 I-HATE-THE-PRIMARYs, then the replica commits to a view change
  • If no correct replica receives f+1 I-HATE-THE-PRIMARYs, then all correct replicas that did not receive an answer from the primary get it from another replica

SLIDE 53

Zyzzyva: Speculative BFT Correctness - Liveness

Request from correct client does not complete in current view => view change occurs

No correct replica receives f+1 I-HATE-THE-PRIMARYs

  • Therefore the client receives at least 2f+1 SPEC-RESPONSEs
  • The client receives fewer than 2f+1 matching responses (otherwise the client could form a COMMIT and complete)
  • Then the client can form a POM and send it to the replicas to initiate a view change

SLIDE 55

Zyzzyva: Speculative BFT Summary / Conclusion

  • BFT protocol that uses speculation to reduce the cost
  • Replicas can become temporarily inconsistent with one another, but clients help to detect and correct these inconsistencies
  • Clients act only upon replies that correspond to stable requests executed in a total order that is guaranteed to eventually commit at all correct servers

SLIDE 56

Zyzzyva: Speculative BFT Summary / Conclusion

                                        Optimal    Zyzzyva
Replication cost: total replicas        3f+1       3f+1
Replication cost: app. replicas         2f+1       2f+1
Throughput overhead: crypto ops         2          2+3f/b
Latency: message delays                 3          3

SLIDE 57

Thank you for your attention!