


  1. 4/13/2008
     Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement, Mike Dahlin, Jean-Philippe Martin, Carl Porth
     Award Paper in the 20th ACM Symposium on Operating Systems Principles (SOSP 2005).
     Presented to: Dr. Ayman Abdel-Hamid
     By: Shaimaa Lazem

     Outline
     • Overview
     • Byzantine-Altruistic-Rational (BAR) model
     • System Architecture
     • Principles of Operations
     • Level 1: BART State Machine
     • Level 2: Partitioning Work
     • Level 3: The Application
     • BAR-B

  2. Overview
     • Cooperative service in Multiple Administrative Domains (MAD):
       • Nodes collaborate to provide some service that benefits each node, but no central authority controls the nodes’ actions (e.g., Internet routing, cooperative backup).
     • Problem
       • Nodes may depart from protocols. Causes: failures, broken nodes, security compromises, selfish nodes.
       • It is not sufficient to verify experimentally that a protocol tolerates a collection of attacks identified by the protocol’s creator.
       • It is necessary to design protocols that provably meet their goals, no matter what strategies nodes may concoct.

     Contributions
     • A formal model for reasoning about systems in the presence of nodes that deviate from the protocol (BAR model).
     • A general architecture and a set of design principles that, together, make it possible to build and reason about BAR-tolerant systems.
     • The implementation of BAR-B, a cooperative backup system built within the BAR model.

  3. Byzantine-Altruistic-Rational (BAR) model
     • Three classes of nodes:
       • Rational nodes participate in the system to gain some net benefit and can depart from a proposed program in order to increase their net benefit.
       • Byzantine nodes can depart arbitrarily from a proposed program, whether it benefits them or not.
       • Altruistic nodes execute a proposed program even if the rational choice is to deviate.

     Byzantine-Altruistic-Rational (BAR) model (cont.)
     • Two classes of protocols:
       • Incentive-Compatible Byzantine Fault Tolerant (IC-BFT): a protocol is IC-BFT if it guarantees the specified set of safety and liveness properties and if it is in the best interest of all rational nodes to follow the protocol exactly.
       • Byzantine Altruistic Rational Tolerant (BART): a protocol is BART if it guarantees the specified set of safety and liveness properties in the presence of all rational deviations from the protocol.
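The IC-BFT condition can be illustrated with a toy model of a rational node's choice. This is a minimal sketch, not part of the presented system: the strategy names and net-benefit numbers are hypothetical, and "best interest" is read here as "following the protocol is among the strategies with maximal net benefit".

```python
def best_responses(net_benefit):
    """Return the strategies with maximal net benefit (what a rational node may pick)."""
    best = max(net_benefit.values())
    return {strategy for strategy, benefit in net_benefit.items() if benefit == best}


def follows_protocol_rationally(net_benefit, protocol="follow"):
    """True if following the protocol is a best choice for a rational node."""
    return protocol in best_responses(net_benefit)


# Hypothetical payoffs: deviating never pays more than following.
incentive_compatible = {"follow": 5, "withhold_message": 3, "send_timeout": 5}

# Hypothetical payoffs: free riding pays more, so a rational node deviates.
not_compatible = {"follow": 5, "free_ride": 8}

print(follows_protocol_rationally(incentive_compatible))
print(follows_protocol_rationally(not_compatible))
```

In the second table a rational node would choose `free_ride`, which is the kind of deviation an IC-BFT protocol must make unprofitable.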

  4. Replicated State Machine (RSM)
     • A technique for supporting service replication.
     • The service is written as a deterministic state machine and replicated on several machines.
     • An RSM substrate coordinates the behavior of the separate state machines so that their executions proceed consistently, even if some of the computers fail.
     • A key task of the RSM substrate is to establish a task ordering.

     Replicated State Machine (RSM) (cont.)
     [Figure: A typical RSM-based client-server computer system [2].]
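The RSM idea above can be sketched in a few lines: if every replica is deterministic and applies the same operations in the same order, all replicas end in the same state. This is an illustrative sketch only. The `CounterMachine` service and the `replicate` helper are hypothetical, and the agreement protocol that actually produces the ordering is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CounterMachine:
    """A deterministic state machine: same operations in the same order give the same state."""
    value: int = 0
    log: list = field(default_factory=list)

    def apply(self, op, amount):
        if op == "add":
            self.value += amount
        elif op == "sub":
            self.value -= amount
        else:
            raise ValueError(f"unknown operation: {op}")
        self.log.append((op, amount))
        return self.value


def replicate(ordered_ops, replica_count=3):
    """Apply one agreed-upon operation order to every replica."""
    replicas = [CounterMachine() for _ in range(replica_count)]
    for op, amount in ordered_ops:
        for replica in replicas:
            replica.apply(op, amount)
    return replicas


replicas = replicate([("add", 5), ("sub", 2), ("add", 10)])
final_values = [replica.value for replica in replicas]
print(final_values)
```

The sketch assumes the ordering is already given; establishing that ordering despite faulty nodes is the substrate's job.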

  5. Replicated State Machine (RSM) (cont.)
     [Figure: RSM timing diagram [2].]

     System Model Assumptions
     • The BART protocols considered do not depend on the existence of altruistic nodes in the system.
     • A trusted authority controls which nodes may enter the system.
     • Each member has a unique identity corresponding to a cryptographic public key.
     • A “penance” mechanism gives nodes an incentive to stay as synchronized as possible.

  6. System Model Assumptions (cont.)
     • Rational nodes:
       • Receive a long-term benefit from participating in the protocol.
       • Are conservative when computing the impact of Byzantine nodes on their utility.
       • Colluding nodes are classified as Byzantine.
     • Byzantine nodes:
       • Exhibit arbitrary behavior: they may crash, lose data, alter data, or send incorrect protocol messages.
       • At most (n-2)/3 of the nodes in the system are Byzantine.
     • Every non-Byzantine node is rational.

     System Architecture
     • Level 1 provides key abstractions for reliable distributed services.
       • The RSM gives the abstraction of a correct (reliable and altruistic) node.
     • Level 2 builds a system in which work can be assigned to specific nodes instead of being executed by all replicas in the RSM.
     • Level 3 implements a desired service using the levels underneath.
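The Byzantine bound can be turned into a small worked calculation. The sketch below assumes the bound is read as the integer floor of (n-2)/3, which is equivalent to requiring n ≥ 3f + 2 nodes to tolerate f Byzantine nodes. The function names are illustrative.

```python
def max_byzantine(n):
    """Largest number of Byzantine nodes tolerated among n nodes, reading the bound as floor((n-2)/3)."""
    if n < 2:
        raise ValueError("the bound needs at least 2 nodes")
    return (n - 2) // 3


def min_nodes(f):
    """Smallest system size that tolerates f Byzantine nodes under the same bound."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 2


for n in (4, 5, 7, 8, 11):
    print(n, max_byzantine(n))
```

For example, a system of 5 nodes tolerates 1 Byzantine node, and tolerating 2 requires at least 8 nodes.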

  7. Principles of Operations
     • Accountability: if nodes are accountable for their behavior, then rational peers have an incentive to behave correctly.
     • Strong identities and restricted membership are parts of the solution.
     • How should a system detect and react to incorrect behavior?
     • An aggressively Byzantine node is easy to address:
       • A node signs a promise to store a file with a particular cryptographic hash and then responds to a request to read the file with a signed message that contains the wrong data.

     Principles of Operations (cont.)
     • A passive-aggressive node is harder to address:
       • A node may decline to send a message that it should send. The receiver is in a position to accuse the node of wrongdoing, but it becomes a case of “he said/she said”.
       • A node may exploit non-determinism to provide incomplete information that interferes with the protocol’s operation but is difficult to conclusively prove wrong.
       • A node is expected to transmit a signed copy of the request, but for liveness it is permitted to transmit a signed timeout message instead.
       • Self-interested nodes may choose to send the timeout message rather than transmit the request.
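The aggressively Byzantine case is easy to address because the misbehavior is checkable by anyone: the data returned does not match the hash the node promised to store. The sketch below shows only that hash comparison. SHA-256 is an assumption (the slides do not name a hash function), the function names are hypothetical, and the signatures that bind the promise and the response to the node are deliberately left out.

```python
import hashlib


def digest(data):
    """Cryptographic hash of the stored data (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


def is_provable_misbehavior(promised_hash, returned_data):
    """True if the returned data contradicts the hash the node promised to store."""
    return digest(returned_data) != promised_hash


original = b"backup chunk 42"
promise = digest(original)          # what the storing node committed to

print(is_provable_misbehavior(promise, original))
print(is_provable_misbehavior(promise, b"some other bytes"))
```

In a real deployment both messages would carry signatures, so that the pair (promise, wrong response) convinces a third party and not just the requester.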

  8. Principles of Operations - Addressing the challenges
     • Level 1 (primitives)
       • Nodes unilaterally deny service to nodes that fail to send expected messages. This low-level, local tit-for-tat technique provides incentives for cooperation without requiring a third party to judge which node is to blame.
       • The protocol balances costs so that when nodes have a choice between two messages, there is no incentive to choose the “wrong” one.
       • Nodes can unilaterally impose extra work (called penance) when they judge that another node’s response is not timely.

     Principles of Operations - Addressing the challenges (cont.)
     • Level 2 (work assignment)
       • If a node fails to reply to a request issued via the underlying state machine, then a quorum of nodes in the state machine generates a proof of misbehavior (POM) against the node.
     • Level 3 (application)
       • Applications make use of reliable work assignment: each request is bound to a reply or a timeout.
       • The application protocol must be designed so that requests and responses include sufficient information for any node to judge the validity of a request/response pair.
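The cost-balancing principle can be illustrated with a toy check. The sketch assumes that the cost of a message is simply its size in bytes and that padding is one possible way to balance costs. Both are assumptions made for illustration, and the message contents are placeholders.

```python
from typing import Sequence


def pad_to_cost(message: bytes, target_size: int) -> bytes:
    """Pad a message with zero bytes so that it costs as much as target_size bytes."""
    if len(message) > target_size:
        raise ValueError("message already exceeds the target size")
    return message + b"\x00" * (target_size - len(message))


def is_balanced(intended: bytes, alternatives: Sequence[bytes]) -> bool:
    """True if the intended message is never more expensive than any alternative."""
    return all(len(intended) <= len(alt) for alt in alternatives)


request = b"signed-request:" + b"x" * 64   # placeholder for the intended message
timeout = b"signed-timeout"                # placeholder for the cheaper alternative

unbalanced = is_balanced(request, [timeout])
balanced = is_balanced(request, [pad_to_cost(timeout, len(request))])
print(unbalanced, balanced)
```

Once the timeout message costs as much as the request, a self-interested node saves nothing by sending the timeout instead.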

  9. Level 1: BART State Machine
     • Terminating Reliable Broadcast (TRB)
       • Each TRB instance is organized in a series of turns.
       • The sender for instance i is the first leader for instance i.
       • If nodes receive the messages on time, they accept the value; otherwise, they send a “set-turn” message.
       • Nodes other than the sender are selected round-robin for the leader role.
       • Each participant thus has a periodic opportunity to propose values to the state machine (ensuring long-term benefit).
       • To limit non-determinism, an instance can terminate in only two ways: with the sender’s value or with a default value.
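The leader rotation described for TRB can be sketched as a pure function. The slides state only that the sender leads first and that the other nodes follow round-robin. The specific order used here (ascending node ids starting after the sender) and the rule that instance i is sent by node i mod n are assumptions for illustration.

```python
def sender_for_instance(instance, n):
    """Assumed rule: nodes take turns as sender, one instance each."""
    return instance % n


def leader_for_turn(sender, turn, n):
    """Turn 0 is led by the sender; later turns cycle round-robin over the other nodes."""
    if n < 2:
        raise ValueError("need at least two nodes")
    if turn == 0:
        return sender
    others = [(sender + k) % n for k in range(1, n)]   # every node except the sender
    return others[(turn - 1) % len(others)]


n = 4
sender = sender_for_instance(6, n)                     # instance 6 -> node 2
leaders = [leader_for_turn(sender, turn, n) for turn in range(5)]
print(sender, leaders)
```

With 4 nodes and sender 2, the leaders for turns 0 to 4 are 2, 3, 0, 1, 3: the sender leads once and is skipped in later turns.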

  10. Level 1: BART State Machine (cont.)
     • Message Queue
       • The message queue used by x contains entries for the messages that x intends to send to y, interleaved with “bubbles”.
       • A bubble must be filled with an appropriate message from y before x can proceed to send the messages behind it in the queue.
       • This gives rational nodes an incentive to send the messages expected by the protocol.
     • Balanced Messages
       • Whenever a node has the opportunity to choose which message to send next, the intended message is never more expensive than the alternatives.

     Level 1: BART State Machine (cont.)
     • Penance
       • Each node maintains an untimely vector that tracks its perception of other nodes’ timeliness.
       • A node is considered untimely if any timeout message electing a new leader arrives significantly earlier or later than expected according to the receiver’s local clock.
       • When a node x becomes the sender, it includes its untimely vector with the value it proposes.
       • After agreeing on the proposal, all nodes except the sender expect a penance message from each node indicted in the untimely vector.
       • Because of the message queues, the untimely nodes must send the penance message to all non-sender nodes.
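The message queue with bubbles can be sketched as a small data structure kept by x for a single peer y. This is a minimal illustration, not the presented implementation: the `PeerQueue` class, the `BUBBLE` marker, and the method names are hypothetical, and checking that y's message is the appropriate one is omitted.

```python
from collections import deque

BUBBLE = object()   # placeholder for a message that y still owes x


class PeerQueue:
    """Queue kept by x for peer y: outgoing messages interleaved with bubbles."""

    def __init__(self, entries):
        self._entries = deque(entries)
        self.sent = []              # messages x has actually sent to y

    def flush(self):
        """Send outgoing messages until the queue is empty or a bubble blocks progress."""
        while self._entries and self._entries[0] is not BUBBLE:
            self.sent.append(self._entries.popleft())

    def receive_expected(self):
        """y delivered the expected message: fill the bubble at the head and keep sending."""
        if self._entries and self._entries[0] is BUBBLE:
            self._entries.popleft()
            self.flush()


queue = PeerQueue(["m1", BUBBLE, "m2", "m3"])
queue.flush()
blocked = list(queue.sent)          # x stops at the bubble
queue.receive_expected()
unblocked = list(queue.sent)        # y cooperated, so x resumes
print(blocked, unblocked)
```

As long as y withholds the expected message, x sends nothing further to y, which is the local tit-for-tat incentive: a node that stays silent stops receiving service.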
