Computing in a Distributed System in the Presence of Benign Failures

Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)


  1. Computing in a Distributed System in the Presence of Benign Failures. Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)

  2. Distributed System [Figure: computational units connected by a medium of communication] No universal computational model for distributed systems

  3. Two Basic Principles • The model must specify why faults occur. Causes of two different natures: degree of synchronism; failure model

  4. Two Basic Principles • The model must specify why faults occur. Causes of two different natures: degree of synchronism; failure model

  5. Two Basic Principles • The model must specify why faults occur • The model must specify by whom (culprit) faults occur

  6. Two Basic Principles • The model must specify why faults occur • The model must specify by whom faults occur. The notion of faulty component is necessary and useful for the analysis of distributed computations

  7. First Principle: bounded delays (synchronous), finite delays (asynchronous), arbitrary delays (failure) ... breaks the natural continuum from bounded to infinite delays!

  8. A classical type of systems Synchronous system + crash failures

  9. A classical type of systems Synchronous system + crash failures • transmission delays bounded • process speeds bounded or infinite

  10. First Principle • breaks the natural continuum from bounded to infinite delays • synchronism degree and failure model are not independent

  11. Second Principle • may lead to undesirable conclusions. Example: with only one transmission fault from each node, each process is considered faulty in the send omission model (no algorithm when the entire system is faulty)

  12. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors. Example: “Every correct process eventually decides”, with one transmission failure for a message sent by p to q. Send omission model: p is allowed to make no decision. Link failure model: p and q must make a decision. Receive omission model: q is allowed to make no decision.

  13. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown

  14. Second Principle • may lead to undesirable conclusions • faulty processes are allowed to have deviant behaviors • real causes of transmission failures are often unknown • no evidence that the notion of faulty component is helpful

  15. The Heard-Of Model. We just specify transmission faults: we no longer consider by whom or why faults occur

  16. HO: a Round-Based Model [Figure: round r at process p, consisting of a sending phase (to all), a receive phase, and a local computation] At each round, every process sends messages to all. This allows us to distinguish semantic and operational features of computations

  17. HO: a Round-Based Model [Figure: round r at process p, consisting of a sending phase (to all), a receive phase, and a local computation] If m is received at round r, then m has been sent at round r. Rounds are communication-closed layers

  18. First Principle: bounded delays (synchronous), arbitrary delays (failure). Late messages are discarded [Dwork, Lynch & Stockmeyer, 1988] and [Gafni, 1998]
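
The discard rule can be sketched as follows. This is only an illustration under stated assumptions, not code from the presentation: the `Message` record and the `deliver_round` function are hypothetical names, and each message is assumed to carry the round in which it was sent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    sender: str
    round: int        # round in which the message was sent (assumed tag)
    payload: object

def deliver_round(inbox, current_round):
    """Return only the messages that were sent in the current round.

    A message tagged with an earlier round is late and is dropped, so an
    excessive delay looks exactly like a transmission fault. This sketch
    also ignores messages tagged with a later round.
    """
    return [m for m in inbox if m.round == current_round]

inbox = [Message("p", 3, "x"), Message("q", 2, "late"), Message("r", 3, "y")]
delivered = deliver_round(inbox, current_round=3)
# The senders of the delivered messages form the heard-of set of this round.
heard_of = {m.sender for m in delivered}
```

Here the message from `q` was sent in round 2, so the receiver does not hear of `q` in round 3.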

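  19. HO Process
     • States: $States_p$, with initial states $Init_p \subseteq States_p$
     • Sending function: $S_p : (s, q) \mapsto m$
     • Transition function: $T_p : (s, \vec{\mu}) \mapsto s'$
     [Figure: process p moves from state $s$ to state $s'$ during round r]
     At round r, process p receives messages from $HO(p, r)$: $\mathrm{supp}(\vec{\mu}) = HO(p, r)$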
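
One round of this process model can be sketched as below. The function `run_round` and the toy minimum-keeping algorithm are hypothetical illustrations; the sketch assumes that $S_p$ and $T_p$ are given as ordinary Python functions and that the heard-of sets of the round are supplied from outside.

```python
def run_round(states, send, transition, heard_of):
    """Execute one round for all processes and return the new states.

    states:     dict process -> current state s
    send:       function (sender, sender_state, destination) -> message
    transition: function (process, state, mu) -> new state s'
    heard_of:   dict p -> HO(p, r), the set of processes p hears from
    """
    new_states = {}
    for p, s in states.items():
        # mu is the partial vector of received messages; its support is HO(p, r).
        mu = {q: send(q, states[q], p) for q in heard_of[p]}
        new_states[p] = transition(p, s, mu)
    return new_states

# Toy algorithm (an assumption for illustration): send your value,
# then keep the minimum of your value and the values received.
def send(q, s, p):
    return s

def transition(p, s, mu):
    return min([s, *mu.values()])

states = {"a": 5, "b": 2, "c": 9}
ho = {"a": {"a", "b"}, "b": {"b"}, "c": {"a", "c"}}
states = run_round(states, send, transition, ho)
```

All messages are computed from the states at the start of the round, so the order in which processes are visited does not matter.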

  20. Second Principle. Faults are specified but not the culprits [Santoro & Widmayer 1989]

  21. HO Algorithm
     • Distributed algorithm on $\Pi$: $A = (States_p, Init_p, S_p, T_p)_{p \in \Pi}$
     • Run of algorithm $A$: determined by $(s^0_p)_{p \in \Pi}$ with $s^0_p \in Init_p$, and $(HO(p, r))_{p \in \Pi,\, r > 0}$

  22. • Kernel of round r: $K(r) = \bigcap_{p \in \Pi} HO(p, r)$
     • coKernel of round r: $coK(r) = \Pi \setminus K(r)$
     • Global kernel (of a run): $K = \bigcap_{p \in \Pi,\, r > 0} HO(p, r) = \bigcap_{r > 0} K(r)$
     • Global coKernel (of a run): $coK = \Pi \setminus K$
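
The kernel definitions translate directly into set intersections. The sketch below uses hypothetical function names and works on a finite prefix of rounds only; since a run has infinitely many rounds, the result is an over-approximation of the global kernel.

```python
def kernel(ho_round, processes):
    """K(r): the processes that every process hears from in this round."""
    return set(processes).intersection(*(ho_round[p] for p in processes))

def global_kernel(ho_rounds, processes):
    """Intersection of K(r) over the given (finite) list of rounds."""
    k = set(processes)
    for ho_round in ho_rounds:
        k &= kernel(ho_round, processes)
    return k

processes = {"a", "b", "c"}
ho_rounds = [
    {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"a", "b"}},   # round 1
    {"a": {"a", "c"}, "b": {"a"}, "c": {"a", "b", "c"}},        # round 2
]
k1 = kernel(ho_rounds[0], processes)
k = global_kernel(ho_rounds, processes)
co_k = processes - k   # global coKernel over the same prefix
```

In this example only `a` is heard by everyone in both rounds, so `b` and `c` fall into the coKernel.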

  23. Communication Predicate: predicate over collections of heard-of sets
     • $P_{nosplit} :: \forall p, q,\ \forall r : HO(p, r) \cap HO(q, r) \neq \emptyset$
     • $P_{sp\_unif} :: \forall p, q,\ \forall r : HO(p, r) = HO(q, r)$
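
Both predicates can be checked mechanically on a recorded prefix of a run. This is a minimal sketch with hypothetical function names; it assumes each round is given as a dict mapping every process p to the set HO(p, r), and it says nothing about rounds beyond the prefix.

```python
def p_nosplit(ho_rounds):
    """Every two heard-of sets of the same round intersect."""
    return all(ho[p] & ho[q] for ho in ho_rounds for p in ho for q in ho)

def p_sp_unif(ho_rounds):
    """Space uniformity: in each round, all processes have the same heard-of set."""
    return all(ho[p] == ho[q] for ho in ho_rounds for p in ho for q in ho)

overlapping = [{"a": {"a", "b"}, "b": {"b", "c"}, "c": {"b"}}]
split = [{"a": {"a"}, "b": {"b"}}]
uniform = [{"a": {"a", "b"}, "b": {"a", "b"}}]

results = (
    p_nosplit(overlapping), p_sp_unif(overlapping),
    p_nosplit(split),
    p_nosplit(uniform), p_sp_unif(uniform),
)
```

Taking p = q in the no-split check forces every heard-of set to be non-empty, which matches the quantification over all pairs in the definition.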

  24. Communication Predicate: predicate over collections of heard-of sets. Endogenous definition of the system properties (≠ Failure Detector model)

  25. • $P^f_K :: |K| \geq n - f$
     • $P^f_{HO} :: \forall p,\ \forall r : |HO(p, r)| \geq n - f$
     • $P_{reg} :: \forall p, q,\ \forall r : HO(p, r + 1) \subseteq HO(q, r)$
     • $P_{unif} :: \exists \Pi_0,\ \forall p,\ \forall r : HO(p, r) = \Pi_0$
     • $P_{\Diamond unif} :: \exists \Pi_0,\ \exists r_0,\ \forall p,\ \forall r > r_0 : HO(p, r) = \Pi_0$
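
The first four predicates can be evaluated on a finite prefix of a run, as in the hedged sketch below (function names are hypothetical, and each round is assumed to be a dict from every process to its heard-of set). The eventual predicate $P_{\Diamond unif}$ is left out because it constrains the infinite suffix of a run and cannot be decided from a prefix.

```python
def p_k(ho_rounds, processes, f):
    """|K| >= n - f, with K computed over the given rounds only."""
    k = set(processes)
    for ho in ho_rounds:
        for p in processes:
            k &= ho[p]
    return len(k) >= len(processes) - f

def p_ho(ho_rounds, n, f):
    """Every heard-of set has at least n - f members."""
    return all(len(ho[p]) >= n - f for ho in ho_rounds for p in ho)

def p_reg(ho_rounds):
    """HO(p, r + 1) is a subset of HO(q, r) for all p, q and consecutive rounds."""
    return all(nxt[p] <= cur[q]
               for cur, nxt in zip(ho_rounds, ho_rounds[1:])
               for p in nxt for q in cur)

def p_unif(ho_rounds):
    """All heard-of sets, across processes and rounds, are the same set."""
    sets = [frozenset(ho[p]) for ho in ho_rounds for p in ho]
    return all(s == sets[0] for s in sets)

processes = {"a", "b", "c"}
# Hypothetical prefix: in round 1, c's message reaches a and c but not b;
# from round 2 on, nobody hears from c.
ho_rounds = [
    {"a": {"a", "b", "c"}, "b": {"a", "b"}, "c": {"a", "b", "c"}},
    {"a": {"a", "b"}, "b": {"a", "b"}, "c": {"a", "b"}},
]
checks = {
    "P_K f=1": p_k(ho_rounds, processes, f=1),
    "P_K f=0": p_k(ho_rounds, processes, f=0),
    "P_HO f=1": p_ho(ho_rounds, n=3, f=1),
    "P_reg": p_reg(ho_rounds),
    "P_unif": p_unif(ho_rounds),
}
```

On this prefix the kernel is {a, b}, so the kernel predicate holds for f = 1 but not for f = 0, and uniformity fails only because of round 1.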

  26. system type | communication predicate
     Synchronous, reliable links, at most f faulty senders | $P^f_K$
     Synchronous, reliable links, at most f crash failures | $P^f_K \wedge P_{reg}$
     Asynchronous, reliable links, at most f crash failures | $P^f_{HO}$
     Asynchronous, reliable links, at most f initial crash failures | $P^f_{HO} \wedge P_{\Diamond unif}$
     Idem with n > 2f | $P^f_K \wedge P_{unif}$
     Asynchronous, reliable links, and failure detector S | $P^1_K$
     ♦ synchronous, reliable links, at most f crash failures | $P^f_{HO} \wedge P_{\Diamond unif}$

  27. Our Results • Shorter and simpler proofs of important computability results • Communication predicates for which Consensus is solvable: what is necessary and sufficient to solve Consensus? • Interrelationships between communication predicates (or, how not to get lost in translation ...) • Agreement problems: new algorithms for new systems; realistic solutions to cope with transient and dynamic failures
