Computing in a Distributed System in the Presence of Benign Failures

Bernadette CHARRON-BOST, CNRS (joint work with André SCHIPER, EPFL)

Distributed System
[Figure: computational units connected by a medium of communication]
No universal computational model for distributed systems

Two Basic Principles
• The model must specify why faults occur
  Causes of two different natures:
  • Degree of synchronism
  • Failure model
• The model must specify by whom (culprit) faults occur
  The notion of faulty component is necessary and useful for the analysis of distributed computations

First Principle
bounded delays (synchronous), finite delays (asynchronous), arbitrary delays (failure)
• breaks the natural continuum from bounded to infinite delays!

A classical type of system
Synchronous system + crash failures
• transmission delays bounded
• process speeds bounded or infinite

First Principle
• breaks the natural continuum from bounded to infinite delays
• synchronism degree and failure model are not independent

Second Principle
• may lead to undesirable conclusions
  With only one transmission fault from each node, the send omission model considers every process faulty (no algorithm when the entire system is faulty)
• faulty processes are allowed to have deviant behaviors
  "Every correct process eventually decides"
  One transmission failure for a message sent by p to q:
  • Send omission model: p is allowed to make no decision
  • Link failure model: p and q must make a decision
  • Receive omission model: q is allowed to make no decision
• real causes of transmission failures are often unknown
• no evidence that the notion of faulty component is helpful

The Heard-Of Model
We specify only transmission faults: we no longer consider by whom or why faults occur

HO: a Round-Based Model
[Figure: round r of process p: sending phase (to all), receive phase, local computation]
• At each round, every process sends messages to all
  This allows us to distinguish semantic and operational features of computations
• If m is received at round r then m has been sent at round r
  Rounds are communication-closed layers

First Principle
bounded delays (synchronous), arbitrary delays (failure)
Late messages are discarded [Dwork, Lynch & Stockmeyer, 1988] and [Gafni, 1998]
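The rule that late messages are discarded can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` record, its `round_no` stamp, and the `receive_phase` and `heard_of` helpers are hypothetical names that do not appear in the slides, and buffering messages stamped with a future round is one possible design choice rather than something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical message record: sender id, round stamp, payload."""
    sender: str
    round_no: int
    payload: object


def receive_phase(inbox, r):
    """Split an inbox for round r.

    Messages stamped with round r are delivered, messages stamped with a
    later round are kept for later (an assumption of this sketch), and
    messages stamped with an earlier round are late and silently dropped.
    """
    on_time = [m for m in inbox if m.round_no == r]
    pending = [m for m in inbox if m.round_no > r]
    return on_time, pending


def heard_of(on_time):
    """Senders whose round-r message was delivered in round r."""
    return {m.sender for m in on_time}
```

With this filter a message sent in round r can only ever be received in round r, which is the communication-closed property stated for rounds; a slow message and a lost message become indistinguishable to the receiver.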

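HO Process
• $\mathit{States}_p$, $\mathit{Init}_p \subseteq \mathit{States}_p$
• $S_p : (s, q) \mapsto m$
• $T_p : (s, \vec{\mu}) \mapsto s'$
[Figure: in round r, process p moves from state s to state s']
At round r, process p receives messages from $HO(p, r)$: $\mathrm{supp}(\vec{\mu}) = HO(p, r)$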

Second Principle
Faults are specified but not the culprits [Santoro & Widmayer 1989]

HO Algorithm
• Distributed algorithm on $\Pi$: $A = (\mathit{States}_p, \mathit{Init}_p, S_p, T_p)_{p \in \Pi}$
• Run of algorithm A:
  • $(s_p^0)_{p \in \Pi}$ with $s_p^0 \in \mathit{Init}_p$
  • $(HO(p, r))_{p \in \Pi,\, r > 0}$
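A run is fixed by the initial states and the collection of heard-of sets, which a small simulator can make concrete. The sketch below is an assumption-laden illustration, not the authors' formalism: it reads S_p as a sending function and T_p as a transition function, represents the partial vector of received messages as a dictionary whose keys are exactly HO(p, r), and only plays a finite prefix of rounds. The names `run_rounds`, `send_value` and `keep_min` are hypothetical.

```python
def run_rounds(processes, init, send, transition, ho):
    """Play a finite prefix of a run.

    processes  : list of process ids (the set Pi)
    init       : dict p -> initial state s_p^0
    send       : function (p, s, q) -> message, playing the role of S_p
    transition : function (p, s, mu) -> new state, playing the role of T_p
    ho         : list of dicts, ho[r - 1][p] is the set HO(p, r)
    """
    states = dict(init)
    for ho_r in ho:
        # Sending phase: every process sends a message to all.
        sent = {p: {q: send(p, states[p], q) for q in processes}
                for p in processes}
        # Receive phase and local computation: p only sees the messages
        # from HO(p, r), so the support of mu is exactly HO(p, r).
        new_states = {}
        for p in processes:
            mu = {q: sent[q][p] for q in ho_r[p]}
            new_states[p] = transition(p, states[p], mu)
        states = new_states
    return states


def send_value(p, s, q):
    """Toy sending function: broadcast the current state."""
    return s


def keep_min(p, s, mu):
    """Toy transition function: keep the smallest value seen so far."""
    return min([s] + list(mu.values()))
```

For example, `run_rounds(["a", "b", "c"], {"a": 3, "b": 1, "c": 2}, send_value, keep_min, ho)` with one round in which everyone hears everyone leaves every process with the value 1, whereas a round in which each process hears only itself changes nothing.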

• Kernel of round r: $K(r) = \bigcap_{p \in \Pi} HO(p, r)$
• coKernel of round r: $coK(r) = \Pi \setminus K(r)$
• Global kernel (of a run): $K = \bigcap_{p \in \Pi,\, r > 0} HO(p, r) = \bigcap_{r > 0} K(r)$
• Global coKernel (of a run): $coK = \Pi \setminus K$
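The kernel definitions translate directly into set intersections. The following sketch assumes a heard-of collection stored as a list of dictionaries (one per round, mapping each process to its heard-of set); that representation and the function names are illustrative choices. Since a run has infinitely many rounds, `global_kernel` here only computes the kernel of a finite prefix.

```python
def kernel(ho_r, procs):
    """K(r): the processes heard by every process in round r."""
    result = set(procs)
    for p in procs:
        result &= ho_r[p]
    return result


def cokernel(k, procs):
    """Complement of a kernel with respect to the process set Pi."""
    return set(procs) - k


def global_kernel(ho, procs):
    """Intersection of K(r) over the rounds recorded in ho (a finite prefix)."""
    result = set(procs)
    for ho_r in ho:
        result &= kernel(ho_r, procs)
    return result
```

For instance, if b is the only process heard by everyone in each of two recorded rounds, both round kernels and the global kernel are {b}, and the global coKernel is {a, c}.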

Communication Predicate
Predicate over collections of heard-of sets
• $P_{\mathit{nosplit}} :: \forall p, q,\ \forall r : HO(p, r) \cap HO(q, r) \neq \emptyset$
• $P_{\mathit{sp\,unif}} :: \forall p, q,\ \forall r : HO(p, r) = HO(q, r)$
Endogenous definition of the system properties ($\neq$ Failure Detector model)
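Because a communication predicate only talks about heard-of sets, it can be checked mechanically on a recorded prefix of a run. The sketch below assumes the heard-of collection is a list of per-round dictionaries mapping each process to a set; the function names are hypothetical. A finite prefix can refute a predicate quantified over all rounds but can never fully establish it.

```python
def no_split(ho, procs):
    """P_nosplit on a finite prefix: any two heard-of sets of a round intersect.

    The quantifier ranges over all pairs, including p == q, so every
    heard-of set must in particular be non-empty.
    """
    return all(bool(ho_r[p] & ho_r[q])
               for ho_r in ho for p in procs for q in procs)


def space_uniform(ho, procs):
    """P_sp_unif on a finite prefix: in each round all heard-of sets are equal."""
    return all(ho_r[p] == ho_r[q]
               for ho_r in ho for p in procs for q in procs)
```

Note that space uniformity constrains each round separately: the common heard-of set may change from one round to the next.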

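• $P_K^f :: |K| \geq n - f$
• $P_{HO}^f :: \forall p,\ \forall r : |HO(p, r)| \geq n - f$
• $P_{\mathit{reg}} :: \forall p, q,\ \forall r : HO(p, r + 1) \subseteq HO(q, r)$
• $P_{\mathit{unif}} :: \exists \Pi_0,\ \forall p,\ \forall r : HO(p, r) = \Pi_0$
• $P_{\Diamond\mathit{unif}} :: \exists \Pi_0,\ \exists r_0,\ \forall p,\ \forall r > r_0 : HO(p, r) = \Pi_0$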
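These five predicates can also be evaluated on a recorded prefix of a run. The sketch below again assumes a list of per-round dictionaries mapping each process to its heard-of set, with n taken as the number of processes; all function names are hypothetical. The eventual predicate is only approximated: on a finite prefix the check asks whether some non-empty suffix of the recorded rounds is uniform, which says nothing about rounds that were not recorded.

```python
def p_k(ho, procs, f):
    """P_K^f: the global kernel (of the recorded prefix) has at least n - f members."""
    k = set(procs)
    for ho_r in ho:
        for p in procs:
            k &= ho_r[p]
    return len(k) >= len(procs) - f


def p_ho(ho, procs, f):
    """P_HO^f: every heard-of set has at least n - f members."""
    n = len(procs)
    return all(len(ho_r[p]) >= n - f for ho_r in ho for p in procs)


def p_reg(ho, procs):
    """P_reg: HO(p, r + 1) is a subset of HO(q, r) for all p, q and r."""
    return all(ho[i + 1][p] <= ho[i][q]
               for i in range(len(ho) - 1) for p in procs for q in procs)


def p_unif(ho, procs):
    """P_unif: a single set Pi_0 is the heard-of set of everyone in every round."""
    distinct = {frozenset(ho_r[p]) for ho_r in ho for p in procs}
    return len(distinct) <= 1


def p_eventually_unif(ho, procs):
    """Finite-prefix stand-in for P_<>unif: some non-empty suffix is uniform."""
    return any(p_unif(ho[r0:], procs) for r0 in range(len(ho)))
```

On a prefix where one process hears an extra sender in round 1 and all heard-of sets equal {a, b} afterwards, `p_unif` fails while `p_eventually_unif` and `p_reg` hold.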

system type                                                          communication predicate
Synchronous, reliable links, at most f faulty senders                $P_K^f$
Synchronous, reliable links, at most f crash failures                $P_K^f \wedge P_{\mathit{reg}}$
Asynchronous, reliable links, at most f crash failures               $P_{HO}^f$
Asynchronous, reliable links, at most f initial crash failures       $P_{HO}^f \wedge P_{\Diamond\mathit{unif}}$
Idem with n > 2f                                                     $P_K^f \wedge P_{\mathit{unif}}$
Asynchronous, reliable links, and failure detector S                 $P_K^1$
$\Diamond$ synchronous, reliable links, at most f crash failures     $P_{HO}^f \wedge P_{\Diamond\mathit{unif}}$

Our Results
• Shorter and simpler proofs of important computability results
• Communication predicates for which Consensus is solvable
  What is necessary and sufficient to solve Consensus?
• Interrelationships between communication predicates (or, how not to get lost in translation ...)
• Agreement problems: new algorithms for new systems
  Realistic solutions to cope with transient and dynamic failures
