 
Genuine atomic multicast in asynchronous distributed systems
Rachid Guerraoui, Andre Schiper
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
Workshop on Distributed Algorithms '97

Introduction
• This paper addresses the problem of atomic multicasting messages in asynchronous distributed systems.
• The aim is to better understand the characteristics of the atomic multicast problem, and in particular whether the possibility and impossibility results stated for atomic broadcast also apply to atomic multicast.
• Atomic broadcast can be obtained simply by atomic multicasting every message to all the processes in the system. A consequence of this transformation, together with the FLP result and the equivalence of atomic broadcast and consensus [Chandra], is that atomic multicast is impossible to solve in asynchronous systems if one process can crash.
• The lower bound on the knowledge about failure detection needed to solve atomic broadcast applies directly to atomic multicast.

Problem
• Could we solve atomic multicast in asynchronous systems augmented with failure detectors, even if such failure detectors are unreliable?

Solution ?
• Yes, as a simple atomic multicast algorithm can be obtained from any atomic broadcast algorithm.
• No, the above algorithm is a "feigned" multicast: a multicast to a small subset turns out to be as costly as a broadcast, and the benefit of a multicast is in this case lost.

Why not the previous solution ?
• "Minimality" reflects the scalability of a multicast, and we require any genuine multicast to satisfy this property.
• The "Minimality" property states that only the sender and the addressees of a message should be involved in the protocol needed to deliver the message.
• In a system with at least two processes, among which one can crash, there exists no genuine atomic multicast algorithm using a failure detector that can be wrong about at least two processes.
• A corollary of this result is that genuine atomic multicast is strictly harder than atomic broadcast.
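
The Minimality requirement can be illustrated with a small check. This is only a sketch under stated assumptions: the process names, the representation of a run as a set of "active" processes (those that sent or received a non-null message), and both function names are hypothetical and do not come from the paper.

```python
def allowed_participants(multicasts):
    """Union of {Orig(m)} and Dst(m) over all multicast messages.

    multicasts: list of (orig, dst) pairs, where dst is a set of processes.
    """
    allowed = set()
    for orig, dst in multicasts:
        allowed |= {orig} | set(dst)
    return allowed


def satisfies_minimality(active, multicasts):
    """True if every active process is a sender or an addressee of some message."""
    return set(active) <= allowed_participants(multicasts)


# Hypothetical system of ten processes; p1 multicasts one message to {p2, p3}.
processes = {f"p{i}" for i in range(1, 11)}
multicasts = [("p1", {"p2", "p3"})]

# Multicast built on top of atomic broadcast: every process takes part.
feigned_active = processes
# Genuine multicast: only the sender and the addressees take part.
genuine_active = {"p1", "p2", "p3"}

print(satisfies_minimality(feigned_active, multicasts))  # False
print(satisfies_minimality(genuine_active, multicasts))  # True
```

The broadcast-based reduction involves all ten processes for a message addressed to two of them, which is the sense in which the slide calls it "feigned".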

Key techniques and insights
• An algorithm A is an atomic multicast algorithm if, in every run R of A, the following properties are satisfied:
  – Agreement: If a correct process TO-delivers a message m, then every correct process in Dst(m) eventually TO-delivers m.
  – Validity: If a correct process TO-multicasts a message m, then every correct process in Dst(m) eventually TO-delivers m.
  – Integrity: For any message m, every correct process p TO-delivers m at most once, and only if p ∈ Dst(m) and m was TO-multicast by some process Orig(m).
  – Pairwise total order: If two correct processes p and q TO-deliver messages m and m', then p TO-delivers m before m' if and only if q TO-delivers m before m'.
  – Minimality: If a correct process p sends or receives a (non-null) message in run R, then some message m is TO-multicast in R, and p ∈ {Orig(m)} ∪ Dst(m).

Intuition of Proof
• The proof is by contradiction.
• Assume that there is some genuine atomic multicast algorithm A using a 2-unreliable failure detector.
• A message m is TO-multicast to a destination set Dst(m).
• A message m' is TO-multicast to a destination set Dst(m'), and Dst(m) ∩ Dst(m') = {q1, q2}.
• Consider a partial run R of A in which no process crashes, and in which (1) the processes of Dst(m) think that q2 has crashed and then TO-deliver m, whereas (2) the processes of Dst(m') think that q1 has crashed and then TO-deliver m'.
• As a consequence, process q1 TO-delivers m but not m', whereas q2 TO-delivers m' but not m, violating the properties of atomic multicast.
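
The contradiction in the proof sketch can be made concrete with a small checker for pairwise total order. This is an illustrative sketch, not part of the paper: the function name and the representation of a run as per-process delivery logs are assumptions. It also assumes one particular way of completing the partial run, namely that q1 and q2 each go on to deliver the other message, as Agreement requires of correct processes.

```python
from itertools import combinations


def pairwise_total_order_holds(logs):
    """Check pairwise total order on delivery logs.

    logs: dict mapping a process name to the list of messages it TO-delivered,
    in delivery order. Only messages delivered by both processes are compared.
    """
    for p, q in combinations(logs, 2):
        pos_p = {m: i for i, m in enumerate(logs[p])}
        pos_q = {m: i for i, m in enumerate(logs[q])}
        common = [m for m in logs[p] if m in pos_q]
        for a, b in combinations(common, 2):
            if (pos_p[a] < pos_p[b]) != (pos_q[a] < pos_q[b]):
                return False
    return True


# Partial run from the proof: q1 has delivered m only, q2 has delivered m' only.
# Assumed completion: each later delivers the message it is still missing.
logs = {"q1": ["m", "m'"], "q2": ["m'", "m"]}
print(pairwise_total_order_holds(logs))  # False
```

Once both processes have delivered both messages, they disagree on the relative order of m and m', so no completion of this kind can satisfy the specification.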

Failure scenario [figure]

Solution to the problem
• Restrict this model by considering TO-multicast to sets of non-intersecting process groups.

Solution 1: one consensus per group g to compute the group timestamp ts_g(m)
• Consider m TO-multicast to Dst(m), where Dst(m) is a set of non-intersecting groups:
  – Every group g in Dst(m) first computes a group timestamp ts_g(m);
  – The sequence number sn(m) is then set to the maximum of all the group timestamps ts_g(m).
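
The final step of Solution 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the per-group consensus that produces each ts_g(m) is not modelled and its results are simply given as input, and the group names, timestamp values, and function names are hypothetical.

```python
from itertools import combinations


def groups_are_disjoint(groups):
    """True if no process belongs to two different groups.

    groups: dict mapping a group name to its set of member processes.
    """
    return all(a.isdisjoint(b) for a, b in combinations(groups.values(), 2))


def sequence_number(group_timestamps):
    """sn(m) is the maximum of the group timestamps ts_g(m).

    group_timestamps: dict mapping each group g in Dst(m) to the timestamp
    ts_g(m) that the group agreed on (assumed to come from a per-group consensus).
    """
    if not group_timestamps:
        raise ValueError("Dst(m) must contain at least one group")
    return max(group_timestamps.values())


# Hypothetical destination set of three non-intersecting groups.
groups = {"g1": {"p1", "p2"}, "g2": {"p3", "p4"}, "g3": {"p5"}}
group_timestamps = {"g1": 4, "g2": 7, "g3": 5}

assert groups_are_disjoint(groups)
sn = sequence_number(group_timestamps)
print(sn)  # 7
```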

Solution 2: one consensus in Dst(m) to compute the sequence number sn(m)
• Consider m TO-multicast to Dst(m), where Dst(m) is a set of non-intersecting groups:
  – Each member p of a group g in Dst(m), when receiving m, attaches a timestamp ts_p(m) to m.
  – Once a process p has its timestamp ts_p(m), p sends ts_p(m) to all the processes in Dst(m). Process p then waits to get the timestamp ts_x(m) from a majority of processes of every group in Dst(m). These timestamps are used by p to define its initial value prop-sn_p(m) for a consensus protocol to decide on the sequence number sn(m): prop-sn_p(m) is set to the maximum of all timestamps ts_x(m) received by p.
  – The sequence number sn(m) is the decision of the consensus protocol among the processes in Dst(m).

What should everyone remember about this paper?
• In contrast to atomic broadcast, genuine atomic multicast is impossible to solve with failure detectors that are unreliable.
• The paper discusses a way to circumvent the impossibility result by restricting the destinations of multicasts to sets of disjoint process groups, each group behaving like a logically correct entity.

Appendix (Dr. Skeen)
• When a process p TO-multicasts a message m to Dst(m), p sends the message to every member of Dst(m). Every process q ∈ Dst(m) that receives m stores m in a pending buffer and sends back to p a timestamp ts_q(receive(m)) corresponding to q's current logical clock.
• Process p then collects the timestamps from all the processes in Dst(m), defines a sequence number sn(m) as the maximum of the timestamps, and sends sn(m) to every member of Dst(m).
• Every process q ∈ Dst(m) that receives sn(m) removes m from its pending buffer and stores it in a delivery buffer.
• Process q TO-delivers m when (1) there is no message m' ≠ m in its pending buffer for which ts_q(receive(m')) < sn(m), and (2) there is no message m'' ≠ m in its delivery buffer for which sn(m'') < sn(m).
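
The receiver side of the steps above can be sketched as a single-threaded simulation. This is an illustration under stated assumptions, not a definitive implementation: the class and method names are hypothetical, there is no real network and no failures, and advancing the logical clock to at least sn(m) is the usual Lamport-clock rule rather than something stated on the slide. Ties between equal sequence numbers are not handled; a real implementation would need a tie-breaker such as a message identifier.

```python
class Receiver:
    """One destination process q, following the delivery rules listed above."""

    def __init__(self):
        self.clock = 0        # q's logical clock
        self.pending = {}     # m -> ts_q(receive(m)), sequence number not yet known
        self.ready = {}       # m -> sn(m), the "delivery buffer"
        self.delivered = []   # messages in TO-delivery order

    def on_message(self, m):
        """Receive m, store it as pending, and return the timestamp sent back."""
        self.clock += 1
        self.pending[m] = self.clock
        return self.clock

    def on_sequence_number(self, m, sn):
        """Receive sn(m): move m from the pending buffer to the delivery buffer."""
        del self.pending[m]
        self.ready[m] = sn
        self.clock = max(self.clock, sn)  # assumption: Lamport-style clock update
        self._try_deliver()

    def _try_deliver(self):
        """Deliver every message that satisfies conditions (1) and (2)."""
        progress = True
        while progress:
            progress = False
            for m, sn in sorted(self.ready.items(), key=lambda kv: kv[1]):
                blocked_by_pending = any(ts < sn for ts in self.pending.values())
                blocked_by_ready = any(
                    s < sn for other, s in self.ready.items() if other != m
                )
                if not blocked_by_pending and not blocked_by_ready:
                    del self.ready[m]
                    self.delivered.append(m)
                    progress = True
                    break


# Two receivers, both in Dst(m1) and Dst(m2).
q1, q2 = Receiver(), Receiver()

# Each sender collects the timestamps and takes the maximum.
sn_m1 = max(q1.on_message("m1"), q2.on_message("m1"))  # 1
sn_m2 = max(q1.on_message("m2"), q2.on_message("m2"))  # 2

# Sequence numbers reach q1 and q2 in opposite orders.
q1.on_sequence_number("m2", sn_m2)  # m2 is held back: m1 is pending with ts 1 < 2
q1.on_sequence_number("m1", sn_m1)
q2.on_sequence_number("m1", sn_m1)
q2.on_sequence_number("m2", sn_m2)

print(q1.delivered, q2.delivered)  # ['m1', 'm2'] ['m1', 'm2']
```

Although q1 learns sn(m2) before sn(m1), condition (1) holds m2 back until m1 has left the pending buffer, so both receivers deliver in the same order.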