MC714: Distributed Systems
Prof. Lucas Wanner
Instituto de Computação, Unicamp
Lectures 18-20: Fault tolerance
Introduction
Basic concepts
Process resilience
Reliable client-server communication
Reliable group
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 53
[Figure: (a) flat group; (b) hierarchical group with a coordinator and workers]
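[Figure: agreement among four processes, one of which is faulty. (a) Each process sends its value (1, 2, 3, 4) to the others; the faulty process sends x, y, z instead. (b) Vectors assembled by processes 1 to 4: Got(1, 2, x, 4), Got(1, 2, y, 4), Got(1, 2, 3, 4), Got(1, 2, z, 4). (c) Vectors each non-faulty process receives from the others: process 1 got (1, 2, y, 4), (a, b, c, d), (1, 2, z, 4); process 2 got (1, 2, x, 4), (e, f, g, h), (1, 2, z, 4); process 4 got (1, 2, x, 4), (1, 2, y, 4), (i, j, k, l).]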
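The last step of the figure above can be sketched in Python. This is a minimal illustration, assuming the flattened labels group into three received vectors per non-faulty process (that grouping is a reconstruction) and that each process decides element by element with a strict majority. The names `majority_vector`, `received_by`, and `UNKNOWN` are hypothetical.

```python
from collections import Counter

UNKNOWN = None  # marker for "no majority" (hypothetical name)

def majority_vector(received):
    """Element-wise strict majority over the vectors a process received."""
    result = []
    for column in zip(*received):
        value, count = Counter(column).most_common(1)[0]
        # Accept a value only if more than half of the vectors agree on it.
        result.append(value if count > len(received) / 2 else UNKNOWN)
    return result

# Vectors received in step (c), as read from the figure labels (assumed grouping).
received_by = {
    1: [[1, 2, "y", 4], ["a", "b", "c", "d"], [1, 2, "z", 4]],
    2: [[1, 2, "x", 4], ["e", "f", "g", "h"], [1, 2, "z", 4]],
    4: [[1, 2, "x", 4], [1, 2, "y", 4], ["i", "j", "k", "l"]],
}

decisions = {p: majority_vector(vs) for p, vs in received_by.items()}
for process, decision in decisions.items():
    print(process, decision)
```

Under these assumptions the three non-faulty processes reach the same vector, with the faulty process's entry left undecided.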
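[Figure: the same exchange with three processes, one of which is faulty. (a) Each process sends its value (1, 2, 3) to the others; the faulty process sends x, y instead. (b) Vectors assembled by processes 1 to 3: Got(1, 2, x), Got(1, 2, y), Got(1, 2, 3). (c) Vectors each non-faulty process receives from the others: process 1 got (1, 2, y), (a, b, c); process 2 got (1, 2, x), (d, e, f).]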
[Figure: a server handling a request (REQ). (a) Receive, execute, reply (REP). (b) Receive, execute, crash: no REP. (c) Receive, crash: no REP.]
[Figure: processes P1 to P4 over time, with reliable multicast implemented as multiple point-to-point messages. P1 joins the group, giving G = {P1, P2, P3, P4}. P3 crashes and its partial multicast is discarded, giving G = {P1, P2, P4}. P3 rejoins, giving G = {P1, P2, P3, P4}.]
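[Figure: finite state machines, with transitions labeled "message received / message sent". (a) Coordinator: INIT to WAIT on Commit / Vote-request; WAIT to ABORT on Vote-abort / Global-abort; WAIT to COMMIT on Vote-commit / Global-commit. (b) Participant: INIT to READY on Vote-request / Vote-commit; INIT to ABORT on Vote-request / Vote-abort; READY to ABORT on Global-abort / ACK; READY to COMMIT on Global-commit / ACK.]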
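The states and messages named in the figure above can be exercised with a short simulation. This is a rough sketch of one failure-free round only, assuming a single coordinator and in-order, lossless delivery; timeouts, crashes, and the ACK replies are left out. The function name `two_phase_commit` and its `votes` parameter are hypothetical.

```python
def two_phase_commit(votes):
    """Run one failure-free round.

    votes: list of booleans, one per participant (True = willing to commit).
    Returns the final coordinator state and the list of participant states.
    """
    participants = ["INIT"] * len(votes)

    # Phase 1: the coordinator sends Vote-request and moves from INIT to WAIT.
    coordinator = "WAIT"
    replies = []
    for i, can_commit in enumerate(votes):
        if can_commit:
            participants[i] = "READY"   # replies Vote-commit
            replies.append("Vote-commit")
        else:
            participants[i] = "ABORT"   # replies Vote-abort
            replies.append("Vote-abort")

    # Phase 2: commit only if every participant voted to commit.
    if all(reply == "Vote-commit" for reply in replies):
        coordinator, decision = "COMMIT", "Global-commit"
    else:
        coordinator, decision = "ABORT", "Global-abort"

    # Participants still in READY follow the global decision.
    for i, state in enumerate(participants):
        if state == "READY":
            participants[i] = "COMMIT" if decision == "Global-commit" else "ABORT"
    return coordinator, participants

print(two_phase_commit([True, True, True]))
print(two_phase_commit([True, False, True]))
```

A single Vote-abort is enough to drive every process to ABORT, which is the all-or-nothing property the two state machines are meant to give.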
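[Figure: finite state machines with an extra PRECOMMIT state, with transitions labeled "message received / message sent". (a) Coordinator: INIT to WAIT on Commit / Vote-request; WAIT to ABORT on Vote-abort / Global-abort; WAIT to PRECOMMIT on Vote-commit / Prepare-commit; PRECOMMIT to COMMIT on Ready-commit / Global-commit. (b) Participant: INIT to READY on Vote-request / Vote-commit; INIT to ABORT on Vote-request / Vote-abort; READY to ABORT on Global-abort / ACK; READY to PRECOMMIT on Prepare-commit / Ready-commit; PRECOMMIT to COMMIT on Global-commit / ACK.]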
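[Figure: checkpoints of P1 and P2 over time, from the initial state to a failure, with a message sent from P2 to P1. One set of checkpoints forms the recovery line; another set is an inconsistent collection.]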
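The difference between a recovery line and an inconsistent collection of checkpoints can be checked mechanically. The sketch below assumes the usual rule that a collection is inconsistent when it records the receipt of a message but not its sending. The function name `is_consistent`, the tuple layout, and all the times are hypothetical and are not taken from the slide.

```python
def is_consistent(cut, messages):
    """Check a set of checkpoints against the messages exchanged.

    cut: dict mapping a process name to the local time of its chosen checkpoint.
    messages: list of (sender, send_time, receiver, receive_time) tuples,
              with times measured on each process's own local clock.
    """
    for sender, sent_at, receiver, received_at in messages:
        received_in_cut = received_at <= cut[receiver]
        sent_in_cut = sent_at <= cut[sender]
        # A receipt recorded without the matching send makes the cut inconsistent.
        if received_in_cut and not sent_in_cut:
            return False
    return True

# One message from P2 to P1, as in the figure; the times are made up.
messages = [("P2", 5, "P1", 6)]

inconsistent_collection = {"P1": 7, "P2": 4}  # P1 saw the message, P2 has not sent it yet
recovery_line = {"P1": 7, "P2": 8}            # both the send and the receipt are recorded

print(is_consistent(inconsistent_collection, messages))
print(is_consistent(recovery_line, messages))
```

A cut that records the send but not the receipt is still consistent under this rule; the message is simply in transit.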
[Figure: checkpoints of P1 and P2 over time, from the initial state to a failure, with messages labeled m.]
[Figure: three panels (a), (b), (c), each showing a pair of disks holding sectors a to h. Labels: "Sector has different value" and "Bad checksum".]
[Figure: processes P, Q, and R over time, exchanging messages m1, m2, and m3, some logged and some unlogged. Q crashes and recovers; m2 is never replayed, so neither is m3.]
10. In an execution model ...
11. Logging messages on receipt ...