RAFT Consensus
Slide content borrowed from Diego Ongaro, John Ousterhout, and Alberto Montresor
Log Consensus
Bit consensus: agree on a single bit, based on inputs (0,1,0,0,1,0,0) -> 1
Log consensus: agree on contents and order of
program
deterministic
state
commands in the same order
in the log order
deterministic results
processes the command and sends its reply to the client
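The idea behind the fragments above (deterministic program, same commands, same order, same results) can be sketched in a few lines. This is a minimal illustration, not code from the slides: the key-value state machine, the `(operation, key, value)` command format, and the name `apply_log` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical command format: (operation, key, value).
Command = Tuple[str, str, int]


def apply_log(log: List[Command]) -> Dict[str, int]:
    """Replay a command log, in log order, against an empty key-value state machine."""
    state: Dict[str, int] = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "add":
            state[key] = state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


log = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]

# Two replicas that apply the same log in the same order reach the same state.
replica_a = apply_log(log)
replica_b = apply_log(log)

# A replica that applies the same commands in a different order may diverge.
reordered = apply_log([log[1], log[0], log[2]])
```

Because `apply_log` is deterministic, agreeing on the log is enough to keep the replicas identical; the reordered replay shows why the order matters as much as the contents.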
Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament’s protocol provides a new way of implementing the state-machine approach to the design of distributed systems — an approach that has received limited attention because it leads to designs of insufficient complexity.
Computer Systems
engineering perspective. PODC 2007, Portland, Oregon.
Google Analytics and other products
its platform
Paxos algorithm for replication between nodes in a cluster
ZooKeeper used in previous versions.
“The dirty little secret of the NSDI* community is that at most five people really, truly understand every part of Paxos ;-).” – Anonymous NSDI reviewer *The USENIX Symposium on Networked Systems Design and Implementation
“There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system…the final system will be based on an unproven protocol.” – Chubby authors
How servers will pick a single leader
How the leader will accept log entries from clients, propagate them to the
leader before the others
[Figure: timelines for Follower A, Follower B, and the Leader, showing the last heartbeat, an X mark, and the followers' timeouts]
The follower with the shortest timeout becomes the new leader
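A small sketch of the timeout rule, under stated assumptions: each follower draws a random election timeout, and the one whose timeout expires first is the first to stand for election. The function names and the 150 to 300 ms range are illustrative choices, not values taken from these slides.

```python
import random
from typing import Dict, Iterable, Optional


def randomized_timeouts(
    followers: Iterable[str],
    low_ms: float = 150.0,   # assumed lower bound, for illustration only
    high_ms: float = 300.0,  # assumed upper bound, for illustration only
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Give every follower its own randomly chosen election timeout."""
    rng = random.Random(seed)
    return {name: rng.uniform(low_ms, high_ms) for name in followers}


def first_to_time_out(timeouts_ms: Dict[str, float]) -> str:
    """Return the follower whose timeout expires first after the last heartbeat."""
    return min(timeouts_ms, key=lambda name: timeouts_ms[name])


timeouts = randomized_timeouts(["Follower A", "Follower B"], seed=1)
candidate = first_to_time_out(timeouts)
```

Randomizing the timeouts makes it unlikely that two followers time out at the same moment and split the vote.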
followers
[Figure: a client and three servers, each with a log and a state machine]
acknowledge its receipt
[Figure: a client and three servers, each with a log and a state machine]
a majority of the servers, it updates its state machine
[Figure: a client and three servers, each with a log and a state machine]
they update their state machines
[Figure: a client and three servers, each with a log and a state machine]
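The majority rule in the replication steps above can be sketched as follows. This is a simplified illustration that looks only at replication counts; `match_index` (the highest log index known to be stored on each server, leader included) and the function name are assumptions for the example.

```python
from typing import List


def majority_replicated_index(match_index: List[int]) -> int:
    """Highest log index that is stored on a majority of the servers.

    match_index holds one value per server (the leader counts itself).
    After sorting in descending order, the value at position n // 2 is
    held by at least n // 2 + 1 servers, which is a majority.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[len(ranked) // 2]


# Three servers: the leader has 5 entries, the followers have 3 and 2.
three_servers = majority_replicated_index([5, 3, 2])

# Five servers: index 4 is the highest one stored on three of them.
five_servers = majority_replicated_index([5, 5, 4, 2, 1])
```

Once an index is known to be on a majority, the leader can apply the entry to its state machine and tell the followers to do the same.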
Colors identify terms
recently committed entry
sequential order
Greatly simplifies the protocol
fully replicated a previous entry
have
(new term)
former followers
[Figure: two servers, each with a log and a state machine]
credentials of candidate
[Figure: two servers, each with a log and a state machine]
duplicate its own
duplicate in their logs the contents of its own log
[Figure: two servers, each with a log and a state machine]
further AppendEntry calls
previous AppendEntry call
follower match
the first log entry that follower (b) will accept is log entry 5
implicitly acknowledges it has processed all previous AppendEntry RPCs
Followers' logs cannot skip entries
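A hedged sketch of the follower-side check described above: the follower accepts new entries only if its log matches the leader's at the entry just before them, so its log can never skip entries. The `Entry` class, the 1-based `prev_index` convention, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    term: int
    command: str


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower-side handling of one AppendEntry call (indices are 1-based).

    prev_index and prev_term describe the entry that immediately precedes
    the new ones in the leader's log; prev_index == 0 means "start of log".
    """
    # Reject if the follower has no entry at prev_index (that would leave a gap).
    if prev_index > len(log):
        return False
    # Reject if the follower's entry at prev_index comes from a different term.
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False
    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based position of this entry
        # A conflicting entry and everything after it are discarded.
        if pos < len(log) and log[pos].term != entry.term:
            del log[pos:]
        if pos >= len(log):
            log.append(entry)
    return True


follower = [Entry(1, "a"), Entry(1, "b"), Entry(2, "c")]
gap_rejected = append_entries(follower, 5, 3, [Entry(3, "x")])
mismatch_rejected = append_entries(follower, 3, 3, [Entry(3, "x")])
accepted = append_entries(follower, 2, 1, [Entry(3, "x")])
```

A successful reply therefore tells the leader that the follower's log agrees with its own up to and including the new entries.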
entries?
entries
their log
least as up to date as their own log.
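The "at least as up to date" comparison can be written as a short function. This sketch assumes the usual Raft ordering: compare the terms of the last log entries first, then the log lengths. The parameter names are illustrative.

```python
def candidate_is_up_to_date(cand_last_term: int, cand_last_index: int,
                            my_last_term: int, my_last_index: int) -> bool:
    """True if the candidate's log is at least as up to date as the voter's."""
    # The log whose last entry has the later term is more up to date.
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    # Same last term: the longer (or equally long) log wins.
    return cand_last_index >= my_last_index


later_term_wins = candidate_is_up_to_date(3, 2, 2, 5)
shorter_log_loses = candidate_is_up_to_date(2, 4, 2, 5)
equal_logs_ok = candidate_is_up_to_date(2, 5, 2, 5)
```

A voter refuses its vote when this function returns `False`, whatever the candidate's term number is.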
Servers holding the last committed log entry
Servers having elected the new leader
Two majorities of the same cluster must intersect
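The intersection claim can be checked by brute force on a small cluster. The sketch below enumerates every majority of a five-server cluster and confirms that any two of them share at least one server; the server names and function names are made up for the example.

```python
from itertools import combinations
from typing import List, Set


def majorities(servers: List[str]) -> List[Set[str]]:
    """Every subset of the cluster that contains more than half of its servers."""
    need = len(servers) // 2 + 1
    return [
        set(group)
        for size in range(need, len(servers) + 1)
        for group in combinations(servers, size)
    ]


def all_majorities_intersect(servers: List[str]) -> bool:
    """True if every pair of majorities has at least one server in common."""
    quorums = majorities(servers)
    return all(a & b for a in quorums for b in quorums)


cluster = ["S1", "S2", "S3", "S4", "S5"]
intersect = all_majorities_intersect(cluster)
```

This is why at least one server that voted for the new leader also holds the last committed entry.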
term is committed even if it is stored on a majority of servers.
counting replicas
committed indirectly
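A sketch of the commit restriction, under stated assumptions: the leader advances its commit index by counting replicas only for entries from its current term, and earlier entries become committed indirectly once a later entry is. The argument names (`log_terms`, `match_index`) and the sample values are illustrative, not taken from the slides.

```python
from typing import List


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         current_term: int, commit_index: int) -> int:
    """Return the leader's new commit index (1-based; 0 means nothing committed).

    log_terms[i] is the term of the leader's entry at index i + 1.
    match_index has one value per server, the leader included.
    """
    for n in range(len(log_terms), commit_index, -1):
        replicas = sum(1 for m in match_index if m >= n)
        on_majority = replicas > len(match_index) // 2
        # Only an entry from the leader's current term is committed by counting.
        if on_majority and log_terms[n - 1] == current_term:
            return n  # every earlier entry is now committed indirectly
    return commit_index


# Leader in term 4 whose log holds entries from terms 1, 2, and 4.
leader_log_terms = [1, 2, 4]

# Index 2 (term 2) is on three of five servers, yet it is not committed.
before = advance_commit_index(leader_log_terms, [3, 2, 2, 1, 1], 4, 1)

# Once index 3 (term 4) reaches a majority, indexes 2 and 3 commit together.
after = advance_commit_index(leader_log_terms, [3, 3, 3, 1, 1], 4, 1)
```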
S4, and itself, and accepts a different entry at log index 2.
replication.
is not committed.
S3, and S4) and overwrite the entry with its own entry from term 3.
(S5 cannot win an election).
configuration
from both old and new configurations
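The joint rule for configuration changes can be sketched as two separate majority checks. The set-based representation, the server names, and the function names are assumptions for illustration.

```python
from typing import Set


def has_majority(votes: Set[str], config: Set[str]) -> bool:
    """True if the voters include more than half of the servers in config."""
    return len(votes & config) > len(config) // 2


def joint_agreement(votes: Set[str], old: Set[str], new: Set[str]) -> bool:
    """During a configuration change, require a majority of both configurations."""
    return has_majority(votes, old) and has_majority(votes, new)


old_config = {"S1", "S2", "S3"}
new_config = {"S3", "S4", "S5"}

# A majority of the old configuration alone is not enough.
old_only = joint_agreement({"S1", "S2"}, old_config, new_config)

# S1 and S3 form a majority of the old set; S3 and S4 form one of the new set.
both = joint_agreement({"S1", "S3", "S4"}, old_config, new_config)
```

Requiring both majorities prevents the old and new configurations from making conflicting decisions during the transition.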
blank lines.
various stages of development