Rediscovering Distributed Systems
Steve Vinoski Basho Technologies Cambridge, MA USA http://basho.com @stevevinoski
vinoski@ieee.org http://steve.vinoski.net/
1 Thursday, October 17, 13
enormous history of research and practice
many issues from many angles
right trade-offs
personal history and experiences
different talk
"Intergalactic Computer Network" would eventually lead to the Internet
states and Ontario
massive cascading failures
failure is not uncommon
Multiprogramming System"
set of hierarchical cooperating sequential processes
synchronization via semaphores
"At the time this was written the testing had not yet been completed, but the resulting system is guaranteed to be flawless."
—E.W. Dijkstra
"The Structure of the 'THE' Multiprogramming System"
distributed computing
Network Operating Systems", Akkoyunlu, Bernstein, Schantz, 1974
topologies
receiver might find and identify each other
"Users and administrators of a small computer often desire more service than it can provide. In a network environment additional services can be provided to the small computer, and in turn to the users of the small computer, by one or more other computers."
—Akkoyunlu, Bernstein, Schantz
"Interprocess Communication Facilities for Network Operating Systems"
Resource Sharing", Cosell et al., 1975
"Further, it was becoming clear that for many users, in particular those whose access to the network was via TIPs or other non-TENEX hosts, it should not actually matter which host provides the TENEX service so long as the users could do their computing in the manner to which they had become accustomed."
—Cosell et al.
"An Operational System for Computer Resource Sharing"
"A number of advantages would result from such resource sharing. The user would see TENEX as a much more accessible and reliable resource. Because he would no longer be dependent upon a single host for his computing, he would be able to access the TENEX virtual machine even when one or more of the TENEX hosts were unavailable."
—Cosell et al.
"An Operational System for Computer Resource Sharing"
"A terminal with a resident text editor, whether it is provided by hardware or software, is not an example of a distributed data processing system." "If the terminal coordinates several concurrent and simultaneous remote jobs, giving each a different type of service at a different location, without human intervention, then it more closely resembles a distributed system."
—P.H. Enslow, Jr.
"What is a 'Distributed' Data Processing System?"
"Participants generally agreed that distributed processing is made possible by the price-performance revolution in microelectronics."
—Eckhouse and Stankovic
"Issues in Distributed Processing - An Overview of Two Workshops"
"A distributed system can be described as a particular sequential state machine that is implemented with a network of processors."
Leslie Lamport
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks
Events in a Distributed System", L. Lamport, 1978
ever
and B is the receive of M in process P2
LogicalClock(A) < LogicalClock(B)
[Figure: processes P1, P2, P3 along time axis t, with first events e11, e21, e31 each at logical clock value 1]
[Figure: the same diagram with events e11, e12, e21, e22, e31, e32 and logical clock values 1, 2, 1, 3, 1, 2]
[Figure: the same diagram with events e11 to e14, e21 to e24, e31 to e34, each labeled with its logical clock value]
Only partial ordering, since e.g. e32 ↛ e14
arbitrary ordering for processes
works with physical clocks
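A minimal Python sketch of the logical clock idea on these slides: each process keeps a counter, increments it on every local event or send, and on receive jumps to one past the larger of its own value and the message timestamp. The class name `LamportClock` and its method names are illustrative assumptions, not taken from the talk or from Lamport's paper. Ties between equal timestamps are broken by process id, which is one way to get an arbitrary total order.

```python
class LamportClock:
    """Logical clock for a single process (hypothetical helper class)."""

    def __init__(self, pid):
        self.pid = pid      # process identifier, used only to break ties
        self.time = 0       # logical time; 0 means no events yet

    def tick(self):
        """Record a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Advance past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self):
        """(time, pid) pair; tuple comparison yields a total order."""
        return (self.time, self.pid)


# A is the send of M in P1, B is the receive of M in P2.
p1, p2 = LamportClock(1), LamportClock(2)
a = p1.send()
b = p2.receive(a)
print(a < b)  # True: LogicalClock(A) < LogicalClock(B)
```

The converse does not hold: a smaller timestamp does not by itself show that one event happened before the other, which is why the ordering of events stays partial until ties and concurrent events are ordered arbitrarily.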
languages
languages
languages?
Tony Hoare, 1978
primitives for structuring concurrent programs
control structures
execution
processes to communicate
show solutions to a variety of programming problems
Barbara Liskov, 1979
primitives, focusing on modularity and communication
divisions
building distributed programs
with distributed computing primitives
Structured programming + Modularity Data abstraction
communicate only via messages
arguments
messages they accept
system
distributed programming needed
started operating in late 1969
human-to-computer communications
application protocols
Framework for Network-Based Resource Sharing, 1975
applications
look just like library programming?
would become the remote procedure call (RPC)
"While the procedure call may be an appropriate basis for certain applications, we believe that it can neither directly nor accurately model the interactions and control structures that occur in many distributed multi-computer systems."
—R. Schantz, RFC 684
Communicating Systems
Databases"
programming
whole programming languages and runtimes
unified programming language, compiler, and operating system
research efforts, publications almost never mentioned them
RPC “black box,” hidden between client and server RPC stubs
system
(partitions, downed nodes)
consistency
"Implementing Remote Procedure Calls"
remote interface functions and types
programming language stubs and type definitions
define system interfaces, then translated into C and Domain Pascal
also used the IDL to define remote interfaces
Computing Environment (DCE)
networked file system (not bolted on later)
//foo/path/to/file
the Apollo "//" to use in URLs
path uses "\\", likely due to Paul Leach who left HP/Apollo for Microsoft in 1991
the mid-80s at Ericsson by Joe Armstrong
distribution
influence from work preceding it
and Fidge
timestamps, keep a vector of clocks, one for each process
causality, vector clocks can
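A short Python sketch of the vector clock idea, assuming a fixed number of processes indexed from 0: each process keeps one counter per process, bumps its own entry on every event, and takes the element-wise maximum when it receives a message. The function names `tick`, `merge`, `happened_before`, and `concurrent` are illustrative assumptions, not names from the Fidge or Mattern papers.

```python
def tick(clock, i):
    """Return a copy of clock with process i's own entry incremented."""
    new = list(clock)
    new[i] += 1
    return new


def merge(local, remote, i):
    """Receive at process i: element-wise max, then count the receive event."""
    merged = [max(a, b) for a, b in zip(local, remote)]
    merged[i] += 1
    return merged


def happened_before(a, b):
    """True when a is causally before b: a <= b everywhere and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True when neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


start = [0, 0, 0]
sent = tick(start, 0)            # process 0 sends a message
received = merge(start, sent, 1) # process 1 receives it
other = tick(start, 2)           # process 2 does unrelated local work

print(happened_before(sent, received))  # True
print(concurrent(sent, other))          # True
```

Comparing whole vectors is what lets the last line report concurrency; a single scalar timestamp would simply order the two events one way or the other.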
from Fidge "Timestamps in Message Passing Systems That Preserve the Partial Ordering"
presence of malicious processes
reaching consensus in bounded time can be impossible with just one fault
Impossibility Proofs for Distributed Computing"
"What good are impossibility results, anyway? They don't seem very useful at first... Most obviously, impossibility results tell you when you should stop trying to devise or improve an algorithm."
—Nancy Lynch
http://groups.csail.mit.edu/tds/papers/Lynch/podc89.pdf
Multiprocess Programs", 1977
"Recognizing Safety and Liveness", 1987 and their prior related work
distributed systems designs, approaches, trade-offs
consistency across a system
proposed value is chosen
happens
every request
eventually chosen
in the Presence of Partial Synchrony" 1988
Replication work for high availability
computing (Isis, Horus)
distributed OO operating system, RPC-based
Systems Architecture" (ANSA), models and rules for distributed systems designs. Objects, transactions, interfaces. Influenced the Object Management Group (OMG)
80s
systems were based on objects
stacks, including OS, language, and compiler
choice but to
into their own stacks
features available for “normal” programming languages, without changing those languages
published 1991
1999 book
today in 2013, and this book still sells
Example: Object Management Architecture (OMA) from the Object Management Group (OMG)
[Figure: boxes labeled AI (Application Interfaces), DI (Domain Interfaces), CF (Common Facilities), and OS (Object Services) shown with the CORBA ORB]
but impractical goal
integration still involves numerous approaches
reliable.
infinite.
secure.
change.
administrator.
zero.
homogeneous.
1. Partitions do not
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
9. Clocks are synchronized.
ignored.
largely ignored concurrent object access
control service, but used only for the distributed transaction service
for the transaction service
Ann Wollrath, Sam Kendall
that local/remote transparency was a desirable goal
defines the Paxos algorithm, still widely used today
panned by reviewers
so Lamport finally resubmitted for publication in 1998
importance of Paxos
Available System Using Consensus" in 1996
systems
Tolerant Services Using the State Machine Approach: A Tutorial", 1990
Hashing, 1997
and Scalable Tolerant Systems", 1999
defined by Roy Fielding in his doctoral thesis, 2000 based on his work on the web and HTTP
applications
define an architecture
desired properties such as
simplicity
balancing, redundancy)
constraints
need
trade-offs to get them
complexity of Paxos, so he wrote this in 2001
Partition Tolerance conjecture, 2000
Lynch
quite right
large-scale highly-available eventually consistent key-value datastore
and Voldemort databases
implementing Dynamo-like systems https://github.com/basho/riak_core
reliable distributed systems in the presence of software errors"
2001
correctly in eventually consistent, highly available systems
concurrently or under partition
Convergent Commutative Conflict-free Replicated Data Types
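As a small illustration of a conflict-free replicated data type, here is a sketch of a grow-only counter (often called a G-Counter) in Python. It is only one simple example of the family, chosen for brevity; the class name `GCounter` and its methods are assumptions for illustration and are not taken from the talk or from any particular library. Each replica increments only its own slot, and merging takes the per-replica maximum, so merges are commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter replicated across nodes (hypothetical sketch)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, n=1):
        """Add n to this replica's own slot; other slots are never touched."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        """The counter's value is the sum over all replicas' slots."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking per-slot maxima."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas update concurrently (for example, under partition).
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)

# Exchanging state in either order brings both replicas to the same value.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Because merging the same state twice changes nothing, replicas can exchange state repeatedly and in any order and still converge.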
Algorithm", 2013
(ZAB)
Monotonicity
language that helps deal with distributed consistency
distributed systems work coming from:
Alvaro, William Marczak from the UC Berkeley Database Group
AMPLab at UC Berkeley
systems using Jepsen https://github.com/aphyr/jepsen
posts on aphyr.com describing his experiments, lots of detailed distributed systems knowledge and insights
and many developers work on them
about, due to many subtle details
extremely rich
but are worth it
vocabulary of theory and techniques for tackling the problems you work on
your distributed system
prior work
@stevevinoski
transcriptions/EWD01xx/EWD123.html
http://dl.acm.org/citation.cfm?id=363143
Operating Systems", 1974 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6323582
Operational System for Computer Resource Sharing", 1975 http://www.webstart.com/papers/tenex-rsexec.pdf
research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
id=359585 see also http://www.usingcsp.com
doid=800215.806567
http://tools.ietf.org/rfc/rfc707
http://tools.ietf.org/html/rfc684
System", 1979 http://research.microsoft.com/en-us/um/people/blampson/21-crashrecovery/Abstract.html (originally part of the Distributed Processing Workshop, Brown University, August 1976)
research.microsoft.com/en-US/um/people/Lamport/pubs/proving.pdf
research.microsoft.com/en-us/um/people/lamport/pubs/implementation.pdf
library/cyberdig.nsf/papers/A776EC17FC2FCE73852579F100578964/$File/RJ2571.pdf
research.microsoft.com/en-us/um/people/lamport/pubs/reaching.pdf
www.cs.cornell.edu/courses/cs614/2004sp/papers/lsp82.pdf
Programs", 1983 (available at) http://www.cs.brandeis.edu/~cs147a/papers/liskov-argus.pdf
Process", 1983 http://groups.csail.mit.edu/tds/papers/Lynch/pods83-flp.pdf
000/521/744/object_structure_in_the_emerald_system.pdf
fbs/publications/RecSafeLive.pdf
(available at) http://zoo.cs.yale.edu/classes/cs426/2012/lab/bib/fidge88timestamps.pdf
homes.cs.washington.edu/~arvind/cs425/doc/mattern89virtual.pdf
http://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf
Highly-Available Distributed Systems", 1988 (available at) http://www.cs.princeton.edu/courses/archive/fall09/cos518/papers/viewstamped.pdf
groups.csail.mit.edu/tds/papers/Lynch/podc89.pdf
Tutorial", 1990 http://www.cs.cornell.edu/fbs/publications/smsurvey.pdf
www.cc.gatech.edu/classes/AY2010/cs4210_fall/papers/smli_tr-94-29.pdf
research.microsoft.com/en-us/um/people/blampson/58-Consensus/Acrobat.pdf
Relieving Hot Spots on the World Wide Web", 1997 http://dl.acm.org/citation.cfm?id=258660
lab.mscs.mu.edu/Dist2012/lectures/HarvestYield.pdf
2000 http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
~brewer/cs262b-2004/PODC-keynote.pdf
lamport/pubs/paxos-simple.pdf
for Large-Scale Peer-to-Peer Systems", 2001 http://research.microsoft.com/en-us/um/people/antr/PAST/pastry.pdf
http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
Partition-Tolerant Web Services", 2002 http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
http://www.erlang.org/download/armstrong_thesis_2003.pdf
http://research.google.com/archive/chubby.html
www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Types", 2011 http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf
00/60/93/99/PDF/RR-7687.pdf
www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
http://queue.acm.org/detail.cfm?id=2462076
http://db.cs.berkeley.edu/papers/cidr11-bloom.pdf
db.cs.berkeley.edu/papers/socc12-blooml.pdf
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf