Rediscovering Distributed Systems Steve Vinoski Basho - - PowerPoint PPT Presentation

SLIDE 1

Rediscovering Distributed Systems

Steve Vinoski
Basho Technologies, Cambridge, MA, USA
http://basho.com
@stevevinoski
vinoski@ieee.org
http://steve.vinoski.net/

1 Thursday, October 17, 13

SLIDE 2

Distributed Systems are Everywhere

SLIDE 3

Distributed Systems are Difficult

SLIDE 4

On The Shoulders of Giants

  • "Distributed systems" describes an enormous history of research and practice
  • Dist Sys research/practice addresses many issues from many angles
  • Know the issues so you can choose the right trade-offs

SLIDE 5

Scope

  • Way way too much to cover
  • This talk is based in part on my personal history and experiences
  • Others would give a completely different talk

SLIDE 6

1960s

SLIDE 7

1960s

  • Beginnings of concurrent systems
  • 1965: Dijkstra's semaphores
  • Beginnings of computer networking
  • J.C.R. Licklider's 1962 dream of an "Intergalactic Computer Network" would eventually lead to the Internet

  • Beginning of OO: Simula 67

SLIDE 8

Distributed Systems Failure

  • 1965 Northeast blackout affected 7 US states and Ontario
  • A single misconfigured relay caused massive cascading failures
  • Then and now: distributed systems failure is not uncommon

SLIDE 9

Dijkstra and Multiprogramming

  • 1968: "The Structure of the 'THE' Multiprogramming System"
  • Describes a whole system designed as a set of hierarchical cooperating sequential processes
  • System resources shared through mutual synchronization using semaphores
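
As a rough illustration of semaphore-based mutual synchronization, here is a minimal Python sketch. It uses Python's `threading.Semaphore`, not anything from the THE system itself, and the names `shared_log` and `worker` are made up for the example.

```python
import threading

# A binary semaphore guarding a shared resource (hypothetical example names).
mutex = threading.Semaphore(1)
shared_log = []

def worker(name, count):
    """Append `count` entries to the shared log, one at a time."""
    for i in range(count):
        mutex.acquire()          # Dijkstra's P operation
        try:
            shared_log.append((name, i))
        finally:
            mutex.release()      # Dijkstra's V operation

threads = [threading.Thread(target=worker, args=(f"p{n}", 100)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each of the four cooperating threads enters the critical section only while holding the semaphore, so all 400 entries arrive intact.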

SLIDE 10

Dijkstra and Multiprogramming

"At the time this was written the testing had not yet been completed, but the resulting system is guaranteed to be flawless."

—E.W. Dijkstra

"The Structure of the 'THE' Multiprogramming System"

SLIDE 11

1970s

SLIDE 12

1970s Issues

  • Interprocess Communication
  • Resource sharing
  • Programming languages and distributed computing

  • Application-to-application protocols

SLIDE 13

Interprocess Communication (IPC)

  • "Interprocess Communication Facilities for Network Operating Systems", Akkoyunlu, Bernstein, Schantz, 1974
  • Discusses IPC over different network topologies
  • Compares connection-oriented and message-oriented IPC facilities
  • Discusses concerns of how sender and receiver might find and identify each other

SLIDE 14

IPC

SLIDE 15

IPC

SLIDE 16

SLIDE 17

Resource Sharing

"Users and administrators of a small computer often desire more service than it can provide. In a network environment additional services can be provided to the small computer, and in turn to the users of the small computer, by one or more other computers."

—Akkoyunlu, Bernstein, Schantz

"Interprocess Communication Facilities for Network Operating Systems"

SLIDE 18

Resource Sharing

  • "An Operational System for Computer Resource Sharing", Cosell et al., 1975

  • Ideas like today's cloud computing

SLIDE 19

Resource Sharing

"Further, it was becoming clear that for many users, in particular those whose access to the network was via TIPs or other non-TENEX hosts, it should not actually matter which host provides the TENEX service so long as the users could do their computing in the manner to which they had become accustomed."

—Cosell et al.

"An Operational System for Computer Resource Sharing"

SLIDE 20

Resource Sharing

"A number of advantages would result from such resource sharing. The user would see TENEX as a much more accessible and reliable resource. Because he would no longer be dependent upon a single host for his computing, he would be able to access the TENEX virtual machine even when one or more of the TENEX hosts were unavailable."

—Cosell et al.

"An Operational System for Computer Resource Sharing"

SLIDE 21

Distributed Processing, January 1978

SLIDE 22

Distributed Processing, January 1978

"A terminal with a resident text editor, whether it is provided by hardware or software, is not an example of a distributed data processing system."

"If the terminal coordinates several concurrent and simultaneous remote jobs, giving each a different type of service at a different location, without human intervention, then it more closely resembles a distributed system."

—P.H. Enslow, Jr.

"What is a 'Distributed' Data Processing System?"

SLIDE 23

1978: Issues in Distributed Processing

"Participants generally agreed that distributed processing is made possible by the price-performance revolution in microelectronics."

—Eckhouse and Stankovic

"Issues in Distributed Processing - An Overview of Two Workshops"

SLIDE 24

1978: Issues in Distributed Processing

  • IPC
  • Distributed operating systems
  • Distributed databases
  • Load balancing

SLIDE 25

State Machines

"A distributed system can be described as a particular sequential state machine that is implemented with a network of processors."

Leslie Lamport

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks
SLIDE 26

Ordering of Events

  • "Time, Clocks, and the Ordering of Events in a Distributed System", L. Lamport, 1978
  • Probably the most cited Dist Sys paper ever

SLIDE 27

Ordering of Events

  • event A happens before event B if
  • A occurs before B in the same process
  • or A is a send of M from process P1, and B is the receive of M in process P2

  • happens-before is written as A → B
  • if A → B and B → C, then A → C

SLIDE 28

Ordering of Events

  • Clock Condition:
  • if A → B then LogicalClock(A) < LogicalClock(B)
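
One common way to satisfy the Clock Condition is a per-process counter that is bumped on every event and pushed past any timestamp received in a message. The sketch below is a minimal illustration under that assumption. The class name `LamportClock` and its method names are invented for this example.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receive, jump past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
a = p1.tick()        # event A on P1
m = p1.send()        # P1 sends message M
b = p2.receive(m)    # event B: P2 receives M
```

Here A → send(M) → B, and the clock values increase along that chain even though P2's clock started at zero.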

SLIDE 29

Ordering of Events

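[Figure: space-time diagram of processes P1, P2, P3 along time axis t, with first events e11, e21, e31, each assigned logical clock value 1]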

SLIDE 30

Ordering of Events

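[Figure: space-time diagram of processes P1, P2, P3 along time axis t, with events e11, e12, e21, e22, e31, e32 and logical clock values 1, 2, 1, 3, 1, 2]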

SLIDE 31

Ordering of Events

[Figure: space-time diagram of processes P1, P2, P3 along time axis t, with events e11 to e14, e21 to e24, e31 to e34 and their logical clock values]

Only partial ordering, since e.g. e32 ↛ e14

SLIDE 32

Ordering of Events

  • Total ordering achieved by choosing an arbitrary ordering for processes
  • A ⇒ B iff
  • C(A) < C(B)
  • or, C(A) == C(B) and Pa < Pb
  • Remainder of paper shows how it all works with physical clocks
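
The tie-breaking rule can be sketched as a sort on (clock value, process id). The event tuples below are illustrative data, and comparing process names as strings is just one possible arbitrary ordering of processes.

```python
# Each event is (logical_clock, process_id, label); the data is hypothetical.
events = [
    (2, "P1", "e12"),
    (1, "P3", "e31"),
    (1, "P1", "e11"),
    (3, "P2", "e22"),
    (1, "P2", "e21"),
]

def total_order(events):
    # Compare clock values first; break ties with the arbitrary process ordering.
    return sorted(events, key=lambda e: (e[0], e[1]))

ordered = [label for _, _, label in total_order(events)]
```

Events with equal clock values (e11, e21, e31) are ordered only by the process ordering, which says nothing about causality between them.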

SLIDE 33

Languages and Distributed Systems

  • Handling concurrency in programming languages
  • Handling distribution in programming languages
  • What are the best primitives in such languages?

SLIDE 34

Communicating Sequential Processes

  • "Communicating Sequential Processes", Tony Hoare, 1978
  • Defines "surprisingly versatile" primitives for structuring concurrent programs

SLIDE 35

Communicating Sequential Processes

  • Dijkstra's guarded commands for sequential control structures
  • Dijkstra's parbegin to specify concurrent execution
  • Input and output commands to allow processes to communicate

  • Input commands in guards
  • Pattern matching on input messages

SLIDE 36

Communicating Sequential Processes

  • The paper then uses these primitives to show solutions to a variety of programming problems

SLIDE 37

Primitives for Distributed Computing

  • "Primitives for Distributed Computing", Barbara Liskov, 1979
  • Describes distributed computing primitives, focusing on modularity and communication

SLIDE 38

Primitives for Distributed Computing

  • Reasons for distributed computing:
  • To match distributed organizations
  • Reduced contention among organization divisions
  • Speed of access
  • Physical control
  • Better availability, reliability, extensibility
  • But at the time there was little experience in building distributed programs

SLIDE 39

Primitives for Distributed Computing

  • Approach: extend the CLU language with distributed computing primitives
  • CLU supported data abstractions and objects

[Diagram: structured programming + modularity, data abstraction ⇒ CLU]

SLIDE 40

Modularity

  • Guardian: a collection of processes and objects that provides a distributed service
  • Processes in same guardian share objects
  • Processes in different guardians communicate only via messages

  • Each guardian bound to a single node
  • Each node can host multiple guardians

SLIDE 41

Communication

  • No-wait send
  • Receive with timeout
  • Messages are commands with zero or more arguments
  • Messages are sent to ports
  • Ports are named and typed, defining the messages they accept

  • Guardians have one or more ports
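
To make these primitives concrete, here is a small Python sketch of a typed port with no-wait send and receive-with-timeout. It is not CLU or Liskov's actual syntax. The `Port` class, its methods, and the reservation commands are assumptions made for illustration, built on Python's standard `queue` module.

```python
import queue

class Port:
    """A named port accepting only a declared set of commands (hypothetical sketch)."""
    def __init__(self, name, accepted):
        self.name = name
        self.accepted = set(accepted)
        self._mailbox = queue.Queue()

    def send(self, command, *args):
        """No-wait send: enqueue the message and return immediately."""
        if command not in self.accepted:
            raise ValueError(f"port {self.name!r} does not accept {command!r}")
        self._mailbox.put_nowait((command, args))

    def receive(self, timeout):
        """Receive with timeout: return a message, or None if none arrives in time."""
        try:
            return self._mailbox.get(timeout=timeout)
        except queue.Empty:
            return None

reservations = Port("reservations", accepted={"reserve", "cancel"})
reservations.send("reserve", "flight-42", "seat-7A")
first = reservations.receive(timeout=0.1)    # the queued command
second = reservations.receive(timeout=0.1)   # nothing left, so this times out
```

The sender never blocks on the receiver, and the receiver must decide what to do when the timeout expires, which keeps failure handling visible in the program.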

SLIDE 42

Primitives for Distributed Computing

  • Paper finishes by showing an example of a transactional airline reservation system
  • One conclusion: more experience with distributed programming needed

SLIDE 43

App-to-App Protocols

  • ARPANET, forerunner of the Internet, started operating in late 1969
  • Early host-to-host protocols facilitated human-to-computer communications
  • Email in 1971
  • FTP and interoperable Telnet in 1973
  • Interest started growing in application-to-application protocols

SLIDE 44

RFC 707

  • J. E. White, RFC 707, "A High-Level Framework for Network-Based Resource Sharing", 1975
  • Expressed concerns that programmers
  • didn't know how to write distributed applications
  • but did know how to write libraries
  • so, why not make distributed programming look just like library programming?
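
The "looks like a library call" idea can be sketched with a client-side stub that marshals a call into a message and unmarshals the reply. This is not the RFC 707 protocol. The JSON encoding, the `PROCEDURES` registry, and the `Stub` class are all assumptions for illustration, and the "network" here is just an in-process function call.

```python
import json

# Server side: ordinary library functions exposed by name (hypothetical registry).
PROCEDURES = {"add": lambda a, b: a + b}

def handle_request(payload):
    """Decode a request, run the named procedure, encode the reply."""
    request = json.loads(payload)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result})

class Stub:
    """Client-side stub: attribute access looks like calling a local library."""
    def __init__(self, transport):
        self._transport = transport   # any callable taking and returning a string

    def __getattr__(self, proc):
        def call(*args):
            reply = self._transport(json.dumps({"proc": proc, "args": list(args)}))
            return json.loads(reply)["result"]
        return call

# Here the transport cannot fail; a real network transport can fail or stall.
remote = Stub(transport=handle_request)
total = remote.add(2, 3)
```

The call site `remote.add(2, 3)` gives no hint that a message exchange is involved, which is both the appeal of the approach and the source of the objections raised against it.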

SLIDE 45

RFC 674 and 707

  • RFCs 674 and 707 basically define what would become the remote procedure call (RPC)

  • But questioned in RFC 684:

"While the procedure call may be an appropriate basis for certain applications, we believe that it can neither directly nor accurately model the interactions and control structures that occur in many distributed multi-computer systems."

—R. Schantz, RFC 684

SLIDE 46

See Also

  • Carl Hewitt's Actor Model
  • Smalltalk-72
  • Robin Milner's Calculus of Communicating Systems
  • Bruce Lindsay's "Notes on Distributed Databases"

  • Anything by Jim Gray (in any decade)

SLIDE 47

1980s

SLIDE 48

Some 1980s Issues

  • Languages for distributed programming

  • Operating systems
  • Safety and liveness
  • Consensus

SLIDE 49

Languages for Distribution

  • Much research in this period focused on whole programming languages and runtimes
  • Even whole systems consisting of unified programming language, compiler, and operating system

SLIDE 50

Languages for Distribution

  • RPC was a key abstraction
  • Significant focus on uniformity:
  • local/remote transparency
  • location transparency
  • strong/static typing across the system

SLIDE 51

Languages for Distribution

  • Specialized, closed protocols
  • Protocols were rarely the focus of research efforts; publications almost never mentioned them
  • Protocol was viewed as part of the RPC "black box," hidden between client and server RPC stubs

SLIDE 52

Liskov's Argus

  • Integrated programming language and system
  • Extension of CLU
  • Designed to help with reliability issues (partitions, downed nodes)
  • Included atomic actions to support consistency

SLIDE 53

Xerox Cedar Programming Environment

  • Gave us 1984 Birrell/Nelson paper "Implementing Remote Procedure Calls"

SLIDE 54

Interface Definition Language (IDL)

  • Declarative language used to define remote interface functions and types
  • Translated/mapped into specific programming language stubs and type definitions
  • There are many IDLs, not sure of the original

SLIDE 55

IDL

  • In mid-80s Apollo Computer used an IDL to define system interfaces, then translated into C and Domain Pascal
  • Kept definitions for C and Pascal in sync
  • Apollo Network Computing System (NCS) also used the IDL to define remote interfaces
  • NCS was a forerunner of the Distributed Computing Environment (DCE)

SLIDE 56

Some Apollo Trivia

  • Apollo Aegis and Domain/OS provided a native networked file system (not bolted on later)
  • Access to a file on host "foo" from any other host: //foo/path/to/file
  • Sir Tim Berners-Lee told me he later borrowed the Apollo "//" to use in URLs
  • Microsoft Universal Naming Convention (UNC) path uses "\\", likely due to Paul Leach who left HP/Apollo for Microsoft in 1991

SLIDE 57

Emerald

  • distributed RPC-based object language
  • local/remote transparency
  • object mobility

SLIDE 58

Erlang

  • Programming language/system invented in the mid-80s at Ericsson by Joe Armstrong
  • Provides reliability via concurrency and distribution
  • Useful features, reasonable trade-offs, clear influence from work preceding it

  • Open source, available at erlang.org
  • My favorite programming language

SLIDE 59

Vector Clocks

  • Independently discovered by Mattern and Fidge
  • Instead of just transmitting clocks or timestamps, keep a vector of clocks, one for each process
  • Lamport timestamps can't prove causality, vector clocks can
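
A minimal sketch of the vector clock idea, assuming a fixed number of processes indexed from 0. The function names (`vc_tick`, `vc_merge`, `happened_before`, `concurrent`) are invented for this example and do not come from the Mattern or Fidge papers.

```python
def vc_new(n):
    """A fresh vector clock for a system of n processes."""
    return [0] * n

def vc_tick(clock, i):
    """Process i records a local or send event."""
    clock = list(clock)
    clock[i] += 1
    return clock

def vc_merge(local, received, i):
    """Process i receives a message stamped `received`."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[i] += 1
    return merged

def happened_before(a, b):
    """True when the event stamped a causally precedes the event stamped b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """True when neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)

p0 = vc_tick(vc_new(2), 0)       # [1, 0]: P0 sends a message
p1 = vc_tick(vc_new(2), 1)       # [0, 1]: independent event on P1
p1_recv = vc_merge(p1, p0, 1)    # [1, 2]: P1 receives P0's message
```

Comparing `p0` and `p1` shows they are concurrent, something a single Lamport timestamp per event cannot reveal.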

SLIDE 60

Vector Clocks

from Fidge "Timestamps in Message Passing Systems That Preserve the Partial Ordering"

SLIDE 61

Consensus

  • Coordination and reliability
  • getting processes to agree
  • even if some are faulty or unavailable
  • or even if some are malicious

SLIDE 62

Byzantine Generals

  • Lamport, 1982
  • Proves how to achieve consensus in the presence of malicious processes

SLIDE 63

FLP Impossibility

  • Fischer, Lynch, Paterson paper, 1983
  • Proves that in asynchronous systems, reaching consensus in bounded time can be impossible with just one fault
  • Uses proof by contradiction
  • See also Nancy Lynch's "A Hundred Impossibility Proofs for Distributed Computing"

SLIDE 64

Why Impossibility?

"What good are impossibility results, anyway? They don't seem very useful at first... Most obviously, impossibility results tell you when you should stop trying to devise or improve an algorithm."

—Nancy Lynch

http://groups.csail.mit.edu/tds/papers/Lynch/podc89.pdf

SLIDE 65

Safety and Liveness

  • Lamport, "Proving the Correctness of Multiprocess Programs", 1977
  • See also Alpern and Schneider, "Recognizing Safety and Liveness", 1987 and their prior related work
  • These properties help us reason about distributed systems designs, approaches, trade-offs

SLIDE 66

Safety and Liveness

  • Safety: nothing bad happens
  • e.g. distributed transactions ensure consistency across a system
  • In consensus terms, only a single proposed value is chosen

  • Doing nothing is considered safe!

SLIDE 67

Safety and Liveness

  • Liveness: something good eventually happens
  • e.g., a system eventually responds to every request
  • In consensus terms, a proposed value is eventually chosen

  • Ensures system progress

SLIDE 68

See Also

  • Dwork, Lynch, Stockmeyer, "Consensus in the Presence of Partial Synchrony", 1988
  • Oki's and Liskov's Viewstamped Replication work for high availability

SLIDE 69

See Also

  • Ken Birman's work on reliable distributed computing (Isis, Horus)
  • Andrew Black's Eden project, a full distributed OO operating system, RPC-based
  • Andrew Herbert's "Advanced Network Systems Architecture" (ANSA), models and rules for distributed systems designs. Objects, transactions, interfaces. Influenced the Object Management Group (OMG)

SLIDE 70

1990s

SLIDE 71

Some 1990s Issues

  • Distributed objects
  • Practical consensus
  • The rise of the web

SLIDE 72

Distributed Objects

  • OOP grew in popularity in the 70s and 80s
  • Many 80s distributed systems research systems were based on objects
  • But research systems were often full stacks, including OS, language, and compiler

SLIDE 73

Distributed Objects

  • Computer vendors ultimately had little choice but to
  • incorporate distributed systems research into their own stacks
  • but make distributed programming features available for "normal" programming languages, without changing those languages

  • It all led to CORBA

SLIDE 74

Common Object Request Broker Architecture

  • First CORBA spec published 1991
  • I co-authored this 1999 book
  • CORBA is still alive today in 2013, and this book still sells

SLIDE 75

Ideal Distributed Objects Architecture

Example: Object Management Architecture (OMA) from the Object Management Group (OMG)

[Diagram: Application Interfaces (AI), Domain Interfaces (DI), Common Facilities (CF), and Object Services (OS) all connected through the CORBA ORB]

SLIDE 76

Enterprise Integration Reality

SLIDE 77

Application Integration

  • A "common bus" was/is an interesting but impractical goal
  • Even today in 2013, application integration still involves numerous approaches

SLIDE 78

Fallacies of Distributed Computing

  • 1. The network is reliable.
  • 2. Latency is zero.
  • 3. Bandwidth is infinite.
  • 4. The network is secure.
  • 5. Topology doesn't change.
  • 6. There is one administrator.
  • 7. Transport cost is zero.
  • 8. The network is homogeneous.

SLIDE 79

Revised Fallacies of Distributed Computing

  • 1. Partitions do not occur.
  • 2. Latency is zero.
  • 3. Bandwidth is infinite.
  • 4. The network is secure.
  • 5. Topology doesn't change.
  • 6. There is one administrator.
  • 7. Transport cost is zero.
  • 8. The network is homogeneous.
  • 9. Clocks are synchronized.
  • 10. Concurrency can be ignored.

SLIDE 80

Distributed Systems Concurrency

  • Dist Sys are inherently concurrent
  • Yet the distributed objects movement largely ignored concurrent object access
  • there was an OMG concurrency control service, but used only for the distributed transaction service
  • also ignored consensus, again except for the transaction service

SLIDE 81

A Note on Distributed Computing

  • 1994 paper by Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall
  • Addresses a prevalent issue of the era: the assumption that local/remote transparency was a desirable goal

  • Also mentions the concurrency issue

SLIDE 82

Paxos

  • Lamport's "The Part-Time Parliament" defines the Paxos algorithm, still widely used today
  • Paper originally submitted in 1990, panned by reviewers
  • But others recognized its significance, so Lamport finally resubmitted for publication in 1998

SLIDE 83

Implementing Consensus

  • Butler Lampson understood the importance of Paxos
  • Published "How to Build a Highly Available System Using Consensus" in 1996
  • Practical advice on using Paxos in real systems

SLIDE 84

See Also

  • F. Schneider's "Implementing Fault Tolerant Services Using the State Machine Approach: A Tutorial", 1990
  • Karger et al. paper on Consistent Hashing, 1997
  • A. Fox and E.A. Brewer, "Harvest, Yield, and Scalable Tolerant Systems", 1999

SLIDE 85

2000s

SLIDE 86

Some 2000s Issues

  • REST
  • Paxos made simple
  • CAP
  • Dynamo

SLIDE 87

REST

  • "Representational State Transfer", defined by Roy Fielding in his doctoral thesis, 2000, based on his work on the web and HTTP
  • An architectural style for networked applications
  • Has unfortunately now become an overused & misunderstood buzzword

SLIDE 88

REST Constraints

  • Constraints and trade-offs are what help define an architecture

  • Client-server
  • Statelessness
  • Caching
  • Layered system
  • Uniform interface
  • Code on demand

SLIDE 89

REST Properties

  • REST imposes these constraints to induce desired properties such as
  • performance, scalability, portability, simplicity
  • visibility (monitoring and mediation)
  • modifiability (evolution, extension, reuse)
  • reliability (handling failure, failover, load balancing, redundancy)

SLIDE 90

REST

  • REST works well for problems that fit its constraints
  • But it's not for everything
  • A great general lesson from REST:
  • 1. understand the properties your apps need
  • 2. impose the appropriate constraints/trade-offs to get them

SLIDE 91

Paxos Made Simple

  • Lamport was tired of complaints of the complexity of Paxos, so he wrote this in 2001

  • It's still complex

SLIDE 92

CAP Theorem

  • Eric Brewer's Consistency, Availability, Partition Tolerance conjecture, 2000
  • Formally proven in 2002 by Gilbert and Lynch
  • Common interpretation "pick two" isn't quite right

SLIDE 93

CAP Theorem

  • Distributed systems fail
  • So, partition tolerance (P) isn't a choice
  • Under partition, does your system
  • try to be consistent (C)
  • try to be available (A)
  • it can't be both
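
The choice under partition can be caricatured with a toy replica that either refuses writes or accepts them locally. This is only a sketch of the trade-off, not how any real datastore is built. The `Replica` class and its `mode` and `partitioned` fields are hypothetical.

```python
class Replica:
    """A toy replica; `mode` is "CP" or "AP" (hypothetical names)."""
    def __init__(self, mode):
        self.mode = mode
        self.value = None
        self.partitioned = False   # True when peers are unreachable

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse rather than risk diverging from peers.
            raise RuntimeError("unavailable: cannot reach a quorum")
        # Otherwise accept the write locally; an AP system would have to
        # reconcile with its peers after the partition heals.
        self.value = value
        return "ok"

cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True
ap_result = ap.write("x")
try:
    cp.write("x")
    cp_result = "ok"
except RuntimeError:
    cp_result = "unavailable"
```

During the partition the AP replica answers but may now disagree with its peers, while the CP replica stays consistent by giving up availability.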

SLIDE 94

Amazon Dynamo

  • 2007 paper from Amazon describing a large-scale highly-available eventually consistent key-value datastore
  • Strong influence on Cassandra, Riak, and Voldemort databases
  • Riak Core is a framework for implementing Dynamo-like systems https://github.com/basho/riak_core

SLIDE 95

See Also

  • The Google Chubby lock service
  • Joe Armstrong's 2003 PhD thesis "Making reliable distributed systems in the presence of software errors"
  • Pastry and Chord distributed hash tables, 2001

  • Multicore CPU cache coherence protocols
  • Papers that have won the Dijkstra Prize

SLIDE 96

2010s

SLIDE 97

CRDTs

Convergent / Commutative / Conflict-free Replicated Data Types

  • Replicated data types that handle updates correctly in eventually consistent, highly available systems
  • E.g., counters, sets, maps
  • Automatic handling of updates that can occur concurrently or under partition
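
As one small example of such a data type, here is a sketch of a grow-only counter (often called a G-Counter) in Python. The class and method names are chosen for this illustration. Each replica increments only its own slot, and merging takes the element-wise maximum.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Commutative, associative, and idempotent, so order and repeats don't matter."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

a, b = GCounter("a"), GCounter("b")
a.increment(2)       # concurrent updates on two replicas
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)           # merging again changes nothing
```

Both replicas converge to the same total regardless of the order or number of merges, with no coordination needed while they are partitioned.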

SLIDE 98

Raft

  • D. Ongaro and J. Ousterhout, "In Search of an Understandable Consensus Algorithm", 2013
  • See also Zookeeper Atomic Broadcast (ZAB)

SLIDE 99

CALM and Bloom

  • CALM: Consistency as Logical Monotonicity
  • Bloom: distributed programming language that helps deal with distributed consistency

SLIDE 100
  • In general, pay attention to ALL distributed systems work coming from:
  • Joe Hellerstein, Neil Conway, Peter Alvaro, William Marczak from the UC Berkeley Database Group
  • Peter Bailis and Ali Ghodsi from AMPLab at UC Berkeley

SLIDE 101

Jepsen

  • Kyle Kingsbury (@aphyr) breaks systems using Jepsen https://github.com/aphyr/jepsen
  • Read his "Call Me Maybe" series of blog posts on aphyr.com describing his experiments, lots of detailed distributed systems knowledge and insights

SLIDE 102

Summary

SLIDE 103
  • Distributed systems are everywhere, and many developers work on them

SLIDE 104
  • Distributed systems are hard to reason about, due to many subtle details

SLIDE 105
  • Distributed systems R&D history is extremely rich

SLIDE 106
  • Study the classics
  • These papers can be hard to understand but are worth it

SLIDE 107
  • Learning this history gives you a vocabulary of theory and techniques for tackling the problems you work on

SLIDE 108
  • Understand what trade-offs are best for your distributed system
  • Achieve them with the help of relevant prior work

SLIDE 109

Thanks

@stevevinoski

SLIDE 110

References

  • E.W. Dijkstra, "Cooperating Sequential Processes", 1965 http://www.cs.utexas.edu/~EWD/transcriptions/EWD01xx/EWD123.html
  • E.W. Dijkstra, "The Structure of the 'THE' Multiprogramming System", CACM, May 1968 http://dl.acm.org/citation.cfm?id=363143
  • Akkoyunlu, Bernstein, Schantz, "Interprocess Communication Facilities for Network Operating Systems", 1974 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6323582
  • B. Cosell, P. Johnson, J. Malman, R. Schantz, J. Sussman, R. Thomas, D. Walden, "An Operational System for Computer Resource Sharing", 1975 http://www.webstart.com/papers/tenex-rsexec.pdf
  • L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", 1978 http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
  • C.A.R. Hoare, "Communicating Sequential Processes", 1978 http://dl.acm.org/citation.cfm?id=359585 see also http://www.usingcsp.com
  • B. Liskov, "Primitives for Distributed Computing", 1979 http://dl.acm.org/citation.cfm?doid=800215.806567

SLIDE 111

References

  • J. E. White, RFC 707, "A High-Level Framework for Network-Based Resource Sharing", 1975 http://tools.ietf.org/rfc/rfc707
  • R. Schantz, RFC 684, "A Commentary on Procedure Calling as a Network Protocol", 1975 http://tools.ietf.org/html/rfc684
  • Butler W. Lampson and Howard E. Sturgis, "Crash Recovery in a Distributed Data Storage System", 1979 http://research.microsoft.com/en-us/um/people/blampson/21-crashrecovery/Abstract.html (originally part of the Distributed Processing Workshop, Brown University, August 1976)
  • L. Lamport, "Proving the Correctness of Multiprocess Programs", 1977 http://research.microsoft.com/en-US/um/people/Lamport/pubs/proving.pdf
  • L. Lamport, "The Implementation of Reliable Distributed Multiprocess Systems", 1978 http://research.microsoft.com/en-us/um/people/lamport/pubs/implementation.pdf
  • B. Lindsay et al., "Notes on Distributed Databases", 1979 http://domino.research.ibm.com/library/cyberdig.nsf/papers/A776EC17FC2FCE73852579F100578964/$File/RJ2571.pdf

SLIDE 112

References

  • M. Pease, R. Shostak, L. Lamport, "Reaching Agreement in the Presence of Faults", 1980 http://research.microsoft.com/en-us/um/people/lamport/pubs/reaching.pdf
  • L. Lamport, R. Shostak, M. Pease, "The Byzantine Generals Problem", 1982 (available at) http://www.cs.cornell.edu/courses/cs614/2004sp/papers/lsp82.pdf
  • B. Liskov and R. Scheifler, "Guardians and Actions: Linguistic Support for Robust, Distributed Programs", 1983 (available at) http://www.cs.brandeis.edu/~cs147a/papers/liskov-argus.pdf
  • M. Fischer, N. Lynch, and M. Paterson, "Impossibility of Distributed Consensus with One Faulty Process", 1983 http://groups.csail.mit.edu/tds/papers/Lynch/pods83-flp.pdf
  • A. Black et al., "Object Structure in the Emerald System", 1986 (available at) http://pdf.aminer.org/000/521/744/object_structure_in_the_emerald_system.pdf
  • B. Alpern and F. Schneider, "Recognizing Safety and Liveness", 1987 http://www.cs.cornell.edu/fbs/publications/RecSafeLive.pdf
  • C. Fidge, "Timestamps in Message-Passing Systems That Preserve the Partial Ordering", 1988 (available at) http://zoo.cs.yale.edu/classes/cs426/2012/lab/bib/fidge88timestamps.pdf
  • F. Mattern, "Virtual Time and Global States of Distributed Systems", 1989 (available at) http://homes.cs.washington.edu/~arvind/cs425/doc/mattern89virtual.pdf

SLIDE 113

References

  • C. Dwork, N. Lynch, L. Stockmeyer, "Consensus in the Presence of Partial Synchrony", 1988 http://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf
  • B. Oki and B. Liskov, "Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems", 1988 (available at) http://www.cs.princeton.edu/courses/archive/fall09/cos518/papers/viewstamped.pdf
  • N. Lynch, "A Hundred Impossibility Proofs for Distributed Computing", 1989 http://groups.csail.mit.edu/tds/papers/Lynch/podc89.pdf
  • M. Kong, T. H. Dineen, P. J. Leach, E. A. Martin, N. W. Mishkin, J. N. Pato, and G. L. Wyant, Network Computing System Reference Manual, 1990, Prentice-Hall, Inc., Upper Saddle River, NJ.
  • F. Schneider, "Implementing Fault Tolerant Services Using the State Machine Approach: A Tutorial", 1990 http://www.cs.cornell.edu/fbs/publications/smsurvey.pdf
  • J. Waldo et al., "A Note on Distributed Computing", 1994 (available at) http://www.cc.gatech.edu/classes/AY2010/cs4210_fall/papers/smli_tr-94-29.pdf
  • B. Lampson, "How to Build a Highly Available System Using Consensus", 1996 http://research.microsoft.com/en-us/um/people/blampson/58-Consensus/Acrobat.pdf

SLIDE 114

References

  • D. Karger et al., "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web", 1997 http://dl.acm.org/citation.cfm?id=258660
  • A. Fox and E.A. Brewer, "Harvest, Yield, and Scalable Tolerant Systems", 1999 http://lab.mscs.mu.edu/Dist2012/lectures/HarvestYield.pdf
  • R.T. Fielding, "Architectural Styles and the Design of Network-based Software Architectures", 2000 http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
  • E. A. Brewer, "Towards Robust Distributed Systems", 2000 http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  • L. Lamport, "Paxos Made Simple", 2001 http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
  • A. Rowstron and P. Druschel, "Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems", 2001 http://research.microsoft.com/en-us/um/people/antr/PAST/pastry.pdf

SLIDE 115

References

  • I. Stoica et al., "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", 2001 http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
  • S. Gilbert and N. Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services", 2002 http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
  • J. Armstrong, "Making Reliable Distributed Systems in the Presence of Software Errors", 2003 http://www.erlang.org/download/armstrong_thesis_2003.pdf
  • M. Burrows, "The Chubby Lock Service for Loosely-coupled Distributed Systems", 2006 http://research.google.com/archive/chubby.html
  • G. DeCandia et al., "Dynamo: Amazon's Highly Available Key-value Store", 2007 http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
  • M. Shapiro et al., "A Comprehensive Study of Convergent and Commutative Replicated Data Types", 2011 http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

SLIDE 116

References

  • M. Shapiro et al., "Conflict-Free Replicated Data Types", 2011 http://hal.inria.fr/docs/00/60/93/99/PDF/RR-7687.pdf
  • E. Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed", 2012 http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
  • P. Bailis, A. Ghodsi, "Eventual Consistency Today: Limitations, Extensions, and Beyond", 2013 http://queue.acm.org/detail.cfm?id=2462076
  • P. Alvaro et al., "Consistency Analysis in Bloom: a CALM and Collected Approach", 2011 http://db.cs.berkeley.edu/papers/cidr11-bloom.pdf
  • N. Conway et al., "Logic and Lattices for Distributed Programming", 2012 http://db.cs.berkeley.edu/papers/socc12-blooml.pdf
  • D. Ongaro and J. Ousterhout, "In Search of an Understandable Consensus Algorithm", 2013 https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf

  • Zookeeper Atomic Broadcast, http://labs.yahoo.com/files/ladis08.pdf
  • Dotted Version Vector Sets, https://github.com/ricardobcl/Dotted-Version-Vectors
