Introduction to Distributed Systems (PowerPoint presentation)

SLIDE 1

Introduction to Distributed Systems

SLIDE 2

Outline

about the course
relationship to other courses
the challenges of distributed systems
distributed services
*ility for distributed services
some basic problems and techniques

SLIDE 3

What is CPS 212 about?

What do I mean by “distributed information systems”?

distributed: a bunch of “computers” connected by “wires”
Nodes are (at least) semi-autonomous...

but run software to coordinate and share resources.

Information systems: focus on systems to store/access/share data and operations on data, rather than on computing.

Move {data, computation} around and deliver it to the right places at the right times, safely and securely.

Information systems is more general than “relational databases”.

In this course, we view database systems as local components of larger distributed systems. (The topics also apply to building very large database systems.) We study database concurrency control and recovery, but not the relational model.

SLIDE 4

Why are you here?

You are a second-year (or later) CPS graduate student.
You have taken CPS 210 and CPS 214 and you want more.

familiarity with TCP/IP networking, threads, and file systems

Or: we have talked and we agreed that you should take the class.
You are comfortable with concurrent programming in Java.

(You want to do some Java programming labs.)

You want to prepare for R/D in this exciting and important area.

(You want to read about 15 papers and take some exams.)

You want to get started...

(You want to spend time tinkering with a related software artifact of your choice, and doing a semester group project.)

SLIDE 5

Continuum of Distributed Systems

[Figure: a continuum of distributed systems, from small and fast to big and slow]

Multiprocessors, clusters, LAN, Global Internet

Small, fast end (Parallel Architectures, CPS 221): low latency, high bandwidth, secure and reliable interconnect, no independent failures, coordinated resources. In short: fast network, trusting hosts, coordinated.

Big, slow end (Networks, CPS 214): high latency, low bandwidth, autonomous nodes, unreliable network, fear and distrust, independent failures, decentralized administration. In short: slow network, untrusting hosts, autonomy.

Issues: naming and sharing, performance and scale, resource management

SLIDE 6

Assumptions About the Network

Most of what we study in this class is at the session or presentation levels of the OSI “layer cake”. We assume properties of the transport and network layers:

uniform network address space (IP address, port)
best-effort delivery of messages of arbitrary size
reliable ordered stream communication (TCP)
flow control

The key issue is: how to use the network to build networked applications and services with the properties we want?

In practice, many critical structuring and performance issues do not permit us to draw so clean a line...but we’ll try.

SLIDE 7

The Challenges of Distributed Systems

private communication over public networks

who sent it (authentication), did anyone change it, did anyone see it

building reliable systems from unreliable components

reliable communication over unreliable networks
autonomous nodes can fail independently; a distributed system can “partly fail”
Lamport’s characterization: “A distributed system is one in which the failure of a machine I’ve never heard of can prevent me from doing my work.”

location, location, location

Placing data and computation for effective resource sharing, and finding it again once you put it somewhere.

coordination and shared state

What should we (the system components) do and when should we do it? Once we’ve all done it, can we all agree on what we did and when?

SLIDE 8

The Importance of Authentication

[Figure: EMLX stock price chart]

This is a picture of a $2.5B move in the value of Emulex Corporation, in response to a fraudulent press release by short-sellers through InternetWire last Friday. The release was widely disseminated by news media as a statement from Emulex management, but media failed to authenticate it.

[reproduced from clearstation.com]

SLIDE 9

Broader Importance of Distributed Software Technology

Today, the global community depends increasingly on distributed information systems technologies. There are many other recent examples of high-profile meltdowns of systems for distributed information exchange.

denial-of-service attacks against Yahoo etc. (spring 00)
the Starr report melts the ‘net (fall 98)
stored credit card numbers stolen from CDNow.com (spring 00)

People were afraid to buy over the net at all just a few years ago!

Network Solutions DNS root server failure (fall 00)
MCI trunk drop interrupts Chicago Board of Trade (summer 99)

These reflect the reshaping of business, government, and society brought by the global Internet and related software.

We have to “get it right”!

SLIDE 10

Services

request/response paradigm ==> client/server model

examples:

Remote Procedure Call (RPC)

object invocation, e.g., Remote Method Invocation (RMI)

HTTP

device protocols (e.g., SCSI)

Is Napster a “service”? client/server vs. peer/peer

[Figure: a Client exchanging messages with Server(s)]
Client: “Do A for me.”
Server: “OK, here’s your answer.”
Client: “Now do B.”
Server: “OK, here.”

SLIDE 11

Challenges for Services: *ility

We want our distributed applications to be useful, correct, and secure. We also want reliability. Broadly, that means:

recoverability

Don’t lose data if a failure occurs.

availability

Don’t interrupt service if a failure occurs. (also survivability)

scalability

The system can grow effectively with the workload.

See also: manageability, adaptability, agility, performability.

These affect how/where we place functions and data in the network.

It turns out that there are many common problems and techniques that can be (mostly) “factored out” of applications and services. That is (mostly) what this course is about.

SLIDE 12

Failure

Before we talk about recoverability and availability, we need to know what we mean by failure.

packet drop or packet delay

Is delay bounded or unbounded? How long must I wait?
synchronous vs. asynchronous distributed systems

network partition

“split brain” syndrome

Byzantine failure

component behaves incorrectly or unexpectedly
could be an attack that corrupts or replays messages

component fail-stop or halt

discard state, or recover with stale state (e.g., pause)?

For now, assume fail-stop and use the term “node” as shorthand for “component”.
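
The question "How long must I wait?" can be made concrete with a timeout-based failure detector. The following is a minimal sketch, not a definitive design: the class name `HeartbeatDetector`, its methods, and the explicit `now` parameter are all hypothetical, chosen so the example runs without real clocks or a network. If delay is unbounded, a node that misses the timeout can only be suspected, never known to have failed.

```python
class HeartbeatDetector:
    """Suspects a node if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of most recent heartbeat

    def heartbeat(self, node, now):
        # Record that `node` was heard from at time `now`.
        self.last_seen[node] = now

    def suspected(self, now):
        # A node is suspected when its last heartbeat is older than the timeout.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


d = HeartbeatDetector(timeout=3.0)
d.heartbeat("a", now=0.0)
d.heartbeat("b", now=0.0)
d.heartbeat("a", now=4.0)
first = d.suspected(now=5.0)
print(first)   # {'b'}

# A late heartbeat shows that "b" was only slow, not halted.
d.heartbeat("b", now=6.0)
second = d.suspected(now=6.5)
print(second)  # set()
```

The second check shows why a timeout alone cannot distinguish a halted node from a delayed message.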

SLIDE 13

Recoverability

Some basic assumptions:

Nodes have volatile and (optional) nonvolatile storage.
Volatile storage is fast, but its contents are discarded in a failure.

OS crash/restart, power failure, untimely process death

Nonvolatile (stable) storage is slow, but its contents survive failures of components other than the storage device itself.

E.g., disk: high latency but also high bandwidth (if sequential)
Low-latency nonvolatile storage exists. It is expensive but getting cheaper: NVRAM, Uninterruptible Power Supply (UPS), flash memory, etc...these help keep things interesting.

Stability is never absolute: it is determined by probability of device failure, often measured by “mean time between failures” (MTBF).

How about backing up data in remote memory?

SLIDE 14

Memory and Stable Storage

[Figure: volatile memory in front of stable storage (home)]

Servers typically manage volatile memory as a cache over stable storage:

database
file system
object store
tuple store

Stable storage holds the authoritative copy of the data. Volatile memory acts as a write-back “scratch pad” for updates and cached data in active use.

How does low-latency stable storage change this picture?

SLIDE 15

The Key to Recoverability

Software must manage the flow of data from volatile to nonvolatile storage so that:

stable storage updates are efficient;
recovered state preserves atomicity of groups of updates made together

state is internally consistent or self-consistent
e.g., atomic transactions (later)

each node recovers necessary state after a failure

The right stuff has to get to the “disk” at the right time; a failure can occur at any time (even while recovering). How do you debug something like this?

We need some basic techniques for preserving state...

SLIDE 16

Logging

[Figure: volatile memory, home image, and log]

Key idea: supplement the home data image with a log of recent updates and/or events.

append-only
sequential access (faster)
preserves order of log entries
enables atomic commit with a single write

Recover by traversing, e.g., “replaying”, the log.

Logging is fundamental to database systems and other storage systems.
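
The logging idea can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular database does it: the class `LoggedStore`, the JSON-lines record format, and the file name are hypothetical. Each group of updates is appended as one record, so the group commits with a single write, and recovery replays the records in order. A torn final record (a crash in mid-append) is treated as never committed.

```python
import json
import os
import tempfile


class LoggedStore:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # volatile image, rebuilt from the log on startup
        self._replay()

    def _replay(self):
        # Recover by replaying committed records in log order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record: that group never committed
                self.data.update(record)

    def commit(self, updates):
        # One appended record carries the whole group of updates.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(updates) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage
        self.data.update(updates)


path = os.path.join(tempfile.mkdtemp(), "store.log")
store = LoggedStore(path)
store.commit({"x": 1, "y": 2})
store.commit({"x": 5})

# Simulate a crash partway through appending a third record.
with open(path, "a") as f:
    f.write('{"x": 99, "y"')

recovered = LoggedStore(path)
print(recovered.data)  # {'x': 5, 'y': 2}
```

A fuller version would also checksum records and truncate the torn tail before accepting new appends; both are omitted here to keep the sketch short.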

SLIDE 17

The Problem of Distributed Recovery

In a distributed system, a recovered node’s state must also be consistent with the states of other nodes.

E.g., what if a recovered node has forgotten an important event that others have remembered?

A functioning node may need to respond to a peer’s recovery.

rebuild the state of the recovering node, and/or
discard local state, and/or
abort/restart operations/interactions in progress

e.g., two-phase commit protocol
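
As a rough illustration of the two-phase commit protocol named above, the sketch below runs the two phases in memory. The names `Participant`, `prepare`, `finish`, and `two_phase_commit` are hypothetical, and the sketch deliberately leaves out what makes the real protocol hard: logging votes and decisions to stable storage, timeouts, and recovery of a failed coordinator or participant.

```python
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and promise to commit if asked) or vote no.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def finish(self, decision):
        # Phase 2: adopt the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: the coordinator announces the same decision to everyone.
    for p in participants:
        p.finish(decision)
    return decision


ok = two_phase_commit([Participant("a"), Participant("b")])
nodes = [Participant("a"), Participant("b", will_vote_yes=False)]
bad = two_phase_commit(nodes)
print(ok, bad)                   # committed aborted
print([p.state for p in nodes])  # ['aborted', 'aborted']
```

A single "no" vote aborts the operation at every participant, which is the all-or-nothing outcome the protocol exists to provide.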

How to know if a peer has failed and recovered?

SLIDE 18

Example: Session Verifier

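[Figure: a client talks to server S, which fails and restarts as S´]

Client: “Do A for me.”
S: “OK, my verifier is x.”
Client: “B”
S: “x”
(S fails and restarts as S´)
Client: “C”
S´: “OK, my verifier is y.”
Client: “oops...” “A and B”
S´: “y”

What if y == x? How to guarantee that y != x?
What is the implication of re-executing A and B, and after C?

Some uses: NFS V3 write commitment, RPC sessions, NFS V4 and DAFS (client).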
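
The exchange above can be simulated directly. This is a sketch under assumptions, not the NFS or DAFS mechanism: the classes `Server` and `Client` are hypothetical, and a process-wide counter stands in for whatever makes y != x in practice (for example a boot counter kept on stable storage, or a boot timestamp).

```python
import itertools


class Server:
    _boot_ids = itertools.count(1)  # assumption: stands in for a stable boot counter

    def __init__(self):
        self.verifier = next(Server._boot_ids)  # fresh value on every (re)start
        self.applied = []  # volatile state, lost in a crash

    def do(self, op):
        self.applied.append(op)
        return self.verifier


class Client:
    def __init__(self):
        self.verifier = None
        self.uncommitted = []  # operations the server may still lose

    def call(self, server, op):
        v = server.do(op)
        if self.verifier is not None and v != self.verifier:
            # The verifier changed: the server restarted, so replay old ops.
            for old in self.uncommitted:
                server.do(old)
        self.verifier = v
        self.uncommitted.append(op)


client = Client()
s = Server()
client.call(s, "A")
client.call(s, "B")

s_prime = Server()  # S crashes and restarts as S' with empty volatile state
client.call(s_prime, "C")
print(s_prime.applied)  # ['C', 'A', 'B']
```

The final state shows the issue raised in the slide's question: A and B are re-executed after C, so the replay is only safe if the operations tolerate reordering and repetition.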

SLIDE 19

Availability

The basic technique for achieving availability is replication.

replicate hardware components
replicate functions
replicate data
replicate servers

e.g., primary/backup, hot standby, process pairs, etc.
e.g., RAID parity for available storage

Build decentralized systems that eliminate single points of failure.

If a component fails, select a replica and redirect requests there.

fail over
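
Failover can be illustrated with a small sketch. The names `Replica` and `send_with_failover` are hypothetical, and a raised `ConnectionError` stands in for whatever failure signal a real system would use (typically a timeout).

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def send_with_failover(replicas, request):
    # Try each replica in order; redirect to the next one on failure.
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # fail over
    raise RuntimeError("all replicas failed")


primary, backup = Replica("primary"), Replica("backup")
before = send_with_failover([primary, backup], "req1")
primary.up = False
after = send_with_failover([primary, backup], "req2")
print(before)  # primary handled req1
print(after)   # backup handled req2
```

The sketch assumes the backup already holds whatever state it needs to answer; keeping replicas consistent is the coordination problem discussed later in the deck.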

SLIDE 20

The Meaning of Scalability

Scalability is now part of the “enhanced standard litany” [Fox]; everybody claims their system is “scalable”. What does it really mean?

[Figure: cost vs. capacity, showing marginal cost of capacity and total cost of capacity for scalable and unscalable systems]

How do we measure or validate claims of scalability?

Note: watch out for “hockey sticks”!

Pay as you go: expand capacity by spending more money, in proportion to the new capacity.

SLIDE 21

Scalability II: Manageability

Today, “cost” has a broader meaning than it once did:

growth in administrative overhead with capacity
no interruption of service to upgrade capacity

“24 * 7 * 365 * .9999”
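
Reading “.9999” as a target of 99.99% availability around the clock (an interpretation of the slide's shorthand, not something it states), the permitted downtime is easy to compute. The function name below is hypothetical.

```python
MINUTES_PER_YEAR = 24 * 60 * 365


def downtime_minutes_per_year(availability):
    # Fraction of time unavailable, times minutes in a (non-leap) year.
    return (1.0 - availability) * MINUTES_PER_YEAR


four_nines = downtime_minutes_per_year(0.9999)
print(round(four_nines, 1))  # 52.6
```

Under that reading, all upgrades and repairs for a whole year must fit in under an hour of total outage, which is why upgrading capacity without interrupting service matters.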

[Figure: “Where does the money go?” Two pie charts, Old World and New World]
vendor 5%, staff 40%, facility 5%, (unlabeled) 50%
vendor 40%, staff 40%, facility 20%

[Borrowed from Jim Gray]

SLIDE 22

Scalability III: Adaptability, Agility, etc.

This is a graph of request traffic to download the Starr Report on Pres. Clinton’s extracurricular pursuits, released in 9/98.

Ideally, systems could self-organize and adapt to absorb workload bursts and/or to spread load to new resources.

SLIDE 23

Coordination

If the solution to availability and scalability is to replicate functions and data, how do we coordinate the replicas?

data consistency
update propagation
mutual exclusion
consistent global states
group membership
group communication
event ordering
distributed consensus
quorum consensus
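
As one example from this list, quorum consensus can be sketched with versioned replicas. The sketch assumes N replicas, a read quorum of R and a write quorum of W with R + W > N, so every read quorum overlaps every write quorum. The class `QuorumRegister` is hypothetical, and a single shared version counter stands in for a real versioning scheme.

```python
class QuorumRegister:
    def __init__(self, n, r, w):
        assert r + w > n, "read and write quorums must overlap"
        self.replicas = [(0, None)] * n  # (version, value) held by each replica
        self.r, self.w = r, w
        self.version = 0  # assumption: a global counter supplies versions

    def write(self, value, targets):
        # `targets` lists the replica indexes reachable for this write.
        assert len(targets) >= self.w, "write quorum not reached"
        self.version += 1
        for i in targets:
            self.replicas[i] = (self.version, value)

    def read(self, targets):
        # Return the value with the highest version among the replicas read.
        assert len(targets) >= self.r, "read quorum not reached"
        newest = max((self.replicas[i] for i in targets), key=lambda t: t[0])
        return newest[1]


reg = QuorumRegister(n=3, r=2, w=2)
reg.write("v1", targets=[0, 1])
reg.write("v2", targets=[1, 2])  # replica 0 misses this update
seen = reg.read(targets=[0, 1])
print(seen)  # v2
```

Replica 0 still holds the stale value, but any two replicas include at least one that saw the latest write, so the read returns it.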

SLIDE 24

Example: Time

Can the nodes of a distributed system agree on what time it is?

This is necessary to impose a global ordering on events, which can ensure that nodes observe important events in the same order.

Distributed/decentralized systems have no central notion of time.

physical clocks (“wall clock”)

induces a total order, but must account for drift
synchronous distributed systems assume a bound on drift
synchronize clocks using e.g., Network Time Protocol

“logical” clocks

order events that are logically related, i.e., one might have caused the other (potential causality); induces a partial order

vector clocks

stronger partial order: determine if e1 did or did not causally precede e2
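
A minimal vector clock sketch shows how to decide whether e1 causally precedes e2. The names `Process` and `happened_before` are hypothetical; the update rules are the usual ones (increment your own entry on each event, take the element-wise maximum on receive).

```python
def happened_before(v1, v2):
    """True if vector timestamp v1 causally precedes v2."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2


class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n  # one entry per process in the system

    def local_event(self):
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, stamp):
        # Merge the sender's knowledge, then count the receive as an event.
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]
        return self.local_event()


p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.send()         # (1, 0)
e2 = p1.local_event()  # (0, 1)
e3 = p1.receive(e1)    # (1, 2)

print(happened_before(e1, e3))                           # True
print(happened_before(e1, e2), happened_before(e2, e1))  # False False
```

Events e1 and e2 are concurrent: neither timestamp dominates the other, so neither could have caused the other.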

SLIDE 25

A Brief History (viewed from 1998)

[Figure: timeline running from the “Dark Ages” to “The Future”, marked at 1960, 1970, 1980, 1990, 1992, and 1995]

Entries on the timeline: ARPAnet, DECnet, SNA, Ethernet LANs, Cedar, Grapevine (Xerox PARC), Berkeley Unix r*, TCP/IP, AppleTalk, workstations, RPC, NFS, NIS, Apollo, NSFnet, V and Mach, Emerald, Argus, ISIS, AFS, client/server, DCE, OMG, Wintel PCs, AOL, IBM, NT, The Web, Mosaic, distributed objects, CORBA, COM, Java, DES, RSA, DSM, PVM/MPI, Coda, Tuxedo, Encina, Spring, Network Objects, search engines, Viper, Wolfpack, Falcon, etc., ISPs, Netscape, commercial Internet, Beans, RMI, servlets, JINI, ...