Distributed Systems Principles and Paradigms Chapter 12 (version October 15, 2007 ) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20. Tel: (020) 598 7784


SLIDE 1

Distributed Systems

Principles and Paradigms

Chapter 12

(version October 15, 2007)

Maarten van Steen

Vrije Universiteit Amsterdam, Faculty of Science

Dept. Mathematics and Computer Science

Room R4.20. Tel: (020) 598 7784 E-mail: steen@cs.vu.nl, URL: www.cs.vu.nl/∼steen/

01 Introduction
02 Architectures
03 Processes
04 Communication
05 Naming
06 Synchronization
07 Consistency and Replication
08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems
11 Distributed File Systems
12 Distributed Web-Based Systems
13 Distributed Coordination-Based Systems

SLIDE 2

Distributed Web-Based Systems

Essence: The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents:

Figure: client machine (browser, OS) and server machine (Web server):

  • 1. Get document request (HTTP)
  • 2. Server fetches document from local file
  • 3. Response

  • Documents are generally represented in text (plain text, HTML, XML)
  • Alternative types: images, audio, video, but also applications (PDF, PS)
  • Documents may contain scripts that are executed by the client-side software
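To make step 1 and step 3 concrete, here is a minimal sketch of what a browser puts on the wire for a Get request and how it reads the first line of the response, assuming HTTP/1.1. The helper names `build_get_request` and `parse_status_line` are hypothetical, and no network connection is opened.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request message (step 1 in the figure)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, document, protocol version
        f"Host: {host}",         # required header in HTTP/1.1
        "Accept: text/html",     # the type of document the client can handle
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # these two empty entries produce the blank line
        "",                      # (CRLF CRLF) that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response (step 3) into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_get_request("www.cs.vu.nl", "/index.html").decode("ascii"))
    print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"))
```

Sending these bytes over a TCP connection to port 80 of the server would be the actual step 1; the sketch stops at message formatting.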

12 – 1 Distributed Web-Based Systems/12.1 Architecture

SLIDE 3

Multi-tiered Architectures

Observation: Very early on, Web sites were organized into three tiers:

Figure: Web server (HTTP request handler), CGI process (CGI program), and database server:

  • 1. Get request
  • 3. Start process to fetch document
  • 4. Database interaction
  • 5. HTML document created
  • 6. Return result
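The three tiers can be sketched in a few functions. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and a plain function call stands in for the separate CGI process (a real CGI program runs as its own process). All function names are hypothetical.

```python
import sqlite3
from html import escape
from typing import Optional


def make_database() -> sqlite3.Connection:
    """Stand-in for the database server: an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (title TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO docs VALUES (?, ?)", ("Welcome", "Hello, Web!"))
    db.commit()
    return db


def cgi_program(db: sqlite3.Connection, title: str) -> Optional[str]:
    """Stand-in for the CGI program: query the database (step 4) and
    create an HTML document from the result (step 5)."""
    row = db.execute("SELECT body FROM docs WHERE title = ?", (title,)).fetchone()
    if row is None:
        return None
    return f"<html><body><h1>{escape(title)}</h1><p>{escape(row[0])}</p></body></html>"


def http_request_handler(db: sqlite3.Connection, query: dict[str, str]) -> tuple[int, str]:
    """Web server side: run the program for a Get request (steps 1 and 3)
    and return its result to the client (step 6)."""
    document = cgi_program(db, query.get("title", ""))
    if document is None:
        return 404, "<html><body>Not found</body></html>"
    return 200, document


if __name__ == "__main__":
    database = make_database()
    print(http_request_handler(database, {"title": "Welcome"}))
```

The point of the split is that the Web server never touches the database itself; it only starts the program and relays whatever HTML comes back.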

12 – 2 Distributed Web-Based Systems/12.1 Architecture

SLIDE 4

Web Services

Observation: At a certain point, people started recognizing that there was more to the Web than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization was badly needed.

Figure: a client application and a server application, each using a stub generated from the service's WSDL description, exchange SOAP messages through their communication subsystems. The server publishes the service at a directory service (UDDI), where the client looks the service up.

12 – 3 Distributed Web-Based Systems/12.1 Architecture

SLIDE 5

Clients: Web browsers

Observation: Browsers form the Web’s most important client-side software. They used to be simple, but that was long ago.

Figure: browser components: user interface, browser engine, rendering engine, network communication, HTML/XML parser, client-side script interpreter, and display back end.

12 – 4 Distributed Web-Based Systems/12.2 Processes

SLIDE 6

Apache Web Server

Observation: At the time of writing, more than 70% of all Web sites were based on Apache. The server is internally organized more or less according to the steps needed to process an HTTP request:

Figure: the Apache core passes a request through a sequence of hooks. Modules link functions to hooks; for each hook, the core calls the functions linked to it, and the last step produces the response.
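The hook mechanism can be mimicked in a few lines. This is a sketch, not Apache's code: real Apache modules are written in C against Apache's own module API, which is not reproduced here. The hook names, the dict-based request, and the three "modules" below are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Request = dict                    # hypothetical: a request is a plain dict here
Function = Callable[[Request], None]


class Core:
    """Toy 'core' that calls, hook by hook, the functions modules linked to it."""

    # Hypothetical hook names, ordered like the steps of handling a request.
    HOOKS = ("translate_name", "check_access", "handle_content", "log")

    def __init__(self) -> None:
        self.functions: dict[str, list[Function]] = defaultdict(list)

    def register(self, hook: str, fn: Function) -> None:
        """Link a module's function to a hook."""
        if hook not in self.HOOKS:
            raise ValueError(f"unknown hook: {hook}")
        self.functions[hook].append(fn)

    def process(self, request: Request) -> Request:
        """Run every hook in order; each linked function may add fields."""
        for hook in self.HOOKS:
            for fn in self.functions[hook]:
                fn(request)
        return request


# Three hypothetical "modules", each registering a function with one hook.
core = Core()
core.register("translate_name", lambda r: r.update(file="/var/www" + r["uri"]))
core.register("check_access",
              lambda r: r.update(allowed=not r["uri"].startswith("/private/")))
core.register("handle_content",
              lambda r: r.update(response="200 OK" if r["allowed"] else "403 Forbidden"))

if __name__ == "__main__":
    print(core.process({"uri": "/index.html"})["response"])
```

The design choice worth noticing is that the core knows only the order of hooks, not what any module does; adding functionality means registering another function, not changing the core.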

12 – 5 Distributed Web-Based Systems/12.2 Processes

SLIDE 7

Server Clusters (1/2)

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients:

Figure: a front end connected via a LAN to several Web servers; the front end handles all incoming requests and outgoing responses.

Problem: The front end may easily get overloaded, so special measures need to be taken:

  • Transport-layer switching: The front end simply passes the TCP request to one of the servers, taking some performance metric into account.
  • Content-aware distribution: The front end reads the content of the HTTP request and then selects the best server.
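The difference between the two front ends is what they look at. In the sketch below, the transport-layer switch never inspects the payload and uses "fewest assigned connections" as its performance metric, while the content-aware front end parses the HTTP request line and routes by path prefix. The metric, the server names, and the route table are all assumptions made for illustration.

```python
from typing import Sequence


class TransportLayerSwitch:
    """Picks a server without looking at the HTTP request (here: least loaded)."""

    def __init__(self, servers: Sequence[str]) -> None:
        self.load = {s: 0 for s in servers}   # connections assigned per server

    def select(self, tcp_payload: bytes) -> str:
        # The payload is ignored entirely; only the metric matters.
        server = min(self.load, key=self.load.get)
        self.load[server] += 1
        return server


class ContentAwareFrontEnd:
    """Reads the HTTP request line and routes by path prefix."""

    def __init__(self, routes: dict[str, str], default: str) -> None:
        self.routes = routes                  # path prefix -> server
        self.default = default

    def select(self, http_request: bytes) -> str:
        request_line = http_request.split(b"\r\n", 1)[0].decode("ascii")
        _method, path, _version = request_line.split(" ")
        for prefix, server in self.routes.items():
            if path.startswith(prefix):
                return server
        return self.default
```

Routing by content lets, for example, all image requests land on a server that is likely to have those images in its cache, which a transport-layer switch cannot arrange.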

12 – 6 Distributed Web-Based Systems/12.2 Processes

SLIDE 8

Server Clusters (2/2)

Question: Why can content-aware distribution be so much better?

Figure: client, switch, distributors, dispatcher, and Web servers; setup requests and other messages follow different paths:

  • 1. Pass setup request to a distributor
  • 2. Dispatcher selects server
  • 3. Hand off TCP connection
  • 4. Inform switch
  • 5. Forward other messages
  • 6. Server responses

12 – 7 Distributed Web-Based Systems/12.2 Processes

SLIDE 9

Communication (1/2)

Essence: Communication in the Web is generally based on HTTP, a relatively simple client-server transfer protocol having the following request messages:

Operation: Description
  • Head: Request to return the header of a document
  • Get: Request to return a document to the client
  • Put: Request to store a document
  • Post: Provide data that are to be added to a document (collection)
  • Delete: Request to delete a document
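A toy server-side dispatcher shows how the five operations differ, using their on-the-wire tokens HEAD, GET, PUT, POST, and DELETE. The in-memory `DocumentStore` is hypothetical; the status codes (200 OK, 201 Created, 404 Not Found, 405 Method Not Allowed) are standard HTTP ones.

```python
from typing import Optional


class DocumentStore:
    """Toy server-side handling of the five request operations."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def handle(self, method: str, path: str, body: str = "") -> tuple[int, Optional[str]]:
        if method in ("GET", "HEAD"):
            if path not in self.docs:
                return 404, None
            # HEAD returns only header information, never the document itself.
            return 200, self.docs[path] if method == "GET" else None
        if method == "PUT":                     # store (create or replace) a document
            created = path not in self.docs
            self.docs[path] = body
            return (201 if created else 200), None
        if method == "POST":                    # add data to a document (collection)
            self.docs[path] = self.docs.get(path, "") + body
            return 200, None
        if method == "DELETE":
            removed = self.docs.pop(path, None)
            return (200, None) if removed is not None else (404, None)
        return 405, None                        # method not allowed
```

Note how Put replaces a document wholesale while Post only adds to it; that distinction is what makes Put safe to repeat and Post not.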

12 – 8 Distributed Web-Based Systems/12.3 Communication

SLIDE 10

Communication (2/2)

Header (C = set by client, S = set by server, C+S = both): Contents
  • Accept (C): The type of documents the client can handle
  • Accept-Charset (C): The character sets that are acceptable to the client
  • Accept-Encoding (C): The document encodings the client can handle
  • Accept-Language (C): The natural language the client can handle
  • Authorization (C): A list of the client’s credentials
  • WWW-Authenticate (S): Security challenge the client should respond to
  • Date (C+S): Date and time the message was sent
  • ETag (S): The tags associated with the returned document
  • Expires (S): The time after which the response should no longer be considered valid
  • From (C): The client’s e-mail address
  • Host (C): The host name (and optional port) of the document’s server
  • If-Match (C): The tags the document should have
  • If-None-Match (C): The tags the document should not have
  • If-Modified-Since (C): Tells the server to return a document only if it has been modified since the specified time
  • If-Unmodified-Since (C): Tells the server to return a document only if it has not been modified since the specified time
  • Last-Modified (S): The time the returned document was last modified
  • Location (S): A document reference to which the client should redirect its request
  • Referer (C): Refers to the client’s most recently requested document
  • Upgrade (C+S): The application protocol the sender wants to switch to
  • Warning (C+S): Information about the status of the data in the message

12 – 9 Distributed Web-Based Systems/12.3 Communication
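Several of these headers work together in a conditional Get: the server labels a document with ETag and Last-Modified, and a client holding a cached copy sends If-None-Match or If-Modified-Since so the server can answer 304 Not Modified instead of resending the document. The sketch below assumes a hypothetical tag scheme (a truncated SHA-256 digest); following the HTTP specification, If-None-Match takes precedence when both conditions are present.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(document: bytes) -> str:
    """Hypothetical tag scheme: a quoted, truncated SHA-256 digest of the content."""
    return '"' + hashlib.sha256(document).hexdigest()[:16] + '"'


def conditional_get(document: bytes, last_modified: datetime,
                    request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, response headers, body) for a possibly conditional Get.

    last_modified must be a timezone-aware UTC datetime with whole seconds,
    because HTTP dates have one-second resolution.
    """
    etag = make_etag(document)
    headers = {"ETag": etag,
               "Last-Modified": format_datetime(last_modified, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in if_none_match.split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""          # 304 Not Modified: reuse the cached copy
    elif "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        if last_modified <= since:
            return 304, headers, b""
    return 200, headers, document


if __name__ == "__main__":
    lm = datetime(2007, 10, 15, 12, 0, tzinfo=timezone.utc)
    status, hdrs, _ = conditional_get(b"<html>hi</html>", lm, {})
    print(status, hdrs)
```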

SLIDE 11

SOAP

Simple Object Access Protocol: Based on XML, this is the standard protocol for communication between Web services.

  • SOAP is bound to an underlying protocol (i.e., it is not independent of its carrier)
  • Conversational exchange style: Send a document one way, get a filled-in response back.
  • RPC-style exchange: Used to invoke a Web service.

12 – 10 Distributed Web-Based Systems/12.3 Communication

SLIDE 12

A Note on XML

Observation: XML has the advantage of allowing self-describing documents, and that is where its advantages end: it introduces performance problems and is not meant to be read by human beings.

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <n:alertcontrol xmlns:n="http://example.org/alertcontrol">
      <n:priority>1</n:priority>
      <n:expires>2001-06-22T14:00:00-05:00</n:expires>
    </n:alertcontrol>
  </env:Header>
  <env:Body>
    <m:alert xmlns:m="http://example.org/alert">
      <m:msg>Pick up Mary at school at 2pm</m:msg>
    </m:alert>
  </env:Body>
</env:Envelope>
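The same envelope can be produced and read back with Python's standard `xml.etree.ElementTree`, which also illustrates the verbosity complaint: every element carries a namespace. This is a sketch; the function names `build_alert` and `read_alert` are hypothetical, and ElementTree places all namespace declarations on the root element rather than on the inner elements as in the example above (an equivalent XML document).

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"
ALERTCONTROL = "http://example.org/alertcontrol"
ALERT = "http://example.org/alert"


def build_alert(msg: str, priority: int, expires: str) -> bytes:
    """Build the SOAP envelope shown above; {uri}tag is ElementTree's namespace syntax."""
    ET.register_namespace("env", ENV)
    ET.register_namespace("n", ALERTCONTROL)
    ET.register_namespace("m", ALERT)
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{ENV}}}Header")
    control = ET.SubElement(header, f"{{{ALERTCONTROL}}}alertcontrol")
    ET.SubElement(control, f"{{{ALERTCONTROL}}}priority").text = str(priority)
    ET.SubElement(control, f"{{{ALERTCONTROL}}}expires").text = expires
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    alert = ET.SubElement(body, f"{{{ALERT}}}alert")
    ET.SubElement(alert, f"{{{ALERT}}}msg").text = msg
    return ET.tostring(envelope)


def read_alert(xml_bytes: bytes) -> str:
    """Extract the message text from the envelope's body."""
    root = ET.fromstring(xml_bytes)
    return root.findtext(f"{{{ENV}}}Body/{{{ALERT}}}alert/{{{ALERT}}}msg")


if __name__ == "__main__":
    print(build_alert("Pick up Mary at school at 2pm", 1, "2001-06-22T14:00:00-05:00").decode())
```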

12 – 11 Distributed Web-Based Systems/12.3 Communication

SLIDE 13

Naming: URL

URL: Uniform Resource Locator tells how and where to access a resource.

Figure: three forms of a URL:
  (a) Scheme, host name, pathname: http://www.cs.vu.nl/home/steen/mbox
  (b) Scheme, host name, port, pathname: http://www.cs.vu.nl:80/home/steen/mbox
  (c) Scheme, host name given as an IP address, port, pathname: http://130.37.24.11:80/home/steen/mbox

Examples:

Name: Used for, Example
  • http: HTTP, http://www.cs.vu.nl:80/globe
  • mailto: Mail, mailto:steen@cs.vu.nl
  • ftp: FTP, ftp://ftp.cs.vu.nl/pub/minix/README
  • file: Local file, file:/edu/book/work/chp/11/11
  • data: Inline data, data:text/plain;charset=iso-8859-7,%e1%e2%e3
  • telnet: Remote login, telnet://flits.cs.vu.nl
  • tel: Telephone, tel:+31201234567
  • modem: Modem, modem:+31201234567;type=v32

12 – 12 Distributed Web-Based Systems/12.4
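Splitting a URL into the parts in the figure is a standard library call in Python (`urllib.parse.urlsplit`). The sketch adds a small table of well-known default ports to show why form (a) and form (b) reach the same server; `url_parts` and `effective_port` are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlsplit

# Well-known default ports, used when a URL omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "telnet": 23}


def url_parts(url: str) -> tuple[str, Optional[str], Optional[int], str]:
    """Split a URL into scheme, host name, explicit port, and pathname."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path


def effective_port(url: str) -> Optional[int]:
    """The explicit port if present, otherwise the scheme's default (if known)."""
    scheme, _host, port, _path = url_parts(url)
    return port if port is not None else DEFAULT_PORTS.get(scheme)


if __name__ == "__main__":
    for u in ("http://www.cs.vu.nl/home/steen/mbox",
              "http://130.37.24.11:80/home/steen/mbox",
              "mailto:steen@cs.vu.nl"):
        print(url_parts(u), effective_port(u))
```

Schemes such as mailto have no host part at all; everything after the colon ends up in the path component.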

SLIDE 14

Synchronization: WebDAV

Problem: There is a growing need for collaborative authoring of Web documents, but bare-bones HTTP can’t help here. Solution: Web Distributed Authoring and Versioning.

  • Supports exclusive and shared write locks, which operate on entire documents
  • A lock is passed by means of a lock token; the server registers the client(s) holding the lock
  • Clients modify the document locally and post it back to the server along with the lock token

Note: There is no specific support for crashed clients holding a lock.
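The locking rules above can be sketched as a small server object. Assumptions: the class names and the `urn:uuid:` token format are hypothetical, and unlike real WebDAV this toy refuses any write without a lock token. Exclusive locks exclude everyone else; shared locks coexist with other shared locks; and, matching the note, nothing ever expires a lock left behind by a crashed client.

```python
import uuid
from dataclasses import dataclass, field


class LockError(Exception):
    """Raised when a lock cannot be granted or a write lacks a valid token."""


@dataclass
class Lock:
    scope: str                                               # "exclusive" or "shared"
    holders: dict[str, str] = field(default_factory=dict)    # lock token -> client


class DavServer:
    """Toy server granting write locks on entire documents."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        self.locks: dict[str, Lock] = {}

    def lock(self, path: str, client: str, scope: str) -> str:
        """Grant a lock and return its token; the server records who holds it."""
        current = self.locks.get(path)
        if current is not None and "exclusive" in (scope, current.scope):
            raise LockError(f"{path} is already locked")
        token = f"urn:uuid:{uuid.uuid4()}"    # hypothetical token format
        if current is None:
            current = self.locks[path] = Lock(scope)
        current.holders[token] = client
        return token

    def put(self, path: str, content: str, token: str) -> None:
        """Accept a locally modified document only with a valid lock token."""
        lock = self.locks.get(path)
        if lock is None or token not in lock.holders:
            raise LockError("a valid lock token is required")
        self.documents[path] = content

    def unlock(self, path: str, token: str) -> None:
        """Release one holder's lock. No timeout: a crashed client's lock stays."""
        lock = self.locks.get(path)
        if lock is not None:
            lock.holders.pop(token, None)
            if not lock.holders:
                del self.locks[path]
```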

12 – 13 Distributed Web-Based Systems/12.5 Synchronization

SLIDE 15

Web Proxy Caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency protocols:

  • Always verify validity by contacting the server
  • Age-based consistency:

$$T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$$

  • Cooperative caching, by which you first check your neighbors on a cache miss:

Figure: clients attached to Web proxies, each proxy with its own cache, and a Web server. Handling an HTTP Get request:

  • 1. Look in local cache
  • 2. Ask neighboring proxy caches
  • 3. Forward request to Web server
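Both ideas fit in one small sketch: the expiration time follows the age-based formula, and a cache miss walks the three steps of cooperative caching. Assumptions: times are plain numbers, the server is a callable returning (document, time of last modification), a document found at a neighbor is not copied locally, and the default α = 0.2 is hypothetical. With T_cached = 100, T_last modified = 50 and α = 0.5, the copy expires at 0.5 · 50 + 100 = 125.

```python
from typing import Callable, Optional

# Hypothetical server interface: url -> (document, time of last modification)
FetchFn = Callable[[str], tuple[str, float]]


def expiration_time(t_cached: float, t_last_modified: float, alpha: float) -> float:
    """Age-based consistency: T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached


class Proxy:
    def __init__(self, fetch_from_server: FetchFn, alpha: float = 0.2) -> None:
        self.cache: dict[str, tuple[str, float]] = {}   # url -> (document, T_expire)
        self.neighbors: list["Proxy"] = []
        self.fetch_from_server = fetch_from_server
        self.alpha = alpha                               # hypothetical default

    def lookup_local(self, url: str, now: float) -> Optional[str]:
        """Return a cached document only if it has not yet expired."""
        entry = self.cache.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]
        return None

    def get(self, url: str, now: float) -> str:
        doc = self.lookup_local(url, now)                  # 1. look in local cache
        if doc is None:
            for neighbor in self.neighbors:                # 2. ask neighboring proxy caches
                doc = neighbor.lookup_local(url, now)
                if doc is not None:
                    break
        if doc is None:                                    # 3. forward request to Web server
            doc, last_modified = self.fetch_from_server(url)
            self.cache[url] = (doc, expiration_time(now, last_modified, self.alpha))
        return doc
```

The formula rewards stability: a document that had not changed for a long time before it was cached is trusted for proportionally longer.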

12 – 14 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 16

Replication in Web Hosting Systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, following the lines of self-managing systems:

Figure: a Web hosting system as a feedback control loop. A reference input and an initial configuration drive the system, which is subject to uncontrollable parameters (disturbance/noise). Metric estimation turns the observed output into measured output; analysis derives adjustment triggers; corrections are applied through replica placement, consistency enforcement, and request routing.

12 – 15 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 17

Handling Flash Crowds

Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem:

Figure: four panels (a) to (d), covering periods of 2 days, 2 days, 6 days, and 2.5 days.

12 – 16 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 18

Server Replication

Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).

Figure: client, origin server, regular DNS system, CDN DNS server, and CDN server with a cache:

  • 1. Get base document
  • 2. Document with refs to embedded documents
  • 3, 4. DNS lookups; the CDN DNS server returns the IP address of the client-best server
  • 5. Get embedded documents
  • 6. Get embedded documents (if not already cached)
  • 7. Embedded documents
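The DNS lookups in steps 3 and 4 are where the CDN picks the "client-best" server. Real CDNs use their own, largely proprietary measurements; the sketch below simply assumes "best" means the geographically nearest server that is not overloaded. The server list uses documentation IP addresses, and the coordinates, load figures, and the 0.9 threshold are all hypothetical.

```python
import math

# Hypothetical CDN servers: IP address (documentation range) -> (latitude, longitude).
CDN_SERVERS = {
    "192.0.2.10": (52.37, 4.90),     # Amsterdam
    "192.0.2.20": (40.71, -74.01),   # New York
    "192.0.2.30": (35.68, 139.69),   # Tokyo
}


def great_circle_km(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Haversine distance between two (latitude, longitude) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(client: tuple[float, float], load: dict[str, float],
            max_load: float = 0.9) -> str:
    """CDN DNS server: return the IP address of the 'client-best' server,
    here the nearest one whose load is below max_load."""
    candidates = [ip for ip in CDN_SERVERS if load.get(ip, 0.0) < max_load]
    if not candidates:             # every server is busy: fall back to all of them
        candidates = list(CDN_SERVERS)
    return min(candidates, key=lambda ip: great_circle_km(client, CDN_SERVERS[ip]))
```

Because the redirection happens in DNS, the client's browser needs no changes: it simply connects to whatever address the lookup returns.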

Question: How would consistency be maintained in this system?

12 – 17 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 19

Replication of Web Apps. (1/3)

Observation: Replication becomes more difficult when dealing with databases and such. There is no single best solution.

Figure: the origin-server side holds the authoritative database and its schema; on the edge-server side, a server answers client queries using one of: a database copy (full/partial data replication), full schema replication/query templates, a content-aware cache, or a content-blind cache.

Assumption: Updates are carried out at origin server, and propagated to edge servers.

12 – 18 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 20

Replication of Web Apps. (2/3)


  • Full replication: high read/write ratio, often in combination with complex queries. Note: replication may actually degrade performance when the R/W ratio goes down.
  • Partial replication: high read/write ratio, but in combination with simple queries

12 – 19 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 21

Replication of Web Apps. (3/3)


  • Content-aware caching: Check whether a query can be answered from the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
  • Content-blind caching: Simply cache the results of previous queries. Works great with simple queries that address unique results (e.g., no range queries).
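A content-blind cache can be sketched as a dictionary keyed on the exact query text and parameters; it never looks inside the query. An in-memory SQLite database stands in for the origin database, and the class name is hypothetical. Because the cache does not understand the queries, it cannot tell which results an update affects, so the sketch invalidates everything.

```python
import sqlite3


class ContentBlindCache:
    """Edge-server cache keyed on the exact query text and parameters."""

    def __init__(self, origin: sqlite3.Connection) -> None:
        self.origin = origin
        self.results: dict[tuple, list] = {}
        self.hits = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        key = (sql, params)
        if key in self.results:                 # identical query seen before
            self.hits += 1
            return self.results[key]
        rows = self.origin.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def invalidate_all(self) -> None:
        # Without inspecting queries, the cache cannot tell which results an
        # update affects, so it drops all of them.
        self.results.clear()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
    db.executemany("INSERT INTO items VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    cache = ContentBlindCache(db)
    print(cache.query("SELECT price FROM items WHERE id = ?", (1,)))
```

This also shows why range queries fit poorly: `price < 10` and `price < 20` are different keys, so the second query gains nothing from the first even though their results overlap.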

12 – 20 Distributed Web-Based Systems/12.6 Consistency and Replication

SLIDE 22

Security: TLS (SSL)

Transport Layer Security: Modern version of the Secure Sockets Layer (SSL), which “sits” between the transport layer and application protocols. A relatively simple protocol that can support mutual authentication using certificates:

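Figure: TLS with mutual authentication between client and server in five steps (1 to 5). The exchanged messages include the possibilities and choices, the certificates $[K_S^+]_{CA}$ and $[K_C^+]_{CA}$ (the public keys of server and client, signed by a certification authority), and $([R]_C)_{K_S^+}$.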
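In practice the handshake is carried out by a TLS library; an application only configures which certificates to present and which to demand. Below is a minimal sketch with Python's standard `ssl` module, assuming TLS 1.2 or newer. The certificate, key, and CA file paths are hypothetical placeholders (all optional here), and no connection is opened; the five handshake steps in the figure happen inside the library.

```python
import ssl
from typing import Optional


def server_context(certfile: Optional[str] = None, keyfile: Optional[str] = None,
                   client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mutual authentication: present a certificate, demand one back."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED            # the client must send a certificate
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)     # server's certificate, signed by a CA
    if client_ca_file:
        ctx.load_verify_locations(client_ca_file)  # CAs trusted to sign client certificates
    return ctx


def client_context(ca_file: Optional[str] = None, certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server's certificate and host name, present our own."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)     # client's certificate, signed by a CA
    return ctx
```

A client would then call `client_context(...).wrap_socket(sock, server_hostname=...)` on a connected TCP socket; setting `verify_mode = ssl.CERT_REQUIRED` on the server is what turns ordinary server authentication into mutual authentication.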

12 – 21 Distributed Web-Based Systems/12.6 Consistency and Replication