Distributed Systems Principles and Paradigms (Maarten van Steen, VU)




SLIDE 1

Distributed Systems Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science steen@cs.vu.nl

Chapter 12: Distributed Web-Based Systems

Version: December 10, 2012

SLIDE 2

Distributed Web-Based Systems 12.1 Architecture

Distributed Web-based systems

Essence: The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents.
  • Documents are often represented in text (plain text, HTML, XML)
  • Alternative types: images, audio, video, applications (PDF, PS)
  • Documents may contain scripts, executed by client-side software

Figure: a browser (running on the OS of the client machine) and a Web server on the server machine.

  • 1. Get document request (HTTP)
  • 2. Server fetches document from local file
  • 3. Response

2 / 19

SLIDE 3

Distributed Web-Based Systems 12.1 Architecture

Multi-tiered architectures

Observation: Very early on, Web sites were organized into three tiers.

Figure: a Web server (HTTP request handler), a CGI process running a CGI program, and a database server. Steps recoverable from the figure:

  • 1. Get request
  • 3. Start process to fetch document
  • 4. Database interaction
  • 5. HTML document created
  • 6. Return result
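
As a minimal sketch of the request flow in the figure, the following Python uses an in-memory SQLite database as a stand-in for the database server. The function names (`handle_request`, `cgi_program`), the `docs` table and its contents are assumptions made for illustration; a real site would run the CGI program as a separate process.

```python
import sqlite3
from html import escape

def make_database():
    """Stand-in for the database server: one hypothetical table of documents."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO docs VALUES (?, ?)", ("welcome", "Hello, Web"))
    db.commit()
    return db

def cgi_program(db, name):
    """Steps 4 and 5: query the database, then build an HTML document."""
    row = db.execute("SELECT body FROM docs WHERE name = ?", (name,)).fetchone()
    if row is None:
        return None
    return "<html><body><p>" + escape(row[0]) + "</p></body></html>"

def handle_request(db, path):
    """Steps 1, 3 and 6: accept the request, invoke the program, return the result."""
    name = path.lstrip("/")
    page = cgi_program(db, name)
    if page is None:
        return 404, "<html><body><p>Not found</p></body></html>"
    return 200, page

db = make_database()
status, page = handle_request(db, "/welcome")
print(status, page)
```

Keeping the database access inside `cgi_program` mirrors the tier separation: the request handler never touches the data directly.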

3 / 19

SLIDE 4

Distributed Web-Based Systems 12.1 Architecture

Web services

Observation: At a certain point, people started recognizing that there was more to the Web than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.

Figure: on the client machine, a client application calls a stub generated from the WSDL service description; on the server machine, the server application sits behind a stub generated from the same description. The communication subsystems on both machines exchange SOAP messages. The service description (WSDL) is published to a directory service (UDDI), where the client looks up the service.

4 / 19

SLIDE 5

Distributed Web-Based Systems 12.2 Processes

Apache Web server

Observation: More than 52% of all 185 million Web sites run Apache. The server is internally organized more or less according to the steps needed to process an HTTP request.

Figure: the Apache core passes a request through a sequence of hooks on its way to a response. Modules supply functions, each function is linked to a hook, and the core calls the functions registered per hook.
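
The hook mechanism can be illustrated with a toy dispatcher. This is a sketch of the general idea only, not Apache's actual module API: the class `Core`, the hook names `translate` and `fixup`, and the two example functions are all hypothetical.

```python
from collections import defaultdict

class Core:
    """Toy core: keeps, per hook, the functions that modules registered."""

    def __init__(self, hook_order):
        self.hook_order = list(hook_order)      # order in which hooks are processed
        self.functions = defaultdict(list)      # hook name -> registered functions

    def register(self, hook, function):
        """Link a module's function to a hook."""
        if hook not in self.hook_order:
            raise ValueError("unknown hook: " + hook)
        self.functions[hook].append(function)

    def process(self, request):
        """Pass the request through every hook, calling each linked function in turn."""
        for hook in self.hook_order:
            for function in self.functions[hook]:
                request = function(request)
        return request

# Two hypothetical modules, each contributing one function.
def translate_url(request):
    request["file"] = "/var/www" + request["url"]
    return request

def add_header(request):
    request["headers"] = {"Server": "toy-core"}
    return request

core = Core(["translate", "fixup"])
core.register("fixup", add_header)          # registration order does not matter
core.register("translate", translate_url)   # hook order decides when it runs
result = core.process({"url": "/index.html"})
print(result)
```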

5 / 19

SLIDE 6

Distributed Web-Based Systems 12.2 Processes

Server clusters

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.

Figure: a front end connected over a LAN to four Web servers. The front end handles all incoming requests and outgoing responses.

6 / 19

SLIDE 7

Distributed Web-Based Systems 12.2 Processes

Server clusters

Problem: The front end may easily get overloaded, so special measures need to be taken.
  • Transport-layer switching: The front end simply passes the TCP request to one of the servers, taking some performance metric into account.
  • Content-aware distribution: The front end reads the content of the HTTP request and then selects the best server.
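
The difference between the two policies can be sketched in a few lines of Python. The metric (open connections), the hashing scheme, and the server names are assumptions chosen for illustration; the slides do not prescribe either.

```python
import hashlib

def transport_layer_choice(loads):
    """Pick a server using only a performance metric (here: fewest open connections).

    The front end has not looked at the HTTP request at this point.
    """
    return min(loads, key=loads.get)

def content_aware_choice(path, servers):
    """Pick a server from the request content (here: a stable hash of the URL path).

    Requests for the same document always reach the same server.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

loads = {"s1": 12, "s2": 3, "s3": 7}   # hypothetical open-connection counts
servers = sorted(loads)

print(transport_layer_choice(loads))
print(content_aware_choice("/img/logo.png", servers))
```

Because the content-aware choice sends repeated requests for one document to one server, that server's cache is more likely to already hold the document.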

7 / 19

SLIDE 8

Distributed Web-Based Systems 12.2 Processes

Server clusters

Question: Why can content-aware distribution be so much better?

Figure: a client reaches a switch, behind which sit two Web servers (each with a distributor) and a dispatcher. Setup requests and other messages follow different paths.

  • 1. Pass setup request to a distributor
  • 2. Dispatcher selects server
  • 3. Hand off TCP connection
  • 4. Inform switch
  • 5. Forward other messages
  • 6. Server responses

8 / 19

SLIDE 9

Distributed Web-Based Systems 12.6 Consistency and Replication

Web proxy caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents.

Cache-consistency protocols:
  • Always verify validity by contacting the server
  • Age-based consistency: $T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$
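
A worked instance of the age-based rule, as a small Python sketch. The function names and the sample timestamps (in seconds) are assumptions; only the formula comes from the slide.

```python
def expiry_time(t_cached, t_last_modified, alpha):
    """Age-based expiry: T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    if t_last_modified > t_cached:
        # The last-modified time is the one reported when the copy was fetched.
        raise ValueError("last-modified time lies after the caching time")
    return alpha * (t_cached - t_last_modified) + t_cached

def is_fresh(now, t_cached, t_last_modified, alpha):
    """A cached copy is served without contacting the server until it expires."""
    return now < expiry_time(t_cached, t_last_modified, alpha)

# Hypothetical example: cached at t=1000, last modified at t=0, alpha = 0.25.
print(expiry_time(1000, 0, 0.25))        # 1250.0
print(is_fresh(1100, 1000, 0, 0.25))     # True
print(is_fresh(1300, 1000, 0, 0.25))     # False
```

The older a document was when it was cached, the longer the proxy trusts its copy: a document untouched for 1000 seconds is kept for another 250 seconds at α = 0.25.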

9 / 19

SLIDE 10

Distributed Web-Based Systems 12.6 Consistency and Replication

Web proxy caching

Basic idea (cont'd): Cooperative caching, in which a proxy first checks its neighbors on a cache miss.

Figure: three Web proxies, each with its own cache and its own group of clients, in front of a Web server. A client sends an HTTP Get request to its proxy.

  • 1. Look in local cache
  • 2. Ask neighboring proxy caches
  • 3. Forward request to Web server
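
The three-step lookup can be sketched as follows. The `Proxy` class, the dictionary standing in for the Web server, and the decision to copy a neighbor's document into the local cache are assumptions for illustration.

```python
class Proxy:
    """Toy Web proxy with a local cache and a list of neighboring proxies."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin        # dict standing in for the Web server
        self.cache = {}
        self.neighbors = []

    def get(self, url):
        """Return (document, source) following steps 1-3 of the figure."""
        # 1. Look in the local cache.
        if url in self.cache:
            return self.cache[url], self.name
        # 2. Ask neighboring proxy caches (cache contents only, no recursion).
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]
                return self.cache[url], neighbor.name
        # 3. Forward the request to the Web server.
        document = self.origin[url]
        self.cache[url] = document
        return document, "origin"

origin = {"/index.html": "<html>home</html>"}
a, b = Proxy("proxy-a", origin), Proxy("proxy-b", origin)
a.neighbors, b.neighbors = [b], [a]

print(a.get("/index.html"))   # miss everywhere: fetched from the origin
print(b.get("/index.html"))   # local miss, found at neighbor proxy-a
print(b.get("/index.html"))   # now a local hit at proxy-b
```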

10 / 19

SLIDE 11

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication in Web hosting systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, which follows the lines of self-managing systems.

Figure: feedback control loop around a Web hosting system. Inputs: reference input, initial configuration, and uncontrollable parameters (disturbance / noise). The observed output passes through metric estimation to give the measured output; analysis then produces adjustment triggers, which lead to corrections (+/-) in replica placement, consistency enforcement, and request routing.

11 / 19

SLIDE 12

Distributed Web-Based Systems 12.6 Consistency and Replication

Handling flash crowds

Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.

Figure: four plots, (a) to (d), covering 2 days, 2 days, 6 days, and 2.5 days respectively.

12 / 19

SLIDE 13

Distributed Web-Based Systems 12.6 Consistency and Replication

Server replication

Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers with guarantees on high availability and performance (example: Akamai).

Figure: a client, the origin server, a CDN server with a cache, the CDN DNS server, and the regular DNS system.

  • 1. Get base document
  • 2. Document with refs to embedded documents
  • 3, 4. DNS lookups; return IP address of client-best server
  • 5. Get embedded documents
  • 6. Get embedded documents (if not already cached)
  • 7. Embedded documents
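
The DNS step that returns the "client-best" server can be sketched as a lookup over per-region estimates. The slide does not say how "best" is decided; lowest estimated latency is an assumption here, as are the region names, replica names, and addresses (taken from the IP ranges reserved for documentation).

```python
def client_best_server(client_region, replicas, latency_ms):
    """Return the IP address of the replica with the lowest estimated latency.

    latency_ms maps (client_region, replica_name) to an estimate in milliseconds.
    """
    best = min(replicas, key=lambda name: latency_ms[(client_region, name)])
    return replicas[best]

# Hypothetical replica servers and latency estimates.
replicas = {"ams": "192.0.2.10", "nyc": "198.51.100.20", "sin": "203.0.113.30"}
latency_ms = {
    ("eu", "ams"): 12, ("eu", "nyc"): 85, ("eu", "sin"): 170,
    ("us", "ams"): 90, ("us", "nyc"): 9,  ("us", "sin"): 210,
}

print(client_best_server("eu", replicas, latency_ms))   # 192.0.2.10
print(client_best_server("us", replicas, latency_ms))   # 198.51.100.20
```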

13 / 19

SLIDE 14

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web applications

Observation: Replication becomes more difficult when dealing with databases and the like. There is no single best solution.

Assumption: Updates are carried out at the origin server and propagated to edge servers.

14 / 19

SLIDE 15

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web applications: normal

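Figure: a client contacts an edge server, which sends queries to the origin server and receives responses.
  • Edge-server side: Web server, application logic, content-blind cache, content-aware cache, database copy, schema
  • Origin-server side: Web server, application logic, authoritative database, schema
  • Links between the two sides: query, response, full/partial data replication, full schema replication / query templates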

15 / 19

SLIDE 16

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web applications

Alternative solutions:
  • Full replication: high read/write ratio, often in combination with complex queries.
  • Partial replication: high read/write ratio, but in combination with simple queries.
  • Content-aware caching: Check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
  • Content-blind caching: Simply cache the result of previous queries. Works well with simple queries that address unique results (e.g., no range queries).

Question: What can be said about replication vs. performance?
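
To make the two caching alternatives concrete, here is a minimal Python sketch contrasting them on a range query. The class names, the single price-range query template, and the sample data are assumptions; invalidation subscriptions are left out to keep the example short.

```python
class ContentBlindCache:
    """Caches query results keyed by the exact query text; knows nothing about content."""

    def __init__(self, run_at_origin):
        self.run_at_origin = run_at_origin   # callable: query text -> result
        self.results = {}
        self.hits = 0

    def query(self, text):
        if text in self.results:
            self.hits += 1
            return self.results[text]
        result = self.run_at_origin(text)
        self.results[text] = result
        return result

class RangeAwareCache:
    """Content-aware sketch for one query template: items with low <= price <= high."""

    def __init__(self, fetch_range):
        self.fetch_range = fetch_range       # callable: (low, high) -> list of prices
        self.cached = []                     # list of ((low, high), rows)
        self.hits = 0

    def query(self, low, high):
        for (c_low, c_high), rows in self.cached:
            if c_low <= low and high <= c_high:
                # The new range is contained in a cached one: answer locally.
                self.hits += 1
                return [p for p in rows if low <= p <= high]
        rows = self.fetch_range(low, high)
        self.cached.append(((low, high), rows))
        return rows

PRICES = [5, 12, 18, 25, 40]                 # hypothetical origin data

def fetch_range(low, high):
    return [p for p in PRICES if low <= p <= high]

aware = RangeAwareCache(fetch_range)
aware.query(10, 30)
print(aware.query(15, 20), aware.hits)       # answered from the cached 10-30 range

blind = ContentBlindCache(lambda text: fetch_range(*map(int, text.split("-"))))
blind.query("10-30")
print(blind.query("15-20"), blind.hits)      # different text, so no cache hit
```

The content-blind cache only helps when exactly the same query recurs, which is why it suits simple queries with unique results and not range queries.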

16 / 19

SLIDE 17

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web apps: full/partial replication

Figure: the edge-server/origin-server diagram from slide 15 (same components and links).

17 / 19

SLIDE 18

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web apps: content-aware caching

Figure: the edge-server/origin-server diagram from slide 15 (same components and links).

18 / 19

SLIDE 19

Distributed Web-Based Systems 12.6 Consistency and Replication

Replication of Web apps: content-blind caching

Figure: the edge-server/origin-server diagram from slide 15 (same components and links).

19 / 19