Distributed Systems: Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
steen@cs.vu.nl
Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012
Distributed Web-Based Systems 12.1 Architecture
Distributed Web-based systems
Essence: The WWW is a huge client-server system with millions of servers, each server hosting thousands of hyperlinked documents.
- Documents are often represented in text (plain text, HTML, XML).
- Alternative types: images, audio, video, applications (PDF, PS).
- Documents may contain scripts, executed by client-side software.
Figure: Client machine (browser, OS) and server machine (Web server).
- 1. Get document request (HTTP)
- 2. Server fetches document from local file
- 3. Response
Multi-tiered architectures
Observation: Very early on, Web sites were organized into three tiers.
Figure: Web server, HTTP request handler, CGI process, CGI program, and database server.
- 1. Get request
- 3. Start process to fetch document
- 4. Database interaction
- 5. HTML document created
- 6. Return result
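The three tiers in the figure can be illustrated with a minimal sketch. It is an assumption-laden toy rather than a real CGI setup: the function names (`handle_request`, `cgi_program`), the `/search?q=` URL shape, and the `articles` table are all hypothetical, and an in-memory SQLite database stands in for the database server.

```python
import sqlite3
from html import escape


def make_database() -> sqlite3.Connection:
    """Data tier: an in-memory SQLite database with one hypothetical table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [("Architecture",), ("Processes",)],
    )
    db.commit()
    return db


def cgi_program(db: sqlite3.Connection, query: str) -> str:
    """Processing tier: interact with the database and create an HTML document."""
    rows = db.execute(
        "SELECT title FROM articles WHERE title LIKE ? ORDER BY id",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


def handle_request(db: sqlite3.Connection, path: str) -> tuple[int, str]:
    """Web-server tier: take the request, run the program, return the result."""
    prefix = "/search?q="  # hypothetical URL layout
    if not path.startswith(prefix):
        return 404, "<html><body>Not found</body></html>"
    return 200, cgi_program(db, path[len(prefix):])


db = make_database()
status, body = handle_request(db, "/search?q=Arch")
print(status, body)
```

A real CGI deployment would start a separate process per request; the plain function call here only keeps the sketch short.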
Web services
Observation: At a certain point, people started recognizing that the Web was about more than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.
Figure: Client machine (client application, stub, communication subsystem) and server machine (server application, stub, communication subsystem), communicating via SOAP. A service description (WSDL) is published in a directory service (UDDI), where a client can look up a service. On both machines the stub is generated from the WSDL description.
Distributed Web-Based Systems 12.2 Processes
Apache Web server
Observation: More than 52% of all 185 million Web sites run Apache. The server is internally organized more or less according to the steps needed to process an HTTP request.
Figure: Apache core with modules. Modules provide functions, each function is linked to a hook, and the core calls the functions registered per hook while turning a request into a response.
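The hook organization can be sketched as a small registry. This is only an illustration of the idea, not Apache's actual API: the class `Core`, the three hook names, and the module functions are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Request = dict  # a request is modeled as a plain dictionary in this sketch


class Core:
    """Toy core: keeps, per hook, the functions that modules linked to it."""

    # Hypothetical hook names, processed in this fixed order.
    HOOKS = ("map_to_file", "check_access", "generate_response")

    def __init__(self) -> None:
        self._functions: dict[str, list[Callable[[Request], None]]] = defaultdict(list)

    def register(self, hook: str, function: Callable[[Request], None]) -> None:
        """Link a module's function to a hook."""
        if hook not in self.HOOKS:
            raise ValueError(f"unknown hook: {hook}")
        self._functions[hook].append(function)

    def process(self, request: Request) -> Request:
        """Walk the hooks in order and call every function linked to each one."""
        for hook in self.HOOKS:
            for function in self._functions[hook]:
                function(request)
        return request


# Three hypothetical "modules", each contributing one function.
def map_to_file(request: Request) -> None:
    request["file"] = "/var/www" + request["path"]


def allow_all(request: Request) -> None:
    request["allowed"] = True


def respond(request: Request) -> None:
    if request.get("allowed"):
        request["response"] = f"contents of {request['file']}"
    else:
        request["response"] = "403 Forbidden"


core = Core()
core.register("generate_response", respond)  # registration order across hooks is irrelevant
core.register("map_to_file", map_to_file)
core.register("check_access", allow_all)
result = core.process({"path": "/index.html"})
print(result["response"])
```

The point of the design is that the core fixes the processing steps, while modules decide what happens at each step.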
Server clusters
Essence To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.
Figure: A front end and several Web servers connected by a LAN. The front end handles all incoming requests and outgoing responses.
Server clusters
Problem: The front end may easily get overloaded, so that special measures need to be taken.
- Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.
- Content-aware distribution: Front end reads the content of the HTTP request and then selects the best server.
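The two switching styles can be contrasted in a short sketch. Both selection rules are assumptions made for illustration: the slide does not say which performance metric is used or how the "best" server is chosen. Here the transport-layer switch picks the server with the fewest open connections, and the content-aware switch hashes the requested path so that the same document always goes to the same server.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    open_connections: int = 0


def transport_layer_switch(servers: list[Server]) -> Server:
    """Choose without looking at the request: fewest open connections (assumed metric)."""
    return min(servers, key=lambda server: server.open_connections)


def content_aware_switch(servers: list[Server], http_path: str) -> Server:
    """Choose from the request content: the same path always maps to the same server."""
    index = zlib.crc32(http_path.encode("utf-8")) % len(servers)
    return servers[index]


servers = [Server("a", 3), Server("b", 1), Server("c", 2)]
print(transport_layer_switch(servers).name)
print(content_aware_switch(servers, "/img/logo.png").name)
```

Mapping a path to a fixed server means that server's cache is likely to hold the document already, which is one reason a content-aware front end can pay off.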
Server Clusters
Question Why can content-aware distribution be so much better?
Figure: Client, switch, distributors, dispatcher, and Web servers. The switch distinguishes setup requests from other messages.
- 1. Pass setup request to a distributor
- 2. Dispatcher selects server
- 3. Hand off TCP connection
- 4. Inform switch
- 5. Forward other messages
- 6. Server responses
Distributed Web-Based Systems 12.6 Consistency and Replication
Web proxy caching
Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents.
Cache-consistency protocols:
- Always verify validity by contacting server
- Age-based consistency: $T_{\mathit{expire}} = \alpha \cdot (T_{\mathit{cached}} - T_{\mathit{last\ modified}}) + T_{\mathit{cached}}$
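The age-based rule above is easy to work through numerically. The sketch below is a direct transcription of the formula. The function names, the use of seconds as the time unit, and the sample value of α are assumptions chosen for the example.

```python
def expiry_time(t_cached: float, t_last_modified: float, alpha: float) -> float:
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    if t_last_modified > t_cached:
        raise ValueError("document cannot be modified after it was cached")
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float, alpha: float) -> bool:
    """A cached copy is served without contacting the server until T_expire."""
    return now < expiry_time(t_cached, t_last_modified, alpha)


# Document last modified at t=200, cached at t=1000, alpha = 0.25:
# it was 800 time units old when cached, so it stays valid for another 200.
print(expiry_time(1000.0, 200.0, 0.25))
print(is_fresh(1100.0, 1000.0, 200.0, 0.25))
print(is_fresh(1300.0, 1000.0, 200.0, 0.25))
```

The older a document was when it was cached, the longer the proxy trusts it, on the assumption that documents that have not changed for a long time are unlikely to change soon.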
Web proxy caching
Basic idea (cont'd): Cooperative caching, by which a proxy first checks its neighbors on a cache miss.
Figure: Three Web proxies, each with its own cache and several clients, and a Web server. A client sends an HTTP Get request to its proxy.
- 1. Look in local cache
- 2. Ask neighboring proxy caches
- 3. Forward request to Web server
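The three lookup steps can be sketched as follows. The class `Proxy` and its fields are hypothetical, a dictionary stands in for the Web server, and storing a neighbor's copy in the local cache is an assumption the slide does not state.

```python
class Proxy:
    """Toy cooperative proxy cache."""

    def __init__(self, name: str, web_server: dict[str, str]) -> None:
        self.name = name
        self.web_server = web_server      # dictionary standing in for the Web server
        self.cache: dict[str, str] = {}
        self.neighbors: list["Proxy"] = []

    def get(self, url: str) -> tuple[str, str]:
        """Return the document and where it was found."""
        # 1. Look in local cache.
        if url in self.cache:
            return self.cache[url], "local cache"
        # 2. Ask neighboring proxy caches.
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]  # assumption: keep a local copy
                return self.cache[url], f"neighbor {neighbor.name}"
        # 3. Forward request to Web server.
        document = self.web_server[url]
        self.cache[url] = document
        return document, "web server"


web_server = {"/index.html": "<html>home</html>"}
proxy_a = Proxy("a", web_server)
proxy_b = Proxy("b", web_server)
proxy_a.neighbors = [proxy_b]
proxy_b.neighbors = [proxy_a]

first = proxy_a.get("/index.html")   # miss everywhere, fetched from the Web server
second = proxy_b.get("/index.html")  # found in neighbor a's cache
third = proxy_b.get("/index.html")   # now in b's own cache
print(first[1], second[1], third[1], sep=" | ")
```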
Replication in Web hosting systems
Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, which follows the lines of self-managing systems.
Figure: Feedback control loop around a Web hosting system. Labels: reference input, initial configuration, uncontrollable parameters (disturbance / noise), observed output, metric estimation, measured output, analysis, adjustment triggers, and corrections (+/-) through replica placement, consistency enforcement, and request routing.
Handling flash crowds
Observation We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.
Figure: Four panels, (a) to (d). Time spans shown: 2 days, 2 days, 6 days, 2.5 days.
Server replication
Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).
Figure: Client, origin server, CDN server (with cache), CDN DNS server, and the regular DNS system.
- 1. Get base document
- 2. Document with refs to embedded documents
- 3, 4. DNS lookups; return IP address of client-best server
- 5. Get embedded documents
- 6. Get embedded documents (if not already cached)
- 7. Embedded documents
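The DNS step in the figure, returning the address of the client-best server, can be sketched under a strong simplifying assumption: "best" is taken to mean lowest estimated latency for the client's region. The function name, the region and server names, the latency table, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
def client_best_server(
    client_region: str,
    latency_ms: dict[str, dict[str, float]],
    addresses: dict[str, str],
) -> str:
    """Return the IP address of the CDN server with the lowest estimated latency."""
    estimates = latency_ms[client_region]
    best = min(estimates, key=estimates.get)
    return addresses[best]


# Hypothetical CDN servers and latency estimates per client region.
ADDRESSES = {"edge-eu": "192.0.2.10", "edge-us": "192.0.2.20"}
LATENCY_MS = {
    "eu": {"edge-eu": 12.0, "edge-us": 95.0},
    "us": {"edge-eu": 90.0, "edge-us": 15.0},
}

print(client_best_server("eu", LATENCY_MS, ADDRESSES))
print(client_best_server("us", LATENCY_MS, ADDRESSES))
```

A production CDN would combine several metrics, such as server load and network conditions. Latency alone is used here only to keep the example small.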
Replication of Web applications
Observation: Replication becomes more difficult when dealing with databases and the like. There is no single best solution.
Assumption Updates are carried out at origin server, and propagated to edge servers.
Replication of Web applications: normal
Figure: Client, edge-server side, and origin-server side. Components: Web server and application logic on each side, content-blind cache, content-aware cache, database copy, authoritative database, and schemas. Labeled flows: query, response, full/partial data replication, full schema replication/query templates.
Replication of Web applications
Alternative solutions:
- Full replication: high read/write ratio, often in combination with complex queries.
- Partial replication: high read/write ratio, but in combination with simple queries.
- Content-aware caching: Check for queries at local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
- Content-blind caching: Simply cache the result of previous queries. Works great with simple queries that address unique results (e.g., no range queries).
Question: What can be said about replication vs. performance?
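Content-blind caching, as described in the list above, can be sketched in a few lines. The class `ContentBlindCache`, the `items` table, and the counter `origin_queries` are hypothetical, an in-memory SQLite database stands in for the authoritative database, and invalidation on updates is left out entirely.

```python
import sqlite3


class ContentBlindCache:
    """Caches query results keyed by the query text and its parameters only."""

    def __init__(self, origin_db: sqlite3.Connection) -> None:
        self.origin_db = origin_db
        self.results: dict[tuple, list] = {}
        self.origin_queries = 0  # how often the origin database was contacted

    def query(self, sql: str, params: tuple = ()) -> list:
        key = (sql, tuple(params))
        if key not in self.results:
            self.origin_queries += 1
            self.results[key] = self.origin_db.execute(sql, params).fetchall()
        return self.results[key]


origin = sqlite3.connect(":memory:")
origin.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price INTEGER)")
origin.executemany("INSERT INTO items VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
origin.commit()

cache = ContentBlindCache(origin)

# A simple query with a unique result: the repeat is served from the cache.
by_id = "SELECT price FROM items WHERE id = ?"
cache.query(by_id, (2,))
cache.query(by_id, (2,))
hits_after_simple = cache.origin_queries

# Range queries: the second answer is contained in the first, but the cache
# does not look at content, so it has to go back to the origin.
by_price = "SELECT id FROM items WHERE price < ? ORDER BY id"
wide = cache.query(by_price, (25,))
narrow = cache.query(by_price, (15,))
print(hits_after_simple, cache.origin_queries, wide, narrow)
```

A content-aware cache could answer the narrower range query from data it already holds, which is why the list above pairs it with range and complex queries.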
Replication Web apps.: full/partial replication
Figure: Same architecture as on the "Replication of Web applications: normal" slide.
Replication Web apps.: content-aware caching
Figure: Same architecture as on the "Replication of Web applications: normal" slide.
Replication Web apps.: content-blind caching
Figure: Same architecture as on the "Replication of Web applications: normal" slide.