 
              Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science steen@cs.vu.nl Chapter 12: Distributed Web-Based Systems Version: December 10, 2012
Distributed Web-Based Systems
12.1 Architecture

Distributed Web-based systems

Essence: The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents.
Documents are often represented as text (plain text, HTML, XML).
Alternative types: images, audio, video, applications (PDF, PS).
Documents may contain scripts, executed by client-side software.

[Figure: a browser on the client machine and a Web server on the server machine. 1. Get document request (HTTP). 2. Server fetches document from local file. 3. Response.]
Multi-tiered architectures

Observation: Very early on, Web sites were organized into three tiers.

[Figure: a Web server (HTTP request handler), a CGI process (CGI program), and a database server. Labeled steps: 1. Get request. 3. Start process to fetch document. 4. Database interaction. 5. HTML document created. 6. Return result.]
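The three tiers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular Web server implements CGI: the function names (`make_database`, `cgi_program`, `web_server`), the `pages` table, and the in-memory SQLite database standing in for the database server are all hypothetical.

```python
import sqlite3
from html import escape


def make_database():
    """Tier 3: an in-memory database standing in for the database server."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO pages VALUES (?, ?)", ("home", "Welcome & hello"))
    db.commit()
    return db


def cgi_program(db, page_name):
    """Tier 2: interact with the database and create the HTML document."""
    row = db.execute(
        "SELECT body FROM pages WHERE name = ?", (page_name,)
    ).fetchone()
    if row is None:
        return 404, "<html><body>Not found</body></html>"
    return 200, "<html><body>" + escape(row[0]) + "</body></html>"


def web_server(db, request_line):
    """Tier 1: accept the request, start the program, return its result."""
    method, path, _version = request_line.split()
    if method != "GET":
        return 405, "<html><body>Method not allowed</body></html>"
    return cgi_program(db, path.lstrip("/"))
```

Calling `web_server(make_database(), "GET /home HTTP/1.1")` walks through the whole chain: the request handler parses the request line, the CGI-style program queries the database, and the generated HTML travels back as the result.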
Web services

Observation: At a certain point, people started recognizing that there was more to the Web than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.

[Figure: a client application on the client machine and a server application on the server machine, each with a stub and a communication subsystem, communicating via SOAP. Both stubs are generated from a WSDL service description. The server publishes its service description (WSDL) at a directory service (UDDI), where the client looks up a service.]
12.2 Processes

Apache Web server

Observation: More than 52% of all 185 million Web sites run Apache. The server is internally organized more or less according to the steps needed to process an HTTP request.

[Figure: modules contain functions, and each function is linked to a hook. The Apache core calls the functions registered per hook, one hook after another, on the path from request to response.]
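The hook organization can be illustrated with a small sketch. It is not Apache's actual API: the class `Core`, the hook names in `HOOKS`, and the dictionary used as a request record are assumptions made for the example. The point is only that the core fixes the order of processing steps, while modules contribute functions to individual steps.

```python
from collections import defaultdict


class Core:
    """A toy server core that calls module functions hook by hook."""

    # Hypothetical hook names, listed in the order the core processes them.
    HOOKS = ("translate", "authorize", "respond", "log")

    def __init__(self):
        self._functions = defaultdict(list)

    def register(self, hook, function):
        """Link a module function to one of the core's hooks."""
        if hook not in self.HOOKS:
            raise ValueError("unknown hook: " + hook)
        self._functions[hook].append(function)

    def process(self, request):
        """Run every registered function, one hook after another."""
        for hook in self.HOOKS:
            for function in self._functions[hook]:
                function(request)
        return request


# Two toy "modules", each contributing one function.
def map_uri_to_file(request):
    request["file"] = "/var/www" + request["uri"]


def build_response(request):
    request["response"] = "contents of " + request["file"]
```

Registration order does not matter across hooks: a function registered for `respond` always runs after those registered for `translate`, because the core iterates over the hooks in its own fixed order.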
Server clusters

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.

[Figure: several Web servers on a LAN behind a front end. The front end handles all incoming requests and outgoing responses.]
Server clusters

Problem: The front end may easily get overloaded, so that special measures need to be taken.

Transport-layer switching: The front end simply passes the TCP request to one of the servers, taking some performance metric into account.
Content-aware distribution: The front end reads the content of the HTTP request and then selects the best server.
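The difference between the two approaches can be sketched as two selection policies. Both classes are hypothetical: the number of open connections stands in for "some performance metric", and hashing the requested path is just one possible way of selecting a server based on content.

```python
import zlib


class TransportLayerSwitch:
    """Picks a server without inspecting the request (least connections)."""

    def __init__(self, servers):
        self.connections = {server: 0 for server in servers}

    def select(self):
        # Choose the server with the fewest connections handed out so far.
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server


class ContentAwareDistributor:
    """Picks a server after reading the HTTP request line."""

    def __init__(self, servers):
        self.servers = list(servers)

    def select(self, request_line):
        _method, path, _version = request_line.split()
        # Requests for the same document always map to the same server.
        index = zlib.crc32(path.encode("utf-8")) % len(self.servers)
        return self.servers[index]
```

With the content-aware policy, repeated requests for one document keep arriving at the same server, so that server is likely to still have the document in memory. The transport-layer switch cannot exploit this, because it chooses before any HTTP data has been read.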
Server clusters

Question: Why can content-aware distribution be so much better?

[Figure: a client, a switch, a dispatcher, and two Web servers, each with a distributor. 1. Pass setup request to a distributor. 2. Dispatcher selects server. 3. Hand off TCP connection. 4. Inform switch. 5. Forward other messages. 6. Server responses.]
12.6 Consistency and Replication

Web proxy caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents.

Cache-consistency protocols:
Always verify validity by contacting the server.
Age-based consistency: $T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$
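The age-based rule translates directly into code. The sketch below assumes that all times are given in seconds on a common clock and that the caller supplies α. The function names are hypothetical.

```python
def expiry_time(t_cached, t_last_modified, alpha):
    """Return T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    if t_cached < t_last_modified:
        raise ValueError("document cached before its last modification")
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now, t_cached, t_last_modified, alpha):
    """A cached copy is served without validation until it expires."""
    return now < expiry_time(t_cached, t_last_modified, alpha)
```

The rule gives documents that had not changed for a long time when they were cached a proportionally longer lifetime in the cache. For example, a document last modified at time 200 and cached at time 1000 with α = 0.25 expires at 0.25 · 800 + 1000 = 1200.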
Web proxy caching

Basic idea (cont'd): Cooperative caching, by which you first check your neighbors on a cache miss.

[Figure: three Web proxies, each with a cache and its own clients, and a Web server. A client issues an HTTP Get request to its proxy. 1. Look in local cache. 2. Ask neighboring proxy caches. 3. Forward request to Web server.]
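The three lookup steps of cooperative caching can be sketched as follows. The `Proxy` class is hypothetical, the origin Web server is modeled as a plain dictionary from URL to document, and neighbors are queried by inspecting their caches directly rather than over the network.

```python
class Proxy:
    """A Web proxy that checks its neighbors before going to the origin."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin      # stands in for the Web server
        self.cache = {}
        self.neighbors = []

    def get(self, url):
        # 1. Look in the local cache.
        if url in self.cache:
            return self.cache[url], "local"
        # 2. Ask the neighboring proxy caches.
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]
                return self.cache[url], "neighbor:" + neighbor.name
        # 3. Forward the request to the Web server.
        self.cache[url] = self.origin[url]
        return self.cache[url], "origin"
```

The second element of the returned pair records where the document came from, which makes it easy to see that only the first request in a group of cooperating proxies reaches the origin server.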
Replication in Web hosting systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, which follows the lines of self-managing systems.

[Figure: a feedback control loop around the Web hosting system. Inputs to the system: initial configuration, corrections, and uncontrollable parameters (disturbance / noise). The observed output feeds metric estimation, which yields the measured output. Analysis compares it with the reference input and produces adjustment triggers for replica placement, consistency enforcement, and request routing.]
Handling flash crowds

Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.

[Figure: four plots, panels (a) to (d). Time-span labels: 2 days, 2 days, 6 days, 2.5 days.]
Server replication

Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers with guarantees on high availability and performance (example: Akamai).

[Figure: a client, an origin server, a CDN DNS server, the regular DNS system, and a CDN cache server. 1. Get base document. 2. Document with refs to embedded documents. 3. DNS lookups. 4. Return IP address of client-best server. 5. Get embedded documents. 6. Get embedded documents (if not already cached). 7. Embedded documents.]
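The two CDN-specific parts of this exchange, choosing a client-best server during the DNS lookup and fetching from the origin only on a cache miss, can be sketched as follows. This is an illustration under assumptions, not a description of how Akamai or any other CDN decides: "best" is taken to mean lowest estimated latency, the latency table is supplied by the caller, and the origin server is modeled as a dictionary.

```python
def cdn_dns_lookup(latency_ms_by_server):
    """Return the address of the cache server with the lowest latency."""
    return min(latency_ms_by_server, key=latency_ms_by_server.get)


class CdnCacheServer:
    """A cache server that contacts the origin only on a cache miss."""

    def __init__(self, origin):
        self.origin = origin          # stands in for the origin server
        self.cache = {}
        self.origin_fetches = 0

    def get_embedded(self, name):
        if name not in self.cache:
            # Get the embedded document from the origin (not yet cached).
            self.cache[name] = self.origin[name]
            self.origin_fetches += 1
        return self.cache[name]
```

For example, `cdn_dns_lookup({"192.0.2.10": 80.0, "198.51.100.7": 12.5})` returns `"198.51.100.7"`. The addresses are taken from the ranges reserved for documentation.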
Replication of Web applications

Observation: Replication becomes more difficult when dealing with databases and the like. There is no single best solution.

Assumption: Updates are carried out at the origin server and propagated to edge servers.
Replication of Web applications: normal

[Figure: edge-server side and origin-server side. The client sends a query to the edge server and receives a response. Each side has a Web server, application logic, and a schema. The edge server additionally holds a content-blind cache, a database copy (full/partial data replication), and a content-aware cache (full schema replication / query templates). The origin server holds the authoritative database.]
Replication of Web applications

Alternative solutions:
Full replication: high read/write ratio, often in combination with complex queries.
Partial replication: high read/write ratio, but in combination with simple queries.
Content-aware caching: Check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
Content-blind caching: Simply cache the result of previous queries. Works well with simple queries that address unique results (e.g., no range queries).

Question: What can be said about replication vs. performance?
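Content-blind caching is the simplest of these alternatives to sketch. In the hypothetical class below, the edge server stores query results keyed by the literal query text and its parameters, without any understanding of what the query means. The origin database is an SQLite connection supplied by the caller, and invalidation is assumed to be a coarse "drop everything" triggered when the origin propagates an update.

```python
class ContentBlindCache:
    """Caches the results of previous queries, keyed by query text."""

    def __init__(self, origin_db):
        self.origin_db = origin_db    # e.g. a sqlite3 connection
        self.results = {}
        self.hits = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        # Cache miss: run the query at the origin and remember the result.
        rows = self.origin_db.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def invalidate(self):
        """Called when the origin server propagates an update."""
        self.results.clear()
```

A repeated lookup with identical text and parameters is a hit. A range query with slightly different bounds is a miss, even if an earlier result already contains every row it needs, which is why this scheme suits simple queries that address unique results.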
Replication of Web apps.: full/partial replication

[Figure: the edge-server/origin-server diagram for the case of full/partial data replication, in which the edge server holds a database copy of the origin server's authoritative database.]
Replication of Web apps.: content-aware caching

[Figure: the edge-server/origin-server diagram for the case of content-aware caching, in which the edge server holds a content-aware cache based on full schema replication / query templates.]
Replication of Web apps.: content-blind caching

[Figure: the edge-server/origin-server diagram for the case of content-blind caching, in which the edge server holds a content-blind cache of query results.]