Distributed Systems: Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
Room R4.20, steen@cs.vu.nl
Chapter 02: Architectures
Version: September 3, 2012
Architectures
Architectural styles
Software architectures
Architectures versus middleware
Self-management in distributed systems
Architectures 2.1 Architectural styles
Architectural styles
Basic idea Organize into logically different components, and distribute those components over the various machines.
[Figure: (a) a stack of layers, Layer N down to Layer 1, with the request flow going down and the response flow coming back up; (b) a set of objects connected by method calls.]
(a) The layered style is used for client-server systems. (b) The object-based style is used for distributed object systems.
Observation Decoupling processes in space (“anonymous”) and also time (“asynchronous”) has led to alternative styles.
[Figure: (a) components subscribe to and publish on an event bus, which takes care of notification delivery; (b) components subscribe to and publish on a shared (persistent) data space, which takes care of data delivery.]
(a) Publish/subscribe [decoupled in space] (b) Shared dataspace [decoupled in space and time]
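As a rough illustration of style (a), the following Python sketch shows a minimal in-process event bus. The `EventBus` class, its `subscribe` and `publish` methods, and the topic names are hypothetical and not taken from any particular middleware. Publishers and subscribers only know the bus and a topic, never each other, which is what decoupling in space means here. Because delivery happens at publish time, this sketch is not decoupled in time.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Hypothetical in-process event bus (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> handlers registered for that topic
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register interest in a topic; the subscriber never names a publisher."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: str) -> int:
        """Deliver a notification to all current subscribers; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[str] = []
bus.subscribe("temperature", received.append)
delivered = bus.publish("temperature", "21.5C")  # one subscriber is notified
missed = bus.publish("humidity", "40%")          # nobody subscribed: event is lost
```

The lost `humidity` event is the point where a shared data space would differ: there, the published item would be stored and could still be read by a component that arrives later.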
Architectures 2.2 System Architectures
Centralized Architectures
Basic Client–Server Model
Characteristics:
- There are processes offering services (servers)
- There are processes that use services (clients)
- Clients and servers can be on different machines
- Clients follow a request/reply model with respect to using services
[Figure: timeline of a client sending a request to a server and waiting for the result while the server provides the service and sends the reply.]
Application Layering
Traditional three-layered view
- User-interface layer contains units for an application’s user interface
- Processing layer contains the functions of an application, i.e. without specific data
- Data layer contains the data that a client wants to manipulate through the application components
Observation This layering is found in many distributed information systems, using traditional database technology and accompanying applications.
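[Figure: simplified organization of an Internet search engine into three levels. User-interface level: the user interface passes a keyword expression down and receives an HTML page containing the list. Processing level: a query generator issues database queries, a ranking algorithm turns Web page titles with meta-information into a ranked list of page titles, and an HTML generator builds the page. Data level: a database with Web pages.]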
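The three levels in this example can be sketched in a few lines of Python. Everything below is a toy under stated assumptions: the `PAGES` data, the function names, and the hit-count ranking are invented for illustration and say nothing about how a real search engine works.

```python
from typing import Dict, List

# Data level: a toy "database with Web pages" (page title -> text). Hypothetical data.
PAGES: Dict[str, str] = {
    "Intro to P2P": "peer to peer overlay networks and peer lookup",
    "Client-server basics": "client server request reply",
    "CDN overview": "edge server replica content delivery peer",
}


def query_database(keywords: List[str]) -> Dict[str, int]:
    """Processing level, query generator: count keyword hits per page title."""
    hits: Dict[str, int] = {}
    for title, text in PAGES.items():
        words = text.split()
        score = sum(words.count(keyword) for keyword in keywords)
        if score > 0:
            hits[title] = score
    return hits


def rank(hits: Dict[str, int]) -> List[str]:
    """Processing level, ranking algorithm: most hits first, ties broken by title."""
    return sorted(hits, key=lambda title: (-hits[title], title))


def to_html(titles: List[str]) -> str:
    """Processing level, HTML generator: wrap the ranked list in an HTML list."""
    items = "".join(f"<li>{title}</li>" for title in titles)
    return f"<ul>{items}</ul>"


def search(keyword_expression: str) -> str:
    """User-interface level: accept a keyword expression, return an HTML page."""
    keywords = keyword_expression.lower().split()
    return to_html(rank(query_database(keywords)))


page = search("peer")
```

Each function only talks to the level directly below it, so any of the three levels could be moved to a different machine without changing the others.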
Multi-Tiered Architectures
- Single-tiered: dumb terminal/mainframe configuration
- Two-tiered: client/single server configuration
- Three-tiered: each layer on separate machine
Traditional two-tiered configurations:
[Figure: five configurations (a) to (e) showing how the user interface, application, and database are divided between the client machine and the server machine.]
Decentralized Architectures
Observation In the last couple of years we have been seeing a tremendous growth in peer-to-peer systems.
- Structured P2P: nodes are organized following a specific distributed data structure
- Unstructured P2P: nodes have randomly selected neighbors
- Hybrid P2P: some nodes are appointed special functions in a well-organized fashion
Note In virtually all cases, we are dealing with overlay networks: data is routed over connections set up between the nodes (cf. application-level multicasting)
Structured P2P Systems
Basic idea Organize the nodes in a structured overlay network such as a logical ring, or a hypercube, and make specific nodes responsible for services based only on their ID.
[Figure: structured overlay of 16 nodes with 4-bit IDs 0000 to 1111, organized as a hypercube.]
Note The system provides an operation LOOKUP(key) that will efficiently route the lookup request to the associated node.
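A minimal sketch of such a LOOKUP, assuming the 16 nodes with 4-bit IDs shown above form a hypercube in which two nodes are neighbors when their IDs differ in exactly one bit. The mapping from key to node ID (low bits of a SHA-1 hash) and the function names are assumptions made for illustration; the slides do not prescribe them.

```python
import hashlib
from typing import List

DIMENSIONS = 4  # 16 nodes with IDs 0000..1111


def key_to_node(key: str) -> int:
    """Map a key to the ID of the responsible node (assumed scheme: hash mod 16)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return digest[-1] % (1 << DIMENSIONS)


def route(source: int, target: int) -> List[int]:
    """Route through the hypercube by fixing one differing bit per hop."""
    path = [source]
    current = source
    for bit in range(DIMENSIONS):
        mask = 1 << bit
        if (current ^ target) & mask:
            current ^= mask  # forward to the neighbor that differs in this bit
            path.append(current)
    return path


def lookup(start: int, key: str) -> List[int]:
    """Return the hop-by-hop path from `start` to the node responsible for `key`."""
    return route(start, key_to_node(key))


path = route(0b0000, 0b1011)  # visits 0000, 0001, 0011, 1011
```

The number of hops equals the number of bits in which the source and target IDs differ, so a lookup never needs more than four hops in this 16-node overlay.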
Unstructured P2P Systems
Essence Many unstructured P2P systems are organized as a random overlay: two nodes are linked with probability p.
Observation We can no longer look up information deterministically, but will have to resort to searching:
- Flooding: node u sends a lookup query to all of its neighbors. A neighbor responds, or forwards (floods) the request. There are many variations:
  - Limited flooding (maximal number of forwarding steps)
  - Probabilistic flooding (flood only with a certain probability)
- Random walk: Randomly select a neighbor v. If v has the answer, it replies; otherwise v randomly selects one of its neighbors. Variation: parallel random walk. Works well with replicated data.
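The two search techniques can be sketched as follows. The overlay is a plain adjacency dictionary, and the function names, the hop limits, and the example chain are all hypothetical. The shared `seen` set is a simulation shortcut for each node remembering which queries it has already forwarded.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]


def limited_flood(graph: Graph, holders: Set[str], start: str, max_hops: int) -> Optional[str]:
    """Limited flooding: forward the query to all neighbors, at most `max_hops` hops away."""
    frontier = [start]
    seen = {start}
    for _ in range(max_hops + 1):
        next_frontier: List[str] = []
        for node in frontier:
            if node in holders:
                return node  # this node can answer the query
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return None  # the hop limit was reached without finding the item


def random_walk(graph: Graph, holders: Set[str], start: str, max_steps: int,
                rng: random.Random) -> Optional[str]:
    """Random walk: hand the query to one randomly chosen neighbor per step."""
    node = start
    for _ in range(max_steps + 1):
        if node in holders:
            return node
        if not graph[node]:
            return None  # dead end: no neighbor to forward to
        node = rng.choice(graph[node])
    return None


# Hypothetical overlay: a chain a - b - c - d, with the data item stored at d.
overlay: Graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
found_near = limited_flood(overlay, {"d"}, "a", max_hops=3)  # reaches d
found_far = limited_flood(overlay, {"d"}, "a", max_hops=2)   # gives up at c
```

Flooding contacts every node within the hop limit, while a random walk contacts only one node per step, which is why replication (more holders) helps the walk so much.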
Superpeers
Observation Sometimes it helps to select a few nodes to do specific work: superpeers.
[Figure: weak peers attached to superpeers, which together form an overlay network of superpeers.]
Examples
- Peers maintaining an index (for search)
- Peers monitoring the state of the network
- Peers being able to set up connections
Hybrid Architectures: Client-server combined with P2P
Example Edge-server architectures, which are often used for Content Delivery Networks
[Figure: clients in an enterprise network and at ISPs reach a content provider across the core Internet by way of edge servers.]
Hybrid Architectures: C/S with P2P – BitTorrent
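[Figure: a client node does Lookup(F) on a BitTorrent Web page (Web server), which returns a reference to a file server holding the .torrent file for F. The .torrent file contains a reference to a tracker, which returns the list of nodes storing F. The client then exchanges chunks with K out of N nodes (Node 1, Node 2, ..., Node N).]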
Basic idea Once a node has identified where to download a file from, it joins a swarm of downloaders who in parallel get file chunks from the source, but also distribute these chunks amongst each other.
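The chunk exchange can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: time proceeds in rounds, every node fetches at most one missing chunk per round from the first node that has one, and there is no tracker, choking, or rarest-first selection. The node names and the `exchange_round` helper are hypothetical.

```python
from typing import Dict, Set


def exchange_round(have: Dict[str, Set[int]], uploads: Dict[str, int]) -> None:
    """One round: each node fetches one missing chunk from a node that had it at round start."""
    snapshot = {node: set(chunks) for node, chunks in have.items()}
    for node, chunks in snapshot.items():
        for peer, peer_chunks in snapshot.items():
            missing = sorted(peer_chunks - chunks)
            if peer != node and missing:
                have[node].add(missing[0])
                uploads[peer] += 1  # `peer` served a chunk to `node`
                break


NUM_CHUNKS = 4
# n2 already holds chunk 3; the seed holds the complete file.
swarm: Dict[str, Set[int]] = {"n1": set(), "n2": {3}, "seed": set(range(NUM_CHUNKS))}
uploads: Dict[str, int] = {node: 0 for node in swarm}

rounds = 0
while any(len(chunks) < NUM_CHUNKS for chunks in swarm.values()):
    exchange_round(swarm, uploads)
    rounds += 1
```

In this run the downloader `n2` also serves chunks to `n1`, so the seed does not have to upload every chunk to every node itself.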
Architectures 2.3 Architectures versus Middleware
Architectures versus Middleware
Problem In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases ⇒ need to (dynamically) adapt the behavior of the middleware.
Interceptors Intercept the usual flow of control when invoking a remote object.
Interceptors
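[Figure: a client application calls B.do_something(value). The application stub turns this into invoke(B, &do_something, value), which can be handled by a request-level interceptor. The object middleware turns it into send([B, "do_something", value]), which can be handled by a message-level interceptor before it reaches the local OS and travels to object B. The figure contrasts a nonintercepted call with an intercepted call.]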
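A request-level interceptor can be sketched as a function that sits between `invoke` and `send` and may rewrite a request. The example below assumes, purely for illustration, that object B is replicated as `B1` and `B2` and that the interceptor fans the call out to both. The replica names and the `replicate_b` function are hypothetical, and `send` is a stand-in that only records messages.

```python
from typing import Any, Callable, List, Tuple

Request = Tuple[str, str, Any]  # (object name, method name, argument)
Interceptor = Callable[[Request], List[Request]]

sent_messages: List[list] = []


def send(message: list) -> None:
    """Stand-in for handing a message to the local OS (hypothetical)."""
    sent_messages.append(message)


def invoke(obj: str, method: str, value: Any, interceptors: List[Interceptor]) -> None:
    """Generic invocation, passed through any request-level interceptors before sending."""
    requests: List[Request] = [(obj, method, value)]
    for intercept in interceptors:
        requests = [out for request in requests for out in intercept(request)]
    for target, name, argument in requests:
        send([target, name, argument])


def replicate_b(request: Request) -> List[Request]:
    """Hypothetical request-level interceptor: fan a call on B out to its replicas."""
    obj, method, value = request
    if obj == "B":
        return [("B1", method, value), ("B2", method, value)]
    return [request]


invoke("B", "do_something", 42, interceptors=[])             # nonintercepted call
invoke("B", "do_something", 42, interceptors=[replicate_b])  # intercepted call
```

The client code is identical in both calls; only the middleware configuration changes, which is what makes interceptors a tool for adapting middleware behavior.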
Architectures 2.4 Self-management in Distributed Systems
Self-managing Distributed Systems
Observation The distinction between system and software architectures blurs when automatic adaptivity needs to be taken into account:
- Self-configuration
- Self-managing
- Self-healing
- Self-optimizing
- Self-*
Warning There is a lot of hype going on in this field of autonomic computing.
Feedback Control Model
Observation In many cases, self-* systems are organized as a feedback control system.
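[Figure: feedback control loop. The core of the distributed system starts from an initial configuration, is subject to uncontrollable parameters (disturbance / noise), and produces an observed output. Metric estimation turns this into a measured output, analysis compares it with the reference input and issues adjustment triggers, and adjustment measures apply corrections (+/-) to the core.]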
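The loop of metric estimation, analysis, and adjustment can be sketched in a few lines. The system model below (latency equals load divided by the number of servers), the 10% tolerance, and the one-server-per-cycle adjustment are all assumptions chosen to keep the example small. They are not part of the model on the slide.

```python
from typing import Iterable, List


def measure(load: float, servers: int) -> float:
    """Metric estimation: hypothetical model in which latency is load per server."""
    return load / servers


def analyze(measured: float, reference: float, tolerance: float = 0.1) -> int:
    """Analysis: compare with the reference input and emit an adjustment trigger."""
    if measured > reference * (1 + tolerance):
        return +1  # too slow: add a server
    if measured < reference * (1 - tolerance):
        return -1  # overprovisioned: remove a server
    return 0


def control_loop(loads: Iterable[float], reference: float, servers: int = 1) -> List[int]:
    """Run one measure/analyze/adjust cycle per load sample; return the server counts."""
    history: List[int] = []
    for load in loads:  # the load is the uncontrollable parameter (disturbance)
        measured = measure(load, servers)
        servers = max(1, servers + analyze(measured, reference))  # adjustment measure
        history.append(servers)
    return history


history = control_loop(loads=[100, 100, 100, 100, 100, 20, 20], reference=25.0)
# history is [2, 3, 4, 4, 4, 3, 2]: scale up until latency is near 25, then back down
```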
Example: Globule
Globule Collaborative CDN that analyzes traces to decide where replicas of Web content should be placed. Decisions are driven by a general cost model:
cost = (w1 × m1) + (w2 × m2) + ··· + (wn × mn)
[Figure: clients in an enterprise network and at ISPs, connected across the core Internet to an origin server and a replica server.]
The Globule origin server collects traces and does what-if analysis by checking what would have happened if page P had been placed at edge server S. Many strategies are evaluated, and the best one is chosen.
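The selection step can be sketched by combining the weighted cost model with a table of what-if results. The sketch reads each w as a weight and each m as a measured performance metric. The metric values below are made up for illustration and are not taken from the Globule traces, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def cost(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """Weighted cost: (w1 x m1) + (w2 x m2) + ... + (wn x mn)."""
    if len(weights) != len(metrics):
        raise ValueError("need exactly one weight per metric")
    return sum(w * m for w, m in zip(weights, metrics))


def best_strategy(weights: Sequence[float], what_if: Dict[str, Sequence[float]]) -> str:
    """Pick the strategy whose what-if metrics give the lowest weighted cost."""
    return min(what_if, key=lambda name: cost(weights, what_if[name]))


# Made-up what-if results for one page: (turnaround time, bandwidth), relative units.
what_if_results: Dict[str, Sequence[float]] = {
    "NR": (200.0, 120.0),
    "CLV": (180.0, 110.0),
    "SU50": (100.0, 165.0),
}

latency_first = best_strategy((1.0, 0.1), what_if_results)    # favors low turnaround time
bandwidth_first = best_strategy((0.1, 1.0), what_if_results)  # favors low bandwidth
```

With these numbers the choice flips from SU50 to CLV when the weights change, which shows how the weights encode what "best" means for a given site.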
Architectures Extra: Strategy evaluation in Globule
An experiment
Research question Does it make sense to distribute each Web page according to its own best strategy, instead of applying a single, overall distribution strategy to all Web pages?
[Figure: the origin server in the AS of the document's origin server, edge servers in AS 1, AS 2, and AS 3, each with nearby clients, and further clients in an unknown AS.]
We collected traces on requests and updates for all Web pages from two different servers (in Amsterdam and Erlangen).
For each request, we checked:
- From which autonomous system it came
- What the average delay was to that client
- What the average bandwidth was to the client’s AS (randomly taking 5 clients from that AS)
Pages that were requested fewer than 10 times were removed from the experiment. We replayed the trace file for many different system configurations, and many different distribution scenarios.
Issue                  Site 1       Site 2
Start date             13/9/1999    20/3/2000
End date               18/12/1999   11/9/2000
Duration (days)        96           175
Number of documents    33,266       22,637
Number of requests     4,858,369    1,599,777
Number of updates      11,612       3,338
Number of ASes         2,567        1,480
Distinguished strategies: Caching
Abbr.  Name                  Description
NR     No replication        No replication or caching takes place. All clients forward their requests directly to the origin server.
CV     Verification          Edge servers cache documents. At each subsequent request, the origin server is contacted for revalidation.
CLV    Limited validity      Edge servers cache documents. A cached document has an associated expire time before it becomes invalid and is removed from the cache.
CDV    Delayed verification  Edge servers cache documents. A cached document has an associated expire time after which the origin server is contacted for revalidation.
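The difference between CV, CLV, and CDV shows up in how often the origin server is contacted and how often the full document is transferred. The sketch below models one document on one edge server. The `Origin` and `EdgeCache` classes, the fixed time-to-live, and the assumption that a revalidation transfers the document only when it has changed are illustrative choices, not details taken from Globule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Origin:
    version: int = 1
    contacts: int = 0   # requests that reached the origin server
    transfers: int = 0  # full document transfers from the origin

    def fetch(self) -> int:
        self.contacts += 1
        self.transfers += 1
        return self.version

    def revalidate(self, cached_version: int) -> int:
        self.contacts += 1
        if cached_version != self.version:
            self.transfers += 1  # assumed: the document is resent only if it changed
        return self.version


class EdgeCache:
    """Hypothetical edge server caching one document under CV, CLV, or CDV."""

    def __init__(self, origin: Origin, strategy: str, ttl: float) -> None:
        if strategy not in ("CV", "CLV", "CDV"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.origin, self.strategy, self.ttl = origin, strategy, ttl
        self.entry: Optional[Tuple[int, float]] = None  # (version, expire time)

    def request(self, now: float) -> int:
        """Return the document version served to a client at time `now`."""
        if self.entry is None:
            version = self.origin.fetch()
        else:
            cached, expires = self.entry
            if self.strategy == "CV":
                version = self.origin.revalidate(cached)  # revalidate on every request
            elif now < expires:
                return cached  # CLV/CDV: still considered valid, possibly stale
            elif self.strategy == "CLV":
                version = self.origin.fetch()  # expired copy was removed: fetch again
            else:
                version = self.origin.revalidate(cached)  # CDV: revalidate after expiry
        self.entry = (version, now + self.ttl)
        return version


def run(strategy: str) -> Tuple[int, int]:
    """Replay three requests (t = 0, 5, 12) against an unchanged document."""
    origin = Origin()
    cache = EdgeCache(origin, strategy, ttl=10.0)
    for now in (0.0, 5.0, 12.0):
        cache.request(now)
    return origin.contacts, origin.transfers


results = {name: run(name) for name in ("CV", "CLV", "CDV")}
# results: CV -> (3, 1), CLV -> (2, 2), CDV -> (2, 1) as (contacts, transfers)
```

CV never serves stale data but contacts the origin on every request, while CLV and CDV trade a window of possible staleness for fewer contacts.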
Distinguished strategies: Replication
Abbr.       Name                 Description
SI          Server invalidation  Edge servers cache documents, but the origin server invalidates cached copies when the document is updated.
SUx         Server updates       The origin server maintains copies at the x most relevant edge servers; x = 10, 25 or 50.
SU50 + CLV  Hybrid SU50 & CLV    The origin server maintains copies at the 50 most relevant edge servers; the other intermediate servers follow the CLV strategy.
SU50 + CDV  Hybrid SU50 & CDV    The origin server maintains copies at the 50 most relevant edge servers; the other edge servers follow the CDV strategy.
Trace results: One global strategy
Turnaround time (TaT) and bandwidth (BW) in relative measures; stale documents as fraction of total requested documents.
              Site 1                     Site 2
Strategy      TaT   Stale docs   BW      TaT   Stale docs   BW
NR            203   -            118     183   -            115
CV            227   -            113     190   -            100
CLV           182   0.0061       113     142   0.0060       100
CDV           182   0.0059       113     142   0.0057       100
SI            182   -            113     141   -            100
SU10          128   -            100     160   -            114
SU25          114   -            123     132   -            119
SU50          102   -            165     114   -            132
SU50+CLV      100   0.0011       165     100   0.0019       125
SU50+CDV      100   0.0011       165     100   0.0017       125
Conclusion: No single global strategy is best
Assigning an optimal strategy per document: Site 1
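[Figure: plot for Site 1 with axes total consumed bandwidth and total turnaround time, comparing the ideal arrangement and the cost function arrangements with the global strategies SU50+CLV, SU50+CDV, SU50, SU25, CLV, SI, and CDV.]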
Assigning an optimal strategy per document: Site 2
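[Figure: plot for Site 2 with axes total consumed bandwidth and total turnaround time, comparing the ideal arrangement and the cost function arrangements with the global strategies SU50+CLV, SU50+CDV, SU50, SU25, SU10, CDV, CLV, and SI.]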
Useful strategies
Fraction of documents to which a strategy is assigned.
Strategy    Site 1   Site 2
NR          0.0973   0.0597
CV          0.0001   0.0000
CLV         0.0131   0.0029
CDV         0.0000   0.0000
SI          0.0089   0.0061
SU10        0.1321   0.6087
SU25        0.1615   0.1433
SU50        0.4620   0.1490
SU50+CLV    0.1232   0.0301
SU50+CDV    0.0017   0.0002
Conclusion: It makes sense to differentiate strategies