MC714: Sistemas Distribuídos (Distributed Systems), Prof. Lucas Wanner, Instituto de Computação, Unicamp

SLIDE 1

MC714: Sistemas Distribuídos (Distributed Systems)

  • Prof. Lucas Wanner

Instituto de Computação, Unicamp

Lecture 2: Architectures of Distributed Systems

SLIDE 2

Revision: Distribution Transparency

Transparency   Description
Access         Hide differences in data representation and invocation mechanisms
Location       Hide where an object is located
Relocation     Hide that an object may be moved to another location while in use
Migration      Hide that an object may move to another location
Replication    Hide that an object is replicated
Concurrency    Hide that an object may be shared by several independent users
Failure        Hide failure and possible recovery of an object

Note: Distribution transparency is a nice goal, but achieving it is a different story.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 30

SLIDE 3

Revision: Techniques for Scaling

Hide communication latencies
Avoid waiting for responses; do something else:
  • Make use of asynchronous communication
  • Have separate handler for incoming response
  • Problem: not every application fits this model

Distribution
Partition data and computations across multiple machines:
  • Move computations to clients (Java applets)
  • Decentralized naming services (DNS)
  • Decentralized information systems (WWW)
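The first technique, hiding latency with asynchronous communication and a separate response handler, could look roughly like the following sketch. The `fetch` coroutine, the server names, and the latencies are hypothetical; the sleep only stands in for a real network round trip.

```python
import asyncio


async def fetch(name: str, latency: float) -> str:
    # Hypothetical remote call: the sleep stands in for network latency.
    await asyncio.sleep(latency)
    return f"reply from {name}"


async def main() -> list:
    # Issue both requests up front instead of waiting for the first reply.
    pending = [
        asyncio.create_task(fetch("server-a", 0.2)),
        asyncio.create_task(fetch("server-b", 0.05)),
    ]
    replies = []
    # Handle each response as it arrives, in completion order.
    for finished in asyncio.as_completed(pending):
        replies.append(await finished)
    return replies


replies = asyncio.run(main())
```

Because neither request blocks the other, the total wait is close to the slower of the two latencies rather than their sum.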

SLIDE 4

Revision: Techniques for Scaling

Replication/caching
Make copies of data available at different machines:
  • Replicated file servers and databases
  • Mirrored Web sites
  • Web caches (in browsers and proxies)
  • File caching (at server and client)

Observation
Applying scaling techniques is easy, except for one thing:
  • Having multiple copies (cached or replicated) leads to inconsistencies: modifying one copy makes that copy different from the rest.
  • Always keeping copies consistent and in a general way requires global synchronization on each modification.
  • Global synchronization precludes large-scale solutions.

SLIDE 5

Revision: Transactions

Model
A transaction is a collection of operations on the state of an object (database, object composition, etc.) that satisfies the following properties (ACID):
  • Atomicity: All operations either succeed, or all of them fail. When the transaction fails, the state of the object will remain unaffected by the transaction.
  • Consistency: A transaction establishes a valid state transition. This does not exclude the possibility of invalid, intermediate states during the transaction's execution.
  • Isolation: Concurrent transactions do not interfere with each other. It appears to each transaction T that other transactions occur either before T, or after T, but never both.
  • Durability: After the execution of a transaction, its effects are made permanent: changes to the state survive failures.
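As a small illustration of atomicity, the sketch below uses Python's standard `sqlite3` module. The `account` table, the names, and the balances are made up for the example; the `CHECK` constraint is only there to force a failure halfway through a transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling `transfer("alice", "bob", 500)` credits bob first and then fails on the debit, yet `balances()` afterwards shows both accounts unchanged: either both updates take effect or neither does.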

SLIDE 6

Review: Exercises

1. Define and compare distributed systems and parallel systems.
2. What is the role of middleware in distributed systems?
3. Give examples of and define different types of distribution transparency.
4. What is the difference between migration transparency and relocation transparency?
5. Define scalability. Which techniques are used to achieve scalability?
6. What is the difference between replication and caching?
7. The traditional view of transactions says that when a transaction is aborted, it is as if the transaction had never happened. Give an example where this is not true.
8. What is the role of a transaction coordinator?

SLIDE 7

Syllabus

Topic                                         Chapter
Introduction and Fundamentals                 1
Architectures of distributed systems          2
Processes and Threads                         Review, 3
Clients/Servers, Virtualization, and Cloud    3
Communication: Review, Sockets                Review, 4
Message Passing, Multicast                    4
Information dissemination                     4
Remote Procedure Call                         4
Naming                                        5
Clock synchronization                         6
Logical clocks                                6
Mutual exclusion                              6
Leader election                               6

SLIDE 8

Exercises

1. If a client and a server are far apart, network latency may dominate performance. How can we address this problem?
2. What is the difference between horizontal and vertical distribution?
3. Consider a system distributed vertically across n processes P1, P2, ..., Pn. Process Pi is a client of process Pi+1, and Pi returns a reply to Pi−1 only after receiving a reply from Pi+1. What are the limiting factors for the performance of process P1?
4. Give an example of the use of an interceptor in middleware.
5. Modern cars are equipped with multiple electronic components. Give examples of feedback control in cars.

SLIDE 9

Exercises

4. Consider an overlay network with N nodes in which each node randomly selects c neighbors.
   1. If P and Q are neighbors of R, what is the probability that they are neighbors of each other?
   2. To search for a file, a node floods a request to all of its neighbors and asks them to forward the request once more to their own neighbors. Give an upper bound on the number of nodes that will be reached.
5. Consider a BitTorrent system in which each node has outgoing capacity Bout and incoming capacity Bin, with Bin > Bout. Some nodes (called seeds) offer files for download by other nodes. Suppose there are S seeds and N clients. What is the maximum download capacity of a client:
   1. If each client can contact only one seed at a time?
   2. If, in addition to the seeds, clients can exchange pieces of files (chunks) among themselves, at a 1:1 input:output ratio?

SLIDE 10

Architectures

  • Architectural styles
  • Software architectures
  • Architectures versus middleware
  • Self-management in distributed systems

SLIDE 11

Architectural styles

Basic idea Organize into logically different components, and distribute those components over the various machines.

[Figure: (a) Layer 1 through Layer N, with a request flow and a response flow between the layers; (b) objects connected by method calls.]

(a) The layered style is used for client-server systems. (b) The object-based style is used for distributed object systems.

SLIDE 12

Architectural Styles

Observation Decoupling processes in space (“anonymous”, referential coupling) and also time (“asynchronous”) has led to alternative styles.

[Figure: (a) components publish to an event bus, which delivers events to other components; (b) components publish to a shared (persistent) data space, which delivers data to other components.]

(a) Publish/subscribe [decoupled in space] (b) Shared data space [decoupled in space and time]

SLIDE 13

Centralized Architectures

Basic Client–Server Model
Characteristics:
  • There are processes offering services (servers)
  • There are processes that use services (clients)
  • Clients and servers can be on different machines
  • Clients follow a request/reply model with respect to using services

[Figure: timeline of a client and a server. The client sends a request and waits for the result; the server provides the service and sends the reply.]
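The request/reply interaction can be tried out on a single machine with the sketch below, which uses Python's standard `socket` and `threading` modules over the loopback interface. The "service" (upper-casing the request) and the one-shot server are assumptions made to keep the example short.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    # Server: accept one client, read its request, provide the service, reply.
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(request.upper().encode())


def request_reply(address, request: str) -> str:
    # Client: send the request, then block until the reply arrives.
    with socket.create_connection(address) as sock:
        sock.sendall(request.encode())
        return sock.recv(1024).decode()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = request_reply(listener.getsockname(), "hello")
server.join()
listener.close()
```

The client does nothing useful between `sendall` and `recv`; that idle period is the "wait for result" interval in the timeline.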

SLIDE 14

Application Layering

Traditional three-layered view
  • User-interface layer contains units for an application's user interface
  • Processing layer contains the functions of an application, i.e. without specific data
  • Data layer contains the data that a client wants to manipulate through the application components

Observation
This layering is found in many distributed information systems, using traditional database technology and accompanying applications.

SLIDE 15

Example of a three-tiered architecture

SLIDE 16

Application Layering

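[Figure: an application organized in three levels. User-interface level: the user interface passes on a keyword expression and receives an HTML page containing the list. Processing level: a query generator issues database queries, a ranking algorithm turns Web page titles with meta-information into a ranked list of page titles, and an HTML generator produces the page. Data level: a database with Web pages.]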

SLIDE 17

Multi-Tiered Architectures

  • Single-tiered: dumb terminal/mainframe configuration
  • Two-tiered: client/single server configuration
  • Three-tiered: each layer on separate machine

Traditional two-tiered configurations:

[Figure: five alternatives, (a) to (e), for dividing the user interface, application, and database between the client machine and the server machine.]

SLIDE 18

Decentralized Architectures

Observation
In the last several years we have been seeing a tremendous growth in peer-to-peer systems.
  • Structured P2P: nodes are organized following a specific distributed data structure
  • Unstructured P2P: nodes have randomly selected neighbors
  • Hybrid P2P: some nodes are appointed special functions in a well-organized fashion

Note
In virtually all cases, we are dealing with overlay networks: data is routed over connections set up between the nodes (cf. application-level multicasting).

SLIDE 19

Structured P2P Systems

Basic idea Organize the nodes in a structured overlay network such as a logical ring, and make specific nodes responsible for services based only on their ID.

[Figure: logical ring of identifiers 0 to 15. The actual nodes are marked, each with its associated data keys: {0,1}, {2,3,4}, {5,6,7}, {8,9,10,11,12}, {13,14,15}.]

Note The system provides an operation LOOKUP(key) that will efficiently route the lookup request to the associated node.
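A minimal sketch of how keys map to nodes on such a ring follows. It assumes a Chord-style rule (a key is stored at the first node whose ID is greater than or equal to the key, wrapping around) and assumes node IDs 1, 4, 7, 12, and 15, which are consistent with the key sets listed in the figure. It only computes which node owns a key; the efficient routing of the request between nodes is not shown.

```python
from bisect import bisect_left

RING_SIZE = 16              # identifiers 0..15
NODES = [1, 4, 7, 12, 15]   # assumed IDs of the actual nodes, sorted


def lookup(key: int) -> int:
    """Return the ID of the node responsible for key."""
    # First node with ID >= key; wrap to the smallest ID past the last node.
    i = bisect_left(NODES, key % RING_SIZE)
    return NODES[i % len(NODES)]


# Rebuild the "associated data keys" of every node from the rule above.
keys_by_node = {node: [] for node in NODES}
for key in range(RING_SIZE):
    keys_by_node[lookup(key)].append(key)
```

With these assumptions, `keys_by_node` reproduces the key sets shown next to the nodes in the figure.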

SLIDE 20

Structured P2P Systems

Other example Organize nodes in a d-dimensional space and let every node take the responsibility for data in a specific region. When a node joins ⇒ split a region.

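[Figure: (a) the unit square from (0,0) to (1,1) divided among actual nodes at (0.2,0.8), (0.6,0.7), (0.9,0.9), (0.2,0.3), (0.7,0.2), and (0.9,0.6), with the region of keys associated with the node at (0.6,0.7) marked; (b) the same space after a region has been split, with nodes at (0.2,0.45) and (0.2,0.15) in place of (0.2,0.3).]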

SLIDE 21

Unstructured P2P Systems

Observation
Many unstructured P2P systems are organized as a random overlay: two nodes are linked with probability p.

Observation
We can no longer look up information deterministically, but will have to resort to searching:
  • Flooding: node u sends a lookup query to all its neighbors. A neighbor responds or forwards (floods) the request. There are many variations:
      • Limited flooding (maximal number of forwardings)
      • Probabilistic flooding (flooding only with a certain probability)
  • Random walk: randomly select a neighbor v. If v has the answer, it replies; otherwise v selects one of its neighbors. Variation: parallel random walk. Works well with replicated data.
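Limited flooding can be simulated in a few lines. The sketch below is a centralized simulation rather than a real protocol: the overlay is a hypothetical adjacency dictionary, `has_item` is a stand-in for a node checking its local data, and the search stops at the first hit.

```python
from collections import deque


def limited_flood(graph, start, has_item, max_hops):
    """Flood a query from start, forwarding it at most max_hops times.

    Returns (node holding the item or None, number of nodes reached,
    counting the start node).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if has_item(node):
            return node, len(seen)
        if hops == max_hops:
            continue  # forwarding limit reached on this path
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None, len(seen)


# Hypothetical overlay: u - a - c - d, plus u - b.
overlay = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"],
           "c": ["a", "d"], "d": ["c"]}
```

Searching from `u` for data held only by `d` fails with `max_hops=2` and succeeds with `max_hops=3`, which shows the trade-off between message cost and the chance of finding the data.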

SLIDE 22

Superpeers

Observation
Sometimes it helps to select a few nodes to do specific work: superpeers.

[Figure: regular peers attached to superpeers, which together form a superpeer network.]

Examples
  • Peers maintaining an index (for search)
  • Peers monitoring the state of the network
  • Peers being able to set up connections

SLIDE 23

Hybrid Architectures: Client-server combined with P2P

Example Edge-server architectures, which are often used for Content Delivery Networks

[Figure: clients, a content provider, ISPs, and an enterprise network connected to the core Internet through edge servers.]

SLIDE 24

Hybrid Architectures: C/S with P2P – BitTorrent

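[Figure: a client node does Lookup(F) on a BitTorrent Web page (Web server) and gets a reference to a file server holding the .torrent file for F; the .torrent file gives a reference to a tracker, which supplies the list of nodes storing F; the client then connects to K out of N nodes (Node 1, Node 2, ..., Node N).]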

Basic idea Once a node has identified where to download a file from, it joins a swarm of downloaders who in parallel get file chunks from the source, but also distribute these chunks amongst each other.

SLIDE 25

Architectures versus Middleware

Problem
In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases ⇒ need to (dynamically) adapt the behavior of the middleware.

Interceptors
Intercept the usual flow of control when invoking a remote object.

SLIDE 26

Interceptors

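[Figure: a client application calls B.do_something(value) on an application stub. A request-level interceptor sits between the stub and the object middleware, where the call becomes invoke(B, &do_something, value). A message-level interceptor sits between the object middleware and the local OS, where the call becomes send([B, "do_something", value]) on its way to object B. Both a nonintercepted call and an intercepted call are shown.]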
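The call chain can be mimicked with a toy middleware, sketched below. All class and function names are hypothetical, the `sent` list stands in for messages handed to the local OS, and only a request-level interceptor is modeled. The example interceptor redirects calls on object `B` to an assumed replica named `B-replica` without the client application noticing.

```python
class ObjectMiddleware:
    """Toy middleware: turns method calls into generic invocations."""

    def __init__(self):
        self.request_interceptors = []  # callables: request tuple -> request tuple
        self.sent = []                  # stands in for messages given to the local OS

    def invoke(self, target, method, value):
        request = (target, method, value)
        # Request-level interception: each interceptor may rewrite the request.
        for intercept in self.request_interceptors:
            request = intercept(request)
        self.send(list(request))

    def send(self, message):
        self.sent.append(message)


class Stub:
    """Application stub: offers the same interface as the remote object."""

    def __init__(self, target, middleware):
        self.target = target
        self.middleware = middleware

    def do_something(self, value):
        self.middleware.invoke(self.target, "do_something", value)


def redirect_to_replica(request):
    # Hypothetical interceptor: send calls meant for B to a replica instead.
    target, method, value = request
    return ("B-replica" if target == "B" else target, method, value)
```

The client code keeps calling `B.do_something(value)` on its stub; installing or removing `redirect_to_replica` changes where the request goes without touching the application.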

SLIDE 27

Adaptive Middleware

  • Separation of concerns: Try to separate extra functionalities and later weave them together into a single implementation.
  • Computational reflection: Let a program inspect itself at runtime and adapt/change its settings dynamically if necessary.
  • Component-based design: Organize a distributed application through components that can be dynamically replaced when needed ⇒ highly complex, also many intercomponent dependencies.

SLIDE 28

Self-managing Distributed Systems

Observation
Distinction between system and software architectures blurs when automatic adaptivity needs to be taken into account:
  • Self-configuration
  • Self-managing
  • Self-healing
  • Self-optimizing
  • Self-*

Warning
There is a lot of hype going on in this field of autonomic computing.

SLIDE 29

Feedback Control Model

Observation In many cases, self-* systems are organized as a feedback control system.

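[Figure: feedback control loop. The core of the distributed system starts from an initial configuration and is subject to uncontrollable parameters (disturbance/noise). Metric estimation turns the observed output into a measured output; analysis compares it with the reference input and issues adjustment triggers; adjustment measures apply corrections (+/-) to the core.]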
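The loop can be illustrated with a deliberately simple simulation. Everything in it is an assumption chosen for the example: the "system" produces an output equal to its setting plus a constant disturbance, and the analysis step is a plain proportional controller with gain 0.5.

```python
def control_step(reference, measured, setting, gain=0.5):
    """Analysis and adjustment: correct the setting in proportion to the error."""
    error = reference - measured
    return setting + gain * error


def run(reference=10.0, disturbance=2.0, steps=20):
    setting = 0.0  # initial configuration
    for _ in range(steps):
        # Metric estimation on a toy system model: output = setting + disturbance.
        measured = setting + disturbance
        setting = control_step(reference, measured, setting)
    return setting + disturbance  # output after the last correction


final_output = run()
```

Each iteration halves the remaining error, so the output approaches the reference input even though the controller never learns the size of the disturbance.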

SLIDE 30

Example: Globule

Globule
Collaborative CDN that analyzes traces to decide where replicas of Web content should be placed. Decisions are driven by a general cost model:

$cost = (w_1 \times m_1) + (w_2 \times m_2) + \cdots + (w_n \times m_n)$
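The cost model is a weighted sum, which the sketch below evaluates for a few candidate placements. It assumes each m_k is a measured metric and each w_k the weight given to it; the candidate names, the two metrics, and all numbers are invented for illustration and do not come from Globule.

```python
def cost(weights, metrics):
    """Weighted sum: (w1 * m1) + (w2 * m2) + ... + (wn * mn)."""
    if len(weights) != len(metrics):
        raise ValueError("need exactly one weight per metric")
    return sum(w * m for w, m in zip(weights, metrics))


def best_placement(weights, candidates):
    """Return the candidate whose metrics give the lowest cost."""
    return min(candidates, key=lambda name: cost(weights, candidates[name]))


# Hypothetical metrics per candidate: (client latency, consumed bandwidth).
weights = (1.0, 0.5)
candidates = {"edge-1": (40, 10), "edge-2": (25, 30)}
choice = best_placement(weights, candidates)
```

Changing the weights changes the outcome: with a heavier weight on bandwidth, `edge-1` would win instead, so the weights encode which metric the operator cares about most.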

SLIDE 31

Example: Globule

[Figure: clients, ISPs, and an enterprise network connected through the core Internet, with an origin server and replica servers.]

The Globule origin server collects traces and does what-if analysis by checking what would have happened if page P had been placed at edge server S. Many strategies are evaluated, and the best one is chosen.
