Unicamp MC714 Distributed Systems Slides by Maarten van Steen


slide-1
SLIDE 1

Unicamp MC714

Distributed Systems

Slides by Maarten van Steen, adapted from Distributed Systems, 3rd edition

Chapter 04: Communication

slide-2
SLIDE 2

Revision:

Revision: Threads and Distributed Systems

Improve performance
- Starting a thread is typically much cheaper than starting a new process.
- Having a single-threaded server prohibits simple scale-up to a multiprocessor system.
- As with clients: hide network latency by reacting to the next request while the previous one is being replied to.

Better structure
- Most servers have high I/O demands. Using simple, well-understood blocking calls simplifies the overall structure.
- Multithreaded programs tend to be smaller and easier to understand due to simplified flow of control.

2 / 62

slide-3
SLIDE 3

Revision:

Revision: Ways of virtualization

(a) Process VM, (b) Native VMM, (c) Hosted VMM

[Figure: layer stacks, top to bottom.
(a) Application/Libraries, Runtime system, Operating system, Hardware
(b) Application/Libraries, Operating system, Virtual machine monitor, Hardware
(c) Application/Libraries, Operating system, Virtual machine monitor, Operating system, Hardware]

Differences
(a) Separate set of instructions, an interpreter/emulator, running atop an OS.
(b) Low-level instructions, along with a bare-bones minimal operating system.
(c) Low-level instructions, but delegating most work to a full-fledged OS.

3 / 62

slide-6
SLIDE 6

Revision:

Revision: Servers and state

Stateless servers
Never keep accurate information about the status of a client after having handled a request:
- Don't record whether a file has been opened (simply close it again after access)
- Don't promise to invalidate a client's cache
- Don't keep track of your clients

Consequences
- Clients and servers are completely independent
- State inconsistencies due to client or server crashes are reduced
- Possible loss of performance because, e.g., a server cannot anticipate client behavior (think of prefetching file blocks)

Question
Does connection-oriented communication fit into a stateless design?

4 / 62

slide-7
SLIDE 7

Revision:

Revision: Server Clusters

Common organization

[Figure: three-tier server cluster. Client requests arrive at a logical switch (possibly multiple) in the first tier, which dispatches them to application/compute servers in the second tier, backed by a distributed file/database system in the third tier.]

Crucial element
The first tier is generally responsible for passing requests to an appropriate server: request dispatching.

5 / 62

slide-8
SLIDE 8

Revision:

Revision: Request Handling

Observation
Having the first tier handle all communication from/to the cluster may lead to a bottleneck.

A solution: TCP handoff
[Figure: the client sends its request to the switch, which hands it off to one of the servers; the server sends the response directly to the client. Logically there is a single TCP connection.]

6 / 62

slide-9
SLIDE 9

Revision:

Models for code migration

         Before execution              After execution
         Client        Server          Client        Server
    CS                 code                          code
                       exec                          exec*
                       resource                      resource
    REV  code    -->   exec                   -->    code
                       resource                      exec*
                                                     resource

CS: Client-Server
REV: Remote evaluation

7 / 62

slide-10
SLIDE 10

Revision:

Models for code migration

         Before execution              After execution
         Client        Server          Client        Server
    CoD  exec    <--   code            code    <--
         resource                      exec*
                                       resource
    MA   code    -->   resource        resource -->  code
         exec                                        exec*
         resource                                    resource

CoD: Code-on-demand
MA: Mobile agents

8 / 62

slide-11
SLIDE 11

Revision:

Strong and weak mobility

Object components
- Code segment: contains the actual code
- Data segment: contains the state
- Execution state: contains context of thread executing the object's code

Weak mobility: Move only code and data segment (and reboot execution)
- Relatively simple, especially if code is portable
- Distinguish code shipping (push) from code fetching (pull)

Strong mobility: Move component, including execution state
- Migration: move entire object from one machine to the other
- Cloning: start a clone, and set it in the same execution state.

9 / 62

slide-12
SLIDE 12

Revision:

Revision: Exercises

1. Consider a service that takes a total of 10 ms to handle a request, provided the required data is in a cache in main memory. When the data is not in the cache, a disk operation that takes 90 ms is needed before the request can be completed, and during this time the thread processing the request is suspended. Assume the data is in the cache for 50% of the requests. How many requests per second can the server handle if it is implemented with a single thread? And if the server uses multiple threads?

2. Does it make sense to limit the number of threads in a server process? Justify your answer.

3. Are there cases where a single-threaded server performs better than a multithreaded server? Justify your answer.

10 / 62

slide-13
SLIDE 13

Revision:

Revision: Exercises

4. A multiprocess server has some advantages and disadvantages compared with a multithreaded server. Give some examples.

5. Is a server that maintains a TCP/IP connection to a client stateful or stateless?

11 / 62

slide-14
SLIDE 14

Exercises:

Exercises

1. Describe the connection process between client and server using TCP/IP sockets.

2. Distinguish synchronous from asynchronous communication, and persistent from transient communication. Give examples of each combination.

3. Describe a scalability problem with transient synchronous communication.

4. What is the role of a broker in message-oriented communication?

12 / 62

slide-15
SLIDE 15

Exercises:

Exercises

5. In Figure 4.35, what is the stretch factor of the overlay network on the route A→C?

6. Explain the anti-entropy principle used in epidemic protocols.

7. Describe the problem of data removal in epidemic protocols and present a solution.

8. Describe an epidemic algorithm that computes the size of a network.

13 / 62

slide-16
SLIDE 16

Communication: Foundations Layered Protocols

Basic networking model

[Figure: the OSI reference model. Layers, bottom to top: 1 Physical, 2 Data link, 3 Network, 4 Transport, 5 Session, 6 Presentation, 7 Application. Peer layers on the two hosts communicate through the protocol of the same name, across the network.]

Drawbacks
- Focus on message-passing only
- Often unneeded or unwanted functionality
- Violates access transparency

The OSI reference model 14 / 62

slide-17
SLIDE 17

Communication: Foundations Layered Protocols

Low-level layers

Recap
- Physical layer: contains the specification and implementation of bits, and their transmission between sender and receiver
- Data link layer: prescribes the transmission of a series of bits into a frame to allow for error and flow control
- Network layer: describes how packets in a network of computers are to be routed.

Observation
For many distributed systems, the lowest-level interface is that of the network layer.

The OSI reference model 15 / 62

slide-18
SLIDE 18

Communication: Foundations Layered Protocols

Transport Layer

Important
The transport layer provides the actual communication facilities for most distributed systems.

Standard Internet protocols
- TCP: connection-oriented, reliable, stream-oriented communication
- UDP: unreliable (best-effort) datagram communication

The OSI reference model 16 / 62

slide-19
SLIDE 19

Communication: Foundations Layered Protocols

Middleware layer

Observation
Middleware is invented to provide common services and protocols that can be used by many different applications:
- A rich set of communication protocols
- (Un)marshaling of data, necessary for integrated systems
- Naming protocols, to allow easy sharing of resources
- Security protocols for secure communication
- Scaling mechanisms, such as for replication and caching

Note
What remains are truly application-specific protocols... such as?

Middleware protocols 17 / 62

slide-20
SLIDE 20

Communication: Foundations Layered Protocols

An adapted layering scheme

[Figure: adapted layering, top to bottom: Application (application protocol), Middleware (middleware protocol), Operating system (host-to-host protocol), Hardware (physical/link-level protocol), connected through the network.]

Middleware protocols 18 / 62

slide-21
SLIDE 21

Communication: Foundations Types of Communication

Types of communication

Distinguish...
- Transient versus persistent communication
- Asynchronous versus synchronous communication

[Figure: timeline of a client sending a request through a storage facility to a server and receiving a reply, with possible transmission interrupts. Synchronization can take place at request submission, at request delivery, or after processing by the server.]

19 / 62

slide-22
SLIDE 22

Communication: Foundations Types of Communication

Types of communication

Transient versus persistent
- Transient communication: a communication server discards a message when it cannot be delivered at the next server, or at the receiver.
- Persistent communication: a message is stored at a communication server as long as it takes to deliver it.

20 / 62

slide-23
SLIDE 23

Communication: Foundations Types of Communication

Types of communication

Places for synchronization
- At request submission
- At request delivery
- After request processing

21 / 62

slide-25
SLIDE 25

Communication: Foundations Types of Communication

Client/Server

Some observations
Client/Server computing is generally based on a model of transient synchronous communication:
- Client and server have to be active at time of communication
- Client issues request and blocks until it receives reply
- Server essentially waits only for incoming requests, and subsequently processes them

Drawbacks of synchronous communication
- Client cannot do any other work while waiting for reply
- Failures have to be handled immediately: the client is waiting
- The model may simply not be appropriate (mail, news)

22 / 62

slide-26
SLIDE 26

Communication: Message-oriented communication Simple transient messaging with sockets

Transient messaging: sockets

Berkeley socket interface

    Operation   Description
    socket      Create a new communication end point
    bind        Attach a local address to a socket
    listen      Tell operating system what the maximum number of pending connection requests should be
    accept      Block caller until a connection request arrives
    connect     Actively attempt to establish a connection
    send        Send some data over the connection
    receive     Receive some data over the connection
    close       Release the connection

[Figure: call sequence. Server: socket, bind, listen, accept, receive, send, close. Client: socket, connect, send, receive, close. The connect/accept pair is the synchronization point; the send/receive pairs are the communication.]

23 / 62

slide-27
SLIDE 27

Communication: Message-oriented communication Simple transient messaging with sockets

Sockets: Python code

Server

    from socket import *

    HOST, PORT = "localhost", 5000   # assumed values; the slide leaves them undefined

    s = socket(AF_INET, SOCK_STREAM)
    s.bind((HOST, PORT))
    s.listen(1)
    (conn, addr) = s.accept()        # returns new socket and address of client
    while True:                      # forever
        data = conn.recv(1024)       # receive data from client
        if not data: break           # stop if client stopped
        conn.send(data + b"*")       # return sent data plus an "*"
    conn.close()                     # close the connection

Client

    from socket import *

    HOST, PORT = "localhost", 5000   # assumed values; must match the server

    s = socket(AF_INET, SOCK_STREAM)
    s.connect((HOST, PORT))          # connect to server (block until accepted)
    s.send(b"Hello, world")          # send some data
    data = s.recv(1024)              # receive the response
    print(data.decode())             # print the result
    s.close()                        # close the connection

24 / 62

slide-28
SLIDE 28

Communication: Message-oriented communication Simple transient messaging with sockets

Messaging

Message-oriented middleware
Aims at high-level persistent asynchronous communication:
- Processes send each other messages, which are queued
- Sender need not wait for immediate reply, but can do other things
- Middleware often ensures fault tolerance

25 / 62

slide-29
SLIDE 29

Communication: Remote procedure call Basic RPC operation

Basic RPC operation

Observations
- Application developers are familiar with a simple procedure model
- Well-engineered procedures operate in isolation (black box)
- There is no fundamental reason not to execute procedures on a separate machine

Conclusion
Communication between caller and callee can be hidden by using a procedure-call mechanism.

[Figure: timeline. The client calls the remote procedure and sends a request; the server calls the local procedure and returns the results in a reply; the client waits for the result and then returns from the call.]

26 / 62

slide-30
SLIDE 30

Communication: Remote procedure call Basic RPC operation

Basic RPC operation

Implementation

[Figure: implementation of doit. On the client machine, the client process calls r = doit(a,b) (1. client call to procedure); the client stub builds a message containing proc: "doit", type1: val(a), type2: val(b) (2. stub builds message); the client OS sends it across the network (3. message is sent across the network). On the server machine, the server OS hands the message to the server stub (4), which unpacks it (5. stub unpacks message) and makes a local call to "doit" in the server process (6).]

1. Client procedure calls client stub.
2. Stub builds message; calls local OS.
3. OS sends message to remote OS.
4. Remote OS gives message to stub.
5. Stub unpacks parameters; calls server.
6. Server does local call; returns result to stub.
7. Stub builds message; calls OS.
8. OS sends message to client's OS.
9. Client's OS gives message to stub.
10. Client stub unpacks result; returns to client.

27 / 62
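The ten steps can be mimicked in a single process. The following is only a minimal sketch: the JSON message format, the `network` function standing in for both operating systems, and the body of `doit` are assumptions made for illustration, not part of the slides.

```python
import json

def doit(a, b):
    """Server-side procedure; this body is a hypothetical stand-in."""
    return a + b

PROCEDURES = {"doit": doit}   # dispatch table used by the server stub

def server_stub(request_bytes):
    """Steps 5-7: unpack the request, make the local call, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def network(request_bytes):
    """Steps 3-4 and 8-9: stands in for both operating systems and the network."""
    return server_stub(request_bytes)

def client_stub(proc, *args):
    """Steps 2 and 10: build the request message, then unpack the reply."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = network(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]

r = client_stub("doit", 3, 4)   # step 1: looks like an ordinary local call
print(r)
```

The caller only ever sees `client_stub`, which is the point of the mechanism: message building and transport stay hidden behind what looks like a procedure call.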

slide-31
SLIDE 31

Communication: Remote procedure call Parameter passing

RPC: Parameter passing

There's more than just wrapping parameters into a message
- Client and server machines may have different data representations (think of byte ordering)
- Wrapping a parameter means transforming a value into a sequence of bytes
- Client and server have to agree on the same encoding:
  - How are basic data values represented (integers, floats, characters)
  - How are complex data values represented (arrays, unions)

Conclusion
Client and server need to properly interpret messages, transforming them into machine-dependent representations.
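The byte-ordering issue can be made concrete with Python's standard `struct` module. This is an illustrative sketch: the value 1025 and the `marshal`/`unmarshal` helpers are hypothetical, and big-endian is chosen here only as one possible agreed-upon encoding.

```python
import struct

value = 1025                         # 0x00000401

big = struct.pack(">i", value)       # big-endian ("network") byte order
little = struct.pack("<i", value)    # little-endian byte order

print(big.hex())                     # 00000401
print(little.hex())                  # 01040000

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<i", big)[0]
print(misread)                       # 17039360

def marshal(a, b):
    """Encode an int and a float in an agreed, machine-independent format."""
    return struct.pack(">id", a, b)

def unmarshal(message):
    """Decode the message back into (int, float) using the same agreement."""
    return struct.unpack(">id", message)

print(unmarshal(marshal(7, 2.5)))    # (7, 2.5)
```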

28 / 62

slide-34
SLIDE 34

Communication: Remote procedure call Parameter passing

RPC: Parameter passing

Some assumptions
- Copy in/copy out semantics: while procedure is executed, nothing can be assumed about parameter values.
- All data that is to be operated on is passed by parameters. Excludes passing references to (global) data.

Conclusion
Full access transparency cannot be realized.

A remote reference mechanism enhances access transparency
- Remote reference offers unified access to remote data
- Remote references can be passed as parameter in RPCs
- Note: stubs can sometimes be used as such references

29 / 62

slide-35
SLIDE 35

Communication: Remote procedure call Variations on RPC

Asynchronous RPCs

Essence
Try to get rid of the strict request-reply behavior, but let the client continue without waiting for an answer from the server.

[Figure: timeline. The client calls the remote procedure, waits only for acceptance of the request, and returns from the call; the server accepts the request, calls the local procedure, and later returns the results through a callback to the client.]
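The control flow of an asynchronous RPC with a callback can be sketched with a thread pool playing the role of the server. Everything here is an assumption made for illustration (the `remote_procedure` body, the single worker thread, the `results` list); no real network is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_procedure(x):
    """Stands in for the work done at the server."""
    time.sleep(0.05)
    return x * 2

results = []

def callback(future):
    """Invoked once the server has produced the result."""
    results.append(future.result())

with ThreadPoolExecutor(max_workers=1) as server:
    future = server.submit(remote_procedure, 21)   # request handed over; call returns at once
    future.add_done_callback(callback)             # server "calls back" with the result
    local = sum(range(1000))                       # client continues with other work
# Leaving the with-block waits for the server thread to finish.

print(local, results)
```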

Asynchronous RPC 30 / 62

slide-36
SLIDE 36

Communication: Remote procedure call Variations on RPC

Sending out multiple RPCs

Essence
Sending an RPC request to a group of servers.

[Figure: timeline. The client calls remote procedures on several servers at once; each server calls its local procedure and reports back through a callback to the client.]

Multicast RPC 31 / 62

slide-37
SLIDE 37

Communication: Remote procedure call Example: DCE RPC

RPC in practice

[Figure: Uuidgen produces an interface definition file; the IDL compiler turns it into a client stub, a header, and a server stub. The C compiler compiles the client code and client stub, and the server code and server stub (each #include-ing the header), into object files. The linker combines the client object files with the runtime library into the client binary, and the server object files with the runtime library into the server binary.]

Writing a Client and a Server 32 / 62

slide-38
SLIDE 38

Communication: Remote procedure call Example: DCE RPC

Client-to-server binding (DCE)

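Issues
(1) Client must locate server machine, and (2) locate the server.

[Figure: binding across the server machine, the directory machine, and the client machine.
1. Server registers its port with the DCE daemon (port table)
2. Server registers the service with the directory server
3. Client looks up the server at the directory server
4. Client asks the DCE daemon for the port
5. Client does the RPC]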

Binding a client to a server 33 / 62

slide-39
SLIDE 39

Communication: Remote procedure call Message-oriented persistent communication

Message-oriented middleware

Essence
Asynchronous persistent communication through support of middleware-level queues. Queues correspond to buffers at communication servers.

Operations
    Operation   Description
    put         Append a message to a specified queue
    get         Block until the specified queue is nonempty, and remove the first message
    poll        Check a specified queue for messages, and remove the first. Never block
    notify      Install a handler to be called when a message is put into the specified queue
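The four operations map naturally onto a thin wrapper around Python's standard `queue.Queue`. This is a sketch under assumptions: the `MessageQueue` class is hypothetical, it lives in one process rather than at a communication server, and the handler is simply called with each message right after it has been enqueued.

```python
import queue

class MessageQueue:
    """In-memory illustration of put / get / poll / notify."""

    def __init__(self):
        self._q = queue.Queue()
        self._handler = None

    def put(self, message):
        """Append a message; tell the installed handler, if any."""
        self._q.put(message)
        if self._handler is not None:
            self._handler(message)

    def get(self):
        """Block until the queue is nonempty, then remove the first message."""
        return self._q.get()

    def poll(self):
        """Remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a handler to be called whenever a message is put."""
        self._handler = handler

q = MessageQueue()
q.put("first")
print(q.get())    # first
print(q.poll())   # None
```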

Message-queuing model 34 / 62

slide-40
SLIDE 40

Communication: Remote procedure call Message-oriented persistent communication

General model

Queue managers
Queues are managed by queue managers. An application can put messages only into a local queue. Getting a message is possible by extracting it from a local queue only ⇒ queue managers need to route messages.

Routing
[Figure: the source queue manager looks up the contact address of the destination queue manager in an address lookup database, mapping a logical queue-level address (name) to a contact address. The message then travels through the local OS and the network to the destination queue manager.]

General architecture of a message-queuing system 35 / 62

slide-41
SLIDE 41

Communication: Remote procedure call Message-oriented persistent communication

Message broker

Observation
Message queuing systems assume a common messaging protocol: all applications agree on message format (i.e., structure and data representation).

Broker handles application heterogeneity in an MQ system
- Transforms incoming messages to target format
- Very often acts as an application gateway
- May provide subject-based routing capabilities (i.e., publish-subscribe capabilities)

Message brokers 36 / 62

slide-42
SLIDE 42

Communication: Remote procedure call Message-oriented persistent communication

Message broker: general architecture

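[Figure: a source application and a destination application, each with an interface on top of its local OS, communicate through a message broker. The broker consists of broker plugins and rules on top of a queuing layer and its own local OS.]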

Message brokers 37 / 62

slide-43
SLIDE 43

Communication: Remote procedure call Advanced transient messaging

Making sockets easier to work with

Observation
Sockets are rather low level and programming mistakes are easily made. However, the way that they are used is often the same (such as in a client-server setting).

Alternative: ZeroMQ
Provides a higher level of expression by pairing sockets: one for sending messages at process P and a corresponding one at process Q for receiving messages. All communication is asynchronous.

Three patterns
- Request-reply
- Publish-subscribe
- Pipeline

Using messaging patterns: ZeroMQ 38 / 62

slide-44
SLIDE 44

Communication: Remote procedure call Advanced transient messaging

Request-reply

Server

    import zmq

    HOST, PORT1, PORT2 = "127.0.0.1", "5555", "5556"   # assumed values; undefined on the slide

    context = zmq.Context()

    p1 = "tcp://" + HOST + ":" + PORT1   # how and where to connect
    p2 = "tcp://" + HOST + ":" + PORT2   # how and where to connect
    s = context.socket(zmq.REP)          # create reply socket

    s.bind(p1)                           # bind socket to address
    s.bind(p2)                           # bind socket to address
    while True:
        message = s.recv()               # wait for incoming message
        if b"STOP" not in message:       # if not to stop ...
            s.send(message + b"*")       # append "*" to message
        else:                            # else ...
            break                        # break out of loop and end

Using messaging patterns: ZeroMQ 39 / 62

slide-45
SLIDE 45

Communication: Remote procedure call Advanced transient messaging

Request-reply

Client

    import zmq

    HOST, PORT = "127.0.0.1", "5555"     # assumed values; must match one of the server's addresses

    context = zmq.Context()

    php = "tcp://" + HOST + ":" + PORT   # how and where to connect
    s = context.socket(zmq.REQ)          # create socket

    s.connect(php)                       # connect (ZeroMQ completes the connection in the background)
    s.send(b"Hello World")               # send message
    message = s.recv()                   # block until response
    s.send(b"STOP")                      # tell server to stop
    print(message.decode())              # print result

Using messaging patterns: ZeroMQ 40 / 62

slide-46
SLIDE 46

Communication: Remote procedure call Advanced transient messaging

Publish-subscribe

Server

    import zmq, time

    HOST, PORT = "127.0.0.1", "5555"     # assumed values; undefined on the slide

    context = zmq.Context()
    s = context.socket(zmq.PUB)          # create a publisher socket
    p = "tcp://" + HOST + ":" + PORT     # how and where to communicate
    s.bind(p)                            # bind socket to the address
    while True:
        time.sleep(5)                    # wait 5 seconds between messages
        s.send(("TIME " + time.asctime()).encode())   # publish the current time

Client

    import zmq

    HOST, PORT = "127.0.0.1", "5555"     # assumed values; must match the publisher

    context = zmq.Context()
    s = context.socket(zmq.SUB)          # create a subscriber socket
    p = "tcp://" + HOST + ":" + PORT     # how and where to communicate
    s.connect(p)                         # connect to the server
    s.setsockopt(zmq.SUBSCRIBE, b"TIME") # subscribe to TIME messages

    for i in range(5):                   # five iterations
        time = s.recv()                  # receive a message
        print(time.decode())

Using messaging patterns: ZeroMQ 41 / 62

slide-47
SLIDE 47

Communication: Remote procedure call Advanced transient messaging

Pipeline

Source

    import zmq, time, pickle, sys, random

    SRC1, SRC2 = "127.0.0.1", "127.0.0.1"   # assumed hosts; undefined on the slide
    PORT1, PORT2 = "5557", "5558"           # assumed ports

    context = zmq.Context()
    me = str(sys.argv[1])
    s = context.socket(zmq.PUSH)            # create a push socket
    src = SRC1 if me == '1' else SRC2       # check task source host
    prt = PORT1 if me == '1' else PORT2     # check task source port
    p = "tcp://" + src + ":" + prt          # how and where to connect
    s.bind(p)                               # bind socket to address

    for i in range(100):                    # generate 100 workloads
        workload = random.randint(1, 100)   # compute workload
        s.send(pickle.dumps((me, workload)))   # send workload to worker

Using messaging patterns: ZeroMQ 42 / 62

slide-48
SLIDE 48

Communication: Remote procedure call Advanced transient messaging

Pipeline

Worker

    import zmq, time, pickle, sys

    SRC1, SRC2 = "127.0.0.1", "127.0.0.1"   # assumed hosts; must match the sources
    PORT1, PORT2 = "5557", "5558"           # assumed ports

    context = zmq.Context()
    me = str(sys.argv[1])
    r = context.socket(zmq.PULL)            # create a pull socket
    p1 = "tcp://" + SRC1 + ":" + PORT1      # address first task source
    p2 = "tcp://" + SRC2 + ":" + PORT2      # address second task source
    r.connect(p1)                           # connect to task source 1
    r.connect(p2)                           # connect to task source 2

    while True:
        work = pickle.loads(r.recv())       # receive work from a source
        time.sleep(work[1] * 0.01)          # pretend to work

Using messaging patterns: ZeroMQ 43 / 62

slide-49
SLIDE 49

Communication: Remote procedure call Advanced transient messaging

Example: RabbitMQ

Objective
RabbitMQ is a message broker. It accepts and forwards messages.
- Persistent, asynchronous communication

Sender
- Create a communication channel and declare a message queue
- Publish data to the queue

Receiver
- Create a communication channel and declare a message queue
- Define a callback function to handle incoming information
- Start consuming data from the channel

Using messaging patterns: ZeroMQ 44 / 62

slide-50
SLIDE 50

Communication: Remote procedure call Advanced transient messaging

Example: RabbitMQ

Features
- Round-robin dispatching
- Durable (persistent) messages
- Publish/Subscribe (fanout)
- Topic-based exchange (filtering)
- RPC Interface
- Multiple language bindings

Using messaging patterns: ZeroMQ 45 / 62

slide-51
SLIDE 51

Communication: Remote procedure call Advanced transient messaging

MPI: When lots of flexibility is needed

Representative operations
    Operation      Description
    MPI_bsend      Append outgoing message to a local send buffer
    MPI_send       Send a message and wait until copied to local or remote buffer
    MPI_ssend      Send a message and wait until transmission starts
    MPI_sendrecv   Send a message and wait for reply
    MPI_isend      Pass reference to outgoing message, and continue
    MPI_issend     Pass reference to outgoing message, and wait until receipt starts
    MPI_recv       Receive a message; block if there is none
    MPI_irecv      Check if there is an incoming message, but do not block

The Message-Passing Interface (MPI) 46 / 62

slide-52
SLIDE 52

Communication: Remote procedure call Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ

Basic concepts
- Application-specific messages are put into, and removed from, queues
- Queues reside under the regime of a queue manager
- Processes can put messages only in local queues, or through an RPC mechanism

Message transfer
- Messages are transferred between queues
- Message transfer between queues at different processes requires a channel
- At each end point of a channel is a message channel agent
- Message channel agents are responsible for:
  - Setting up channels using lower-level network communication facilities (e.g., TCP/IP)
  - (Un)wrapping messages from/in transport-level packets
  - Sending/receiving packets

Overview 47 / 62

slide-53
SLIDE 53

Communication: Remote procedure call Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ

Schematic overview

[Figure: a sending client and a receiving client each reach a queue manager through the MQ interface and a stub, using synchronous RPC over the local network to the queue manager's server stub. Each queue manager has a send queue, a routing table, and MCAs; messages pass asynchronously between queue managers over the enterprise network and end up in the client's receive queue. Further channels lead to other remote queue managers.]

- Channels are inherently unidirectional
- Automatically start MCAs when messages arrive
- Any network of queue managers can be created
- Routes are set up manually (system administration)

Overview 48 / 62

slide-54
SLIDE 54

Communication: Remote procedure call Example: IBM’s WebSphere message-queuing system

Message channel agents

Some attributes associated with message channel agents
    Attribute           Description
    Transport type      Determines the transport protocol to be used
    FIFO delivery       Indicates that messages are to be delivered in the order they are sent
    Message length      Maximum length of a single message
    Setup retry count   Specifies maximum number of retries to start up the remote MCA
    Delivery retries    Maximum times MCA will try to put received message into queue

Channels 49 / 62

slide-55
SLIDE 55

Communication: Remote procedure call Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ

Routing
By using logical names, in combination with name resolution to local queues, it is possible to put a message in a remote queue.

[Figure: four queue managers (QMA, QMB, QMC, QMD), each with a routing table mapping destination queue managers to send queues (SQ1, SQ2) and an alias table mapping local aliases (LA1, LA2) to queue managers.]

Message transfer 50 / 62

slide-56
SLIDE 56

Communication: Multicast communication Application-level tree-based multicasting

Application-level multicasting

Essence
Organize nodes of a distributed system into an overlay network and use that network to disseminate data:
- Oftentimes a tree, leading to unique paths
- Alternatively, also mesh networks, requiring a form of routing

51 / 62

slide-57
SLIDE 57

Communication: Multicast communication Application-level tree-based multicasting

Application-level multicasting in Chord

Basic approach

1. Initiator generates a multicast identifier mid.
2. Lookup succ(mid), the node responsible for mid.
3. Request is routed to succ(mid), which will become the root.
4. If P wants to join, it sends a join request to the root.
5. When request arrives at Q:
   - Q has not seen a join request before ⇒ it becomes forwarder; P becomes child of Q. Join request continues to be forwarded.
   - Q knows about tree ⇒ P becomes child of Q. No need to forward join request anymore.

52 / 62

slide-58
SLIDE 58

Communication: Multicast communication Application-level tree-based multicasting

ALM: Some costs

Different metrics

[Figure: end hosts A, B, C, D, E attached to routers Ra, Rb, Rc, Rd, Re in the Internet, with delays on the links, and the overlay network laid on top.]

- Link stress: How often does an ALM message cross the same physical link? Example: message from A to D needs to cross (Ra, Rb) twice.
- Stretch: Ratio in delay between ALM-level path and network-level path. Example: messages B to C follow path of length 73 at ALM, but 47 at network level ⇒ stretch = 73/47.
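Both metrics are simple to compute once the routes are known. The sketch below uses the 73 and 47 from the example for stretch; the routes in `hops` are hypothetical and chosen only so that one physical link is crossed twice, they are not read off the figure.

```python
from collections import Counter

def stretch(overlay_delay, network_delay):
    """Ratio of the delay along the overlay path to the delay of the network-level path."""
    return overlay_delay / network_delay

def link_stress(overlay_hops):
    """Count crossings of each physical link.

    overlay_hops holds one physical route (a list of node names) per overlay hop.
    Links are undirected, so (Ra, Rb) and (Rb, Ra) count as the same link.
    """
    stress = Counter()
    for route in overlay_hops:
        for u, v in zip(route, route[1:]):
            stress[frozenset((u, v))] += 1
    return stress

print(stretch(73, 47))   # roughly 1.55

# Hypothetical two-hop overlay path whose hops both cross the link between Ra and Rb.
hops = [["A", "Ra", "Rb", "B"], ["B", "Rb", "Ra", "Re", "E"]]
print(link_stress(hops)[frozenset(("Ra", "Rb"))])   # 2
```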

Performance issues in overlays 53 / 62

slide-60
SLIDE 60

Communication: Multicast communication Flooding-based multicasting

Flooding

Essence
P simply sends a message m to each of its neighbors. Each neighbor will forward that message, except to P, and only if it had not seen m before.

Performance
The more edges, the more expensive!

[Figure: the size of a random overlay as function of the number of nodes. Number of edges (x 1000) versus number of nodes (100 to 1000), for p_edge = 0.2, 0.4, and 0.6.]

Variation
Let Q forward a message with a certain probability p_flood, possibly even dependent on its own number of neighbors (i.e., node degree) or the degree of its neighbors.
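The cost of flooding can be explored with a small simulation. This is a sketch under assumptions: the overlay is a random graph in which every pair of nodes is linked with probability `p_edge`, the initiator always forwards, and every other node forwards with a fixed probability `p_flood` (the degree-dependent variant is not modeled). All function names are made up for illustration.

```python
import random

def expected_edges(n, p_edge):
    """Expected number of edges when each of the n(n-1)/2 node pairs is linked with p_edge."""
    return p_edge * (n * (n - 1) // 2)

def random_overlay(n, p_edge, rng):
    """Build an undirected random overlay as a dict: node -> set of neighbors."""
    neighbors = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_edge:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors

def flood(neighbors, source, p_flood=1.0, rng=None):
    """Flood from source; return (number of nodes reached, number of messages sent)."""
    rng = rng or random.Random()
    seen = {source}
    frontier = [(source, None)]          # (node, neighbor it first heard the message from)
    messages = 0
    while frontier:
        node, sender = frontier.pop()
        if sender is not None and rng.random() >= p_flood:
            continue                     # this node decides not to forward
        for nb in neighbors[node]:
            if nb == sender:
                continue                 # never send the message back to the sender
            messages += 1
            if nb not in seen:           # only the first copy triggers forwarding
                seen.add(nb)
                frontier.append((nb, node))
    return len(seen), messages

rng = random.Random(7)
overlay = random_overlay(200, 0.2, rng)
print(expected_edges(1000, 0.2))                 # about 99,900 edges
print(flood(overlay, 0))                         # plain flooding
print(flood(overlay, 0, p_flood=0.5, rng=rng))   # probabilistic variation
```

With plain flooding on a connected overlay, every node except the initiator sends one message fewer than its degree, which is why the message count grows with the number of edges.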

54 / 62

slide-61
SLIDE 61

Communication: Multicast communication Gossip-based data dissemination

Epidemic protocols

Assume there are no write-write conflicts
- Update operations are performed at a single server
- A replica passes updated state to only a few neighbors
- Update propagation is lazy, i.e., not immediate
- Eventually, each update should reach every replica

Two forms of epidemics
- Anti-entropy: Each replica regularly chooses another replica at random, and exchanges state differences, leading to identical states at both afterwards
- Rumor spreading: A replica which has just been updated (i.e., has been contaminated), tells a number of other replicas about its update (contaminating them as well).

55 / 62

slide-62
SLIDE 62

Communication: Multicast communication Gossip-based data dissemination

Anti-entropy

Principal operations: A node P selects another node Q from the system at random.
- Pull: P only pulls in new updates from Q
- Push: P only pushes its own updates to Q
- Push-pull: P and Q send updates to each other

Observation: For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (a round is the period in which every node has taken the initiative to start an exchange at least once).
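The three operations can be compared with a small simulation. This is a sketch under simplifying assumptions: a single update starts at node 0, nodes take the initiative one after another within a round, and state changes take effect immediately. The function name `anti_entropy_rounds` and its parameters are illustrative, not part of the slides.

```python
import random

def anti_entropy_rounds(n, mode="push-pull", seed=0, max_rounds=10_000):
    """Rounds until all n nodes (n >= 2) hold the update that starts at node 0."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        rounds += 1
        for p in range(n):             # every node P takes the initiative once
            q = rng.randrange(n - 1)   # pick another node Q uniformly at random
            if q >= p:
                q += 1
            if mode in ("push", "push-pull") and updated[p]:
                updated[q] = True      # P pushes its update to Q
            if mode in ("pull", "push-pull") and updated[q]:
                updated[p] = True      # P pulls the update from Q
    return rounds

results = {mode: anti_entropy_rounds(1000, mode)
           for mode in ("pull", "push", "push-pull")}
```

The exact round counts depend on the seed, so none are quoted here; the point of the sketch is that all three modes finish in a number of rounds that is small compared to the number of nodes.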

Information dissemination models 56 / 62

slide-63
SLIDE 63

Communication: Multicast communication Gossip-based data dissemination

Anti-entropy: analysis

Basics: Consider a single source, propagating its update. Let $p_i$ be the probability that a node has not received the update after the $i$th round.

Analysis: staying ignorant
- With pull, $p_{i+1} = (p_i)^2$: the node was not updated during the $i$th round and should contact another ignorant node during the next round.
- With push, $p_{i+1} = p_i \left(1 - \frac{1}{N}\right)^{N(1 - p_i)} \approx p_i e^{-1}$ (for small $p_i$ and large $N$): the node was ignorant during the $i$th round and no updated node chooses to contact it during the next round.
- With push-pull: $p_{i+1} = (p_i)^2 \cdot (p_i e^{-1})$

[Figure: Probability that a node has not yet been updated as a function of the round (0 to 25), for push, pull, and push-pull, with N = 10,000.]
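The pull and push recurrences can be iterated numerically to see how quickly the probability of staying ignorant drops. The sketch below assumes a starting value of p_0 = 1 − 1/N (one updated node out of N) and stops once the probability falls below 1/N; both choices, and the helper names, are assumptions made for illustration. The push-pull expression is left out because, as stated, it relies on the small-p_i approximation.

```python
import math

def pull_next(p):
    """Pull: stay ignorant only by contacting another ignorant node."""
    return p * p

def push_next(p, n):
    """Push: stay ignorant only if none of the n(1 - p) updated nodes picks you."""
    return p * (1.0 - 1.0 / n) ** (n * (1.0 - p))

def rounds_until(step, p0, threshold, max_rounds=1000):
    """Number of rounds until the ignorance probability is at most threshold."""
    p, rounds = p0, 0
    while p > threshold and rounds < max_rounds:
        p = step(p)
        rounds += 1
    return rounds

N = 10_000
p0 = 1.0 - 1.0 / N   # assumption: exactly one node holds the update initially
pull_rounds = rounds_until(pull_next, p0, 1.0 / N)
push_rounds = rounds_until(lambda p: push_next(p, N), p0, 1.0 / N)
```

Under these assumptions pull needs fewer rounds than push, because squaring shrinks a small p_i much faster than multiplying it by roughly e^{-1}.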

Information dissemination models 57 / 62

slide-64
SLIDE 64

Communication: Multicast communication Gossip-based data dissemination

Anti-entropy performance

[Figure: Probability that a node has not yet been updated as a function of the round (0 to 25), for push, pull, and push-pull, with N = 10,000.]

Information dissemination models 58 / 62

slide-65
SLIDE 65

Communication: Multicast communication Gossip-based data dissemination

Rumor spreading

Basic model: A server S having an update to report contacts other servers. If a server is contacted to which the update has already propagated, S stops contacting other servers with probability pstop.

Observation: If s is the fraction of ignorant servers (i.e., which are unaware of the update), it can be shown that with many servers

$s = e^{-(1/p_{stop} + 1)(1 - s)}$
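The equation for s has no closed-form solution, but it can be solved numerically. The sketch below uses plain fixed-point iteration starting from s = 0, which is an assumed solution method and not something prescribed by the slides; starting at 0 avoids the trivial solution s = 1. The function name `ignorant_fraction` is hypothetical.

```python
import math

def ignorant_fraction(p_stop, tolerance=1e-12, max_iterations=10_000):
    """Solve s = exp(-(1/p_stop + 1) * (1 - s)) by fixed-point iteration."""
    k = 1.0 / p_stop
    s = 0.0  # start below the non-trivial root; s = 1 is a trivial solution
    for _ in range(max_iterations):
        nxt = math.exp(-(k + 1.0) * (1.0 - s))
        if abs(nxt - s) < tolerance:
            return nxt
        s = nxt
    return s

# One row per value of 1/p_stop from 1 to 7: (1/p_stop, s).
table = [(k, ignorant_fraction(1.0 / k)) for k in range(1, 8)]
```

For p_stop = 1 this gives s ≈ 0.2032, that is, about a fifth of the servers never hear of the update.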

Information dissemination models 59 / 62

slide-66
SLIDE 66

Communication: Multicast communication Gossip-based data dissemination

Rumor spreading

The effect of stopping

[Figure: Fraction s of ignorant servers (0.00 to 0.20) as a function of pstop (0.0 to 1.0).]

Consider 10,000 nodes

1/pstop   s          Ns
1         0.203188   2032
2         0.059520    595
3         0.019827    198
4         0.006977     70
5         0.002516     25
6         0.000918      9
7         0.000336      3

Information dissemination models 60 / 62

slide-67
SLIDE 67

Communication: Multicast communication Gossip-based data dissemination

Rumor spreading


Note: If we really have to ensure that all servers are eventually updated, rumor spreading alone is not enough.
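A small simulation illustrates why rumor spreading alone leaves some servers ignorant. It is a sketch of the basic model under stated assumptions: one active server is picked at random in each step, it contacts one other server chosen uniformly at random, and p_stop must be greater than 0 for the loop to terminate. The function name `rumor_spreading` and its parameters are illustrative.

```python
import random

def rumor_spreading(n, p_stop, seed=0):
    """Return the fraction of servers still ignorant when spreading dies out."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True
    active = [0]  # servers that are still spreading the rumor
    while active:
        idx = rng.randrange(len(active))
        s = active[idx]
        q = rng.randrange(n - 1)  # pick another server uniformly at random
        if q >= s:
            q += 1
        if not updated[q]:
            updated[q] = True     # q learns the update and starts spreading too
            active.append(q)
        elif rng.random() < p_stop:
            active[idx] = active[-1]  # s loses interest: swap-remove from list
            active.pop()
    return updated.count(False) / n

ignorant = rumor_spreading(10_000, 1.0)
```

The returned fraction varies with the seed, but for a large system it should stay well above zero, so a complementary mechanism is needed if every server must eventually be updated.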

Information dissemination models 60 / 62

slide-68
SLIDE 68

Communication: Multicast communication Gossip-based data dissemination

Deleting values

Fundamental problem: We cannot remove an old value from a server and expect the removal to propagate. Instead, mere removal will be undone in due time using epidemic algorithms.

Solution: Removal has to be registered as a special update by inserting a death certificate.
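The problem and its solution can be sketched with two replicas that merge state by timestamp. The store layout (a dictionary mapping keys to `(timestamp, value)` pairs), the `TOMBSTONE` marker, and all function names are assumptions made for this illustration.

```python
TOMBSTONE = object()  # marker value that represents a death certificate

def merge(local, remote):
    """Anti-entropy style merge: per key, keep the entry with the newest timestamp."""
    for key, (timestamp, value) in remote.items():
        if key not in local or local[key][0] < timestamp:
            local[key] = (timestamp, value)

def naive_delete(store, key):
    """Mere removal: the key simply disappears from this replica."""
    store.pop(key, None)

def delete_with_certificate(store, key, now):
    """Register the removal as a special update with its own timestamp."""
    store[key] = (now, TOMBSTONE)

def lookup(store, key):
    """Return the value, or None if the key is absent or marked deleted."""
    entry = store.get(key)
    if entry is None or entry[1] is TOMBSTONE:
        return None
    return entry[1]

a = {"x": (1, "old")}
b = {"x": (1, "old")}

naive_delete(a, "x")
merge(a, b)                      # the next exchange brings the value back
undone = lookup(a, "x")

delete_with_certificate(a, "x", now=2)
merge(a, b)                      # the certificate is newer, so it survives
merge(b, a)                      # and it propagates to the other replica
after_certificate = (lookup(a, "x"), lookup(b, "x"))
```

Because the death certificate is just another timestamped update, the same epidemic exchange that would have resurrected the value now spreads its removal instead.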

Removing data 61 / 62

slide-69
SLIDE 69

Communication: Multicast communication Gossip-based data dissemination

Deleting values

When to remove a death certificate (it is not allowed to stay forever):
- Run a global algorithm to detect whether the removal is known everywhere, and then collect the death certificates (looks like garbage collection)
- Assume death certificates propagate in finite time, and associate a maximum lifetime with a certificate (can be done at the risk of not reaching all servers)

Note: It is necessary that a removal actually reaches all servers.

Removing data 62 / 62