
Communication in Distributed Systems

Part 2

Distributed Systems and Cloud Computing course, A.Y. 2019/20
Valeria Cardellini
Laurea Magistrale in Ingegneria Informatica (Master's degree in Computer Engineering)

Macroarea di Ingegneria, Dipartimento di Ingegneria Civile e Ingegneria Informatica

Message-oriented communication

  • RPC and RMI improve distribution transparency
  • But they are synchronous
    – Synchrony in time: the caller waits for the reply
    – Synchrony in space: shared data are known
    – Functionality and communication are coupled
  • Which communication models provide better decoupling?
  • Message-oriented communication
    – Functionality and communication are decoupled
    – Transient
      • Berkeley sockets: already covered in other courses
      • Message Passing Interface (MPI)
    – Persistent
      • Message Oriented Middleware (MOM)

Valeria Cardellini – SDCC 2019/20 1

Message Passing Interface (MPI)

  • Library for exchanging messages between processes running on different nodes
    – Specifies only the interface (http://www.mpi-forum.org/)
    – Several implementations, including Open MPI (http://www.open-mpi.org/) and MPICH (http://www.mcs.anl.gov/research/projects/mpich2/)
    – De facto standard for communication among the nodes of a system running a parallel program developed for a distributed-memory architecture
  • MPI defines a set of primitives for inter-process communication; in particular:
    – Point-to-point communication primitives: for sending and receiving a message between two different processes
    – Collective communication primitives

Point-to-point communication in MPI

  • Main primitives for point-to-point communication:
    – MPI_Send and MPI_Recv: blocking communication
      • MPI_Send behaves synchronously or in buffered mode depending on the implementation
    – MPI_Bsend: buffered blocking send
    – MPI_Ssend: synchronous blocking send
    – MPI_Isend and MPI_Irecv: non-blocking communication

MPI primitive   Meaning
MPI_Bsend       Appends the outgoing message to a send buffer
MPI_Send        Sends the message and waits until it has been copied into a local or remote buffer
MPI_Ssend       Sends the message and waits until reception starts
MPI_Isend       Passes a reference to the outgoing message and continues
MPI_Recv        Receives a message; blocks if none is available

Example of communication in MPI

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int myrank;
        char message[20];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        printf("My rank is: %d\n", myrank);

        if (myrank == 0) {
            /* Send a message (including the terminating '\0') to process 1, tag 99 */
            strcpy(message, "TEST");
            MPI_Send(message, (int)strlen(message) + 1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
            printf("%d) Sent: '%s'\n", myrank, message);
        } else if (myrank == 1) {
            /* Receive the message from process 0; 20 is the buffer capacity */
            MPI_Recv(message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
            printf("%d) Received: '%s'\n", myrank, message);
        }

        MPI_Finalize();
        return 0;
    }

MPI_Send(buf, count, datatype, dest, tag, comm)
MPI_Recv(buf, count, datatype, source, tag, comm, status)

Message-oriented middleware

  • Communication middleware that supports sending and receiving messages in a persistent way
  • Loose coupling among system/application components
    – Decoupling in time and space
    – Can also support synchronization decoupling
  • Two patterns:
    – Message queue
    – Publish-subscribe (pub/sub)
  • And two related types of systems:
    – Message queue system (MQS)
    – Pub/sub system

Queue message pattern

  • Messages are put into a queue
  • Multiple consumers can read from the queue
  • Each message is delivered to only one consumer
  • Principles
    – Loose coupling
    – Service statelessness
      • Services minimize resource consumption by deferring the management of state information when necessary
  • Apps:
    – Task scheduling, load balancing, collaboration

[Figure: A sends a message to B; B issues a response message back to A]

Message queue API

  • Basic interface to a queue in an MQS:
    – put: nonblocking send
      • Append a message to a specified queue
    – get: blocking receive
      • Block until the specified queue is nonempty and remove the first message
      • Variations: allow searching for a specific message in the queue, e.g., using a matching pattern
    – poll: nonblocking receive
      • Check a specified queue for messages and remove the first one
      • Never block
    – notify: nonblocking receive
      • Install a handler (callback function) to be automatically called when a message is put into the specified queue
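
The four calls above can be illustrated with a minimal in-process sketch. It is not the API of any real MQS: the class name MessageQueue is hypothetical, the queue lives in one process, and it assumes that once a notify handler is installed, newly put messages go straight to the handler instead of being stored.

```python
import queue
import threading


class MessageQueue:
    """In-process stand-in for one queue of an MQS (illustrative only)."""

    def __init__(self):
        self._q = queue.Queue()
        self._handler = None
        self._lock = threading.Lock()

    def put(self, msg):
        """Nonblocking send: append msg, or hand it to the notify handler."""
        with self._lock:
            handler = self._handler
        if handler is not None:
            handler(msg)          # assumption: the handler consumes the message
        else:
            self._q.put(msg)

    def get(self):
        """Blocking receive: wait until the queue is nonempty, remove the first message."""
        return self._q.get()

    def poll(self):
        """Nonblocking receive: return the first message, or None if the queue is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a callback invoked for every message put from now on."""
        with self._lock:
            self._handler = handler


mq = MessageQueue()
empty_poll = mq.poll()            # nothing queued yet, returns immediately
mq.put("a")
mq.put("b")
first = mq.get()                  # does not block: the queue is nonempty
second = mq.poll()
notified = []
mq.notify(notified.append)        # from here on, messages are pushed to the callback
mq.put("c")
```

In a real MQS the queue would be persistent and shared among processes; the sketch only shows how the blocking, polling, and callback styles of receive differ.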

Publish/subscribe pattern

  • Application components can publish asynchronous messages (e.g., event notifications), and/or declare their interest in message topics by issuing a subscription

Publish/subscribe pattern

  • Multiple consumers can subscribe to a topic with or without filters
  • Subscriptions are collected by an event dispatcher component, responsible for routing events to all matching subscribers
    – For scalability reasons, its implementation can be distributed
  • High degree of decoupling among components
    – Easy to add and remove components
    – Appropriate for dynamic environments
  • A sibling of the message queue pattern that further generalizes it by delivering a message to multiple consumers
    – Message queue: delivers each message to only one receiver, i.e., one-to-one communication
    – Pub/sub channel: delivers each message to multiple receivers, i.e., one-to-many communication

Publish/subscribe API

  • Calls that capture the core of any pub/sub system:
    – publish(event): to publish an event
      • Events can be of any data type supported by the given implementation languages and may also contain meta-data
    – subscribe(filter_expr, notify_cb, expiry) → sub_handle: to subscribe to an event
      • Takes a filter expression, a reference to a notify callback for event delivery, and an expiry time for the subscription registration
      • Returns a subscription handle
    – unsubscribe(sub_handle)
    – notify_cb(sub_handle, event): called by the pub/sub system to deliver a matching event
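
A minimal sketch of these calls, under a few assumptions that are not in the slides: the dispatcher is a single in-process object named EventDispatcher, a filter expression is simply a Python predicate over the event, and expiry is given in seconds and checked lazily at publish time.

```python
import itertools
import time


class EventDispatcher:
    """Toy event dispatcher: routes each event to all matching subscribers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._subs = {}                    # sub_handle -> (filter_expr, notify_cb, deadline)
        self._handles = itertools.count(1)

    def subscribe(self, filter_expr, notify_cb, expiry):
        """Register a subscription valid for `expiry` seconds; return its handle."""
        handle = next(self._handles)
        self._subs[handle] = (filter_expr, notify_cb, self._clock() + expiry)
        return handle

    def unsubscribe(self, sub_handle):
        self._subs.pop(sub_handle, None)

    def publish(self, event):
        """Deliver the event to every live subscription whose filter matches."""
        now = self._clock()
        for handle, (filter_expr, notify_cb, deadline) in list(self._subs.items()):
            if now >= deadline:
                del self._subs[handle]     # expired registration
            elif filter_expr(event):
                notify_cb(handle, event)


log = []
bus = EventDispatcher()
h_all = bus.subscribe(lambda e: True,
                      lambda h, e: log.append(("all", e["value"])), expiry=60)
h_temp = bus.subscribe(lambda e: e["topic"] == "temp",
                       lambda h, e: log.append(("temp", e["value"])), expiry=60)
bus.publish({"topic": "temp", "value": 21})   # both subscribers are notified
bus.unsubscribe(h_temp)
bus.publish({"topic": "temp", "value": 22})   # only the unfiltered subscriber remains
```

The publisher never names its receivers, which is the decoupling the pattern is after; a distributed dispatcher would replace the in-memory dictionary.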

MOM functionalities

  • MOM handles the complexity of addressing, routing, availability of communicating application components (or applications), and message format transformations

Source: “Cloud Computing Patterns”, http://bit.ly/2hZv6Xs

MOM functionalities

  • Let us analyze
    – Delivery semantics
    – Message routing
    – Message transformations

Delivery semantics in MOM

At-least-once delivery
How can MOM ensure that messages are received successfully?
    – By having the receiver send an ack for each retrieved message, and by resending the message if no ack arrives
    – Be careful: the app should be tolerant to message duplications

Delivery semantics in MOM

Exactly-once delivery
How can MOM ensure that a message is delivered exactly once to a receiver?
    – By filtering possible message duplicates automatically
    – Upon creation, each message is associated with a unique message ID, which is used to filter message duplicates during their traversal from sender to receiver
    – Messages must also survive MOM components’ failures
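
The interplay between resending (at-least-once) and duplicate filtering by message ID can be sketched as follows. Everything here is a simplifying assumption: the names Receiver and send_at_least_once are hypothetical, the lossy channel is simulated with a random number generator that drops acks, the filter sits at the receiver rather than inside the MOM, and the set of seen IDs is kept in memory, so it would not survive a failure.

```python
import random
import uuid


class Receiver:
    """Filters duplicates by message ID, so each payload is delivered once."""

    def __init__(self):
        self.seen_ids = set()
        self.delivered = []

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen_ids:       # first copy: deliver it
            self.seen_ids.add(msg_id)
            self.delivered.append(payload)
        return "ack"                          # duplicates are acked but dropped


def send_at_least_once(receiver, payload, rng, ack_loss=0.5, max_tries=50):
    """Resend until an ack gets through; return the number of transmissions."""
    msg_id = str(uuid.uuid4())                # unique ID assigned at creation
    for attempt in range(1, max_tries + 1):
        receiver.on_message(msg_id, payload)  # in this sketch the message always arrives
        if rng.random() >= ack_loss:          # simulated channel: the ack may be lost
            return attempt
    raise RuntimeError("no ack received")


rng = random.Random(0)
receiver = Receiver()
attempts = [send_at_least_once(receiver, p, rng) for p in ["m1", "m2", "m3"]]
```

However many retransmissions the lost acks trigger, receiver.delivered ends up with each payload exactly once, in sending order.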

Delivery semantics in MOM

Transaction-based delivery
How can MOM ensure that messages are only deleted from a message queue if they have been received successfully?
    – MOM and the receiver participate in a transaction: all operations involved in the reception of a message are performed under one transactional context guaranteeing ACID behavior

Delivery semantics in MOM

Timeout-based delivery
How can MOM ensure that messages are only deleted from a message queue if they have been received successfully at least once?
    – Messages are not deleted immediately from the queue, but marked as being invisible until a visibility timeout expires
    – An invisible message cannot be read by another receiver
    – After the receiver acks the message receipt, the message is deleted from the queue
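
A small sketch of the visibility-timeout mechanism, assuming an in-memory queue and an injectable clock so that time can be advanced by hand; the class VisibilityQueue and its method names are illustrative and do not belong to any real service.

```python
class VisibilityQueue:
    """Toy queue with timeout-based delivery (visibility timeout)."""

    def __init__(self, visibility_timeout, clock):
        self._timeout = visibility_timeout
        self._clock = clock
        self._messages = {}      # msg_id -> [body, invisible_until]
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = [body, 0.0]   # visible immediately
        return self._next_id

    def receive(self):
        """Return (msg_id, body) of the first visible message and hide it, or None."""
        now = self._clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self._timeout        # invisible until the timeout expires
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Receiver ack: only now is the message removed from the queue."""
        self._messages.pop(msg_id, None)


now = [0.0]                                  # fake clock, advanced manually
q = VisibilityQueue(visibility_timeout=30.0, clock=lambda: now[0])
q.send("resize photo 1")
first = q.receive()                          # a worker takes the message
hidden = q.receive()                         # other receivers see nothing meanwhile
now[0] = 31.0                                # the worker crashed: the timeout expires
again = q.receive()                          # the message is available again
q.delete(again[0])                           # processed successfully, ack
empty = q.receive()
```

The second delivery after the timeout is what makes this an at-least-once scheme: a slow but alive worker could still finish the first copy.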

Message routing: general model

  • Queues are managed by queue managers (QMs)
    – An application can put messages only into a local queue
    – Getting a message is possible by extracting it from a local queue only
  • QMs need to route messages
    – Function as message-queuing “relays” that interact with distributed applications and each other
    – Support the idea of an overlay network
    – There are also special queue managers that operate as routers

Message routing: overlay network

  • Overlay network is used to route messages
    – By using routing tables
    – Routing tables stored and managed by QMs
  • The overlay network needs to be maintained over time
    – Routing tables are usually set up and managed manually
    – Dynamic overlay networks require dynamically managing the mapping between queue names and their location
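
Routing over an overlay of queue managers can be sketched with a table that maps each destination QM to the next hop. This is a deliberately simplified model: the QueueManager class, the QM names, and the direct method call standing in for a network transfer are all assumptions, and the local send queue that a real QM would use is omitted.

```python
from collections import deque


class QueueManager:
    """Hypothetical queue manager: local queues plus a manually filled routing table."""

    def __init__(self, name):
        self.name = name
        self.local_queues = {}   # queue name -> deque of messages
        self.routes = {}         # destination QM name -> next-hop QueueManager

    def put(self, dest_qm, queue_name, msg, hops=()):
        """Store the message locally if we are the destination, else relay it."""
        hops = hops + (self.name,)
        if dest_qm == self.name:
            self.local_queues.setdefault(queue_name, deque()).append(msg)
            return hops
        return self.routes[dest_qm].put(dest_qm, queue_name, msg, hops)

    def get(self, queue_name):
        """Extract the first message from a local queue."""
        return self.local_queues[queue_name].popleft()


# Overlay QMA -> R1 -> QMB, with routes configured by hand
qm_a, router, qm_b = QueueManager("QMA"), QueueManager("R1"), QueueManager("QMB")
qm_a.routes["QMB"] = router
router.routes["QMB"] = qm_b

path = qm_a.put("QMB", "orders", "order #1")
received = qm_b.get("orders")
```

Adding a QM means touching the tables of its neighbors, which is why manual management becomes a burden as the overlay grows.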

Message transformation: message broker

  • New/existing apps that need to be integrated into a single, coherent system rarely agree on a common data format
  • How to handle data heterogeneity?
    – We have already examined different solutions in the context of RPC
    – Let’s focus on the message broker
    – Message broker: component that usually takes care of application heterogeneity in a MOM

Message broker: general architecture

  • Message broker handles application heterogeneity
    – Converts incoming messages to target format providing access transparency
    – Very often acts as an application gateway
    – Manages a repository of conversion rules and programs to transform a message of one type to another
    – May provide subject-based routing capabilities
    – To be scalable and reliable, it can be implemented in a distributed way

MOM frameworks

  • Examples of MOM systems and libraries
    – Apache ActiveMQ http://activemq.apache.org
    – Apache Kafka
    – Apache Pulsar https://pulsar.apache.org
    – IBM MQ
    – Java Message Service (JMS): MOM API for Java
    – Microsoft Message Queueing (MSMQ)
    – NATS https://nats.io
    – Open MQ
    – RabbitMQ https://www.rabbitmq.com
    – ZeroMQ https://zeromq.org
  • A clear distinction between the queue message and pub/sub patterns is often lacking
    – Some frameworks support both (e.g., Kafka, NATS)
    – Others do not (e.g., Redis is pub/sub https://redis.io/topics/pubsub)

MOM frameworks

  • Also Cloud-based products
    – Amazon Simple Queue Service (SQS)
    – CloudAMQP: RabbitMQ as a Service
    – Google Cloud Pub/Sub
    – Microsoft Azure Service Bus queues

Using message queues: use cases

1. Store and forward messages which are sent by a producer and received by a consumer (message queue pattern)
2. Distribute tasks among multiple workers (competing consumers pattern)
3. Deliver messages to many consumers at once (pub/sub pattern)
4. Receive messages selectively: the producer sends messages to an exchange, which selects the queue
5. Run a function on a remote node and wait for the result (request/reply pattern)

Source: RabbitMQ tutorial http://bit.ly/2zPPMJO

IBM MQ

https://www.ibm.com/products/mq

  • The first enterprise messaging technology, from 1993
  • Basic concepts:
    – Application-specific messages are put into and removed from queues
    – Queues reside under the regime of a queue manager (QM)
    – Processes can put messages only in local queues, or in remote queues through an RPC mechanism
  • Message transfer
    – Messages are transferred between queues
    – Message transfer between queues requires a channel
    – At each endpoint of a channel is a message channel agent (MCA)
  • MCAs are responsible for:
    – Setting up channels
    – Sending/receiving messages
    – Encrypting messages

IBM MQ

  • Principles of operation:
    – Channels are inherently unidirectional
    – MCAs are started automatically when messages arrive
    – Any network of queue managers can be created
    – Routes are set up manually (system administration)
    – Routing: by using logical names, in combination with name resolution to local queues, it is possible to route a message to a remote queue

Amazon Simple Queue Service (SQS)

  • Cloud-based message queue service based on a polling model
    – Goal: decouple Cloud app components
    – Message queues fully managed by AWS
    – Messages are stored in queues for a limited period of time
  • Application components using SQS can run independently and asynchronously and be developed with different technologies
  • Provides timeout-based delivery
    – Messages are only deleted from a message queue if they have been received properly
    – A received message is locked during processing (visibility timeout); if processing fails, the lock expires and the message is available again
  • Can be combined with Amazon SNS
    – To push a message to multiple SQS queues in parallel

Amazon SQS: API

  • CreateQueue, ListQueues, DeleteQueue
    – Create, list, delete queues
  • SendMessage, ReceiveMessage
    – Add/receive messages to/from a specified queue (message size up to 256 KB)
    – Larger message? Put in the queue a reference to the message payload stored in S3
  • DeleteMessage
    – Remove a received message from a specified queue (the component must delete the message after receiving and processing it)
  • ChangeMessageVisibility
    – Change the visibility timeout of a specified message in a queue (when received, the message remains in the queue until it is deleted explicitly by the receiver)
  • SetQueueAttributes, GetQueueAttributes
    – Control queue settings, get information about a queue

Amazon SQS: example

  • Cloud app for online photo processing service
  • Let’s use SQS to achieve app components decoupling, load balancing, fault tolerance

http://bit.ly/2gwJFBw

Apache Kafka

  • General-purpose, distributed pub/sub system
  • Originally developed in 2010 by LinkedIn
  • One of the most popular Apache projects
  • Written in Scala
  • Horizontally scalable
  • Fault-tolerant
  • High throughput ingestion
    – Billions of messages
  • Not only messaging, also processing of data
    – We focus on messaging https://kafka.apache.org/documentation/

Kreps et al., “Kafka: A Distributed Messaging System for Log Processing”, NetDB’11.

Kafka at a glance

  • Kafka maintains feeds of messages in categories called topics
    – A topic can have 0, 1, or many consumers subscribing to data written to it
  • Producers: publish messages to a Kafka topic
  • Consumers: subscribe to topics and process the feed of published messages
  • Kafka cluster: distributed log of data over servers known as brokers
    – A broker is responsible for receiving and storing published data

Kafka: topics and partitions

  • Topic: a category to which the message is published
  • For each topic, the Kafka cluster maintains a partitioned log
    – Log (data structure!): append-only, totally-ordered sequence of records ordered by time
  • A topic is split into a pre-defined number of partitions
    – Partition: unit of parallelism of the topic (allows for parallel access)
  • Each partition is replicated with some replication factor
  • Create a topic with 1 partition and 1 replica using CLI tools

$> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test

Kafka: partitions

  • Producers publish their records to partitions of a topic (round-robin or partitioned by keys), and consumers consume the published records of that topic
  • Each partition is an ordered, numbered, immutable sequence of records that is continually appended to
    – Like a commit log
  • Each record is associated with a monotonically increasing sequence number, called offset

Kafka: partitions

  • Partitions are distributed across brokers for scalability
  • Each partition is replicated for fault tolerance across a configurable number of brokers
  • Each partition has one leader broker and 0 or more followers
  • The leader handles read and write requests
    – Read from leader
    – Write to leader
  • A follower replicates the leader and acts as a backup
  • Each broker is a leader for some of its partitions and a follower for others, to balance the load
    – Brokers rely on Apache ZooKeeper for coordination

Kafka: producers

  • Producers = data sources
  • Publish data to topics of their choice
    – The producer sends data directly to the broker that is the leader for the partition without any routing tier
  • Also responsible for choosing which record to assign to which partition within the topic
    – Random or key-based partitioning
    – E.g., if user id is the key, all data for a given user will be sent to the same partition
  • Send some messages using CLI tools

$> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
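
Key-based partitioning can be sketched as a stable hash of the key modulo the number of partitions, with a round-robin fallback for records without a key. This is only an illustration: the Partitioner class is hypothetical, and CRC32 is used here as a stand-in, not as the hash function of Kafka's own partitioner.

```python
import itertools
import zlib


class Partitioner:
    """Toy producer-side partitioner: key-based, or round-robin without a key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None):
        if key is None:
            return next(self._round_robin)
        # Stable hash: the same key always maps to the same partition
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = Partitioner(num_partitions=3)
partitions = [[] for _ in range(3)]
events = [("alice", "login"), ("bob", "login"), ("alice", "click"), ("alice", "logout")]
for user_id, event in events:
    partitions[partitioner.partition_for(user_id)].append((user_id, event))

alice_partition = partitioner.partition_for("alice")
alice_events = [e for u, e in partitions[alice_partition] if u == "alice"]
keyless = [partitioner.partition_for() for _ in range(4)]
```

Because all records of one user land in one partition, the per-partition ordering of Kafka is enough to keep that user's events in order.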

Kafka: design choice for consumers

  • Push vs. pull model for consumers
  • Push model
    – The brokers actively push messages to the consumers
    – Challenging for the broker to deal with different types of consumers as it controls the rate at which data is transferred
    – Need to decide whether to send a message immediately or accumulate more data and send
  • Pull model
    – The consumer assumes the primary responsibility for retrieving messages from the brokers
    – The consumer has to maintain an offset that identifies the next message to be transmitted and processed
    – Pros: better scalability and flexibility (different consumers with diverse needs and capabilities)
    – Cons: in case the broker has no data, consumers may end up busy waiting for data to arrive

Kafka: consumers

  • In Kafka design, pull approach rather than push approach for consumers
    – Why? Less burden on brokers
      https://kafka.apache.org/documentation/#design_pull
  • Consumer Group: set of consumers sharing a common group ID
    – A Consumer Group maps to a logical subscriber
    – Each group consists of multiple consumers for scalability and fault tolerance


Kafka: consumers

  • Consumers use the offset to track which messages have been consumed
    – Messages can be replayed using the offset
  • Run the consumer using CLI tools

$> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
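
Offsets and replay can be illustrated with a toy append-only log and a pull consumer that keeps its own position. The classes PartitionLog and PullConsumer and the seek method are illustrative names for this sketch, not the Kafka client API.

```python
class PartitionLog:
    """Append-only sequence of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1          # offset assigned to the record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class PullConsumer:
    """Consumer that pulls batches and tracks the next offset to read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)              # advance past what was consumed
        return batch

    def seek(self, offset):
        self.offset = offset                   # rewinding replays old records


log = PartitionLog()
offsets = [log.append(r) for r in ["a", "b", "c"]]
consumer = PullConsumer(log)
batch1 = consumer.poll(max_records=2)
batch2 = consumer.poll()
batch3 = consumer.poll()                       # nothing new: empty batch
consumer.seek(0)
replayed = consumer.poll()
```

Reading does not remove records from the log, which is why a consumer that failed can resume or replay from any retained offset.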

Kafka: ordering guarantees

  • Messages sent by a producer to a particular topic partition will be appended in the order they are sent
  • Consumer sees records in the order they are stored in the partition
  • Strong guarantees about ordering only within a partition
    – Total order over messages within a partition
    – But Kafka cannot preserve message order between different partitions in a topic
  • However, such per-partition ordering, combined with the ability to partition data by key among topic partitions, is sufficient for most applications

Kafka: delivery semantics

  • Delivery guarantees supported by Kafka
    – At-least-once (default): guarantees no message loss but allows duplicated messages, possibly out-of-order
    – Exactly-once: guarantees no loss and no duplicates, but requires expensive end-to-end 2PC
      • But not fully exactly-once
      • Support depends on the destination system
  • The user can also implement at-most-once delivery by disabling retries on the producer and committing offsets in the consumer prior to processing a message

See https://kafka.apache.org/documentation/#semantics

Kafka: fault tolerance

  • Kafka replicates partitions for fault tolerance
  • Kafka makes a message available for consumption only after all the followers acknowledge to the leader a successful write
    – Implies that a message may not be immediately available for consumption (the usual tradeoff between consistency and availability)
    – This default behavior can be relaxed if such a strong guarantee is not required
  • Kafka retains messages for a configured period of time
    – Messages can be “replayed” in case a consumer fails

Kafka and ZooKeeper

  • ZooKeeper: hierarchical, distributed key-value store
    – Widely used coordination and synchronization service for large distributed systems
    – Often used for leader election (we’ll study Paxos as consensus algorithm)
    – Used within many open-source distributed systems besides Kafka (Apache Mesos, Storm, …)
  • Kafka uses ZooKeeper to coordinate between producers, consumers and brokers
  • ZooKeeper stores Kafka metadata
    – List of brokers
    – List of consumers and their offsets
    – List of producers
Kafka: APIs

https://kafka.apache.org/documentation/#api

  • Four core APIs
  • Producer API: allows apps to publish records to topics
  • Consumer API: allows apps to read records from topics
  • Connect API: allows implementing reusable connectors (producers or consumers) that connect Kafka topics to existing applications or data systems, so as to move large collections of data into and out of Kafka
    – Many connectors already available, e.g., for AWS S3, ActiveMQ, IBM MQ, RabbitMQ, MySQL, Postgres, AWS Lambda
  • Streams API: allows transforming streams of data from input topics to output topics
    – Kafka is not only a pub/sub system but also a real-time streaming platform
      • Use Kafka Streams to process data in pipelines consisting of multiple stages
  • Kafka APIs support Java and Scala only

Kafka: client library

  • JVM internal client
  • Plus rich ecosystem of clients, among which:
    – Sarama: Go client library
      https://shopify.github.io/sarama/
    – Python client library
      https://pypi.org/project/kafka-python/
      https://github.com/confluentinc/confluent-kafka-python/
    – NodeJS client library
      https://github.com/Blizzard/node-rdkafka

Kafka: consumer in Python

[Code listing shown as an image in the slides; see https://pypi.org/project/kafka-python/]

Kafka: producer in Python

[Code listing shown as an image in the slides]

Kafka as component of a monitoring framework

  • Monasca: monitoring-as-a-service solution integrated with OpenStack
    – OpenStack: set of software tools for building and managing Cloud platforms for public and private clouds
  • Monasca uses Kafka to publish and consume monitoring metrics and events

Protocols for MOM

  • Not only systems but also open standard protocols for message queues
    – AMQP (Advanced Message Queueing Protocol)
      • https://www.amqp.org
      • Binary protocol
    – MQTT (Message Queue Telemetry Transport)
      • http://mqtt.org
      • Binary protocol
    – STOMP (Simple (or Streaming) Text Oriented Messaging Protocol)
      • http://stomp.github.io
      • Text-based protocol
  • Goals:
    – Platform- and vendor-agnostic
    – Provide interoperability between different MOMs

Messaging protocols and IoT

  • Often used in Internet of Things (IoT) projects
    – Use a message queueing protocol to send data from sensors to services that will process those data
    – Exploit all the MOM advantages seen so far:
      • Decoupling
      • Resiliency: MOM provides a temporary message storage
      • Traffic spikes handling: data will be persisted in MOM and processed eventually

AMQP: characteristics

  • Open-standard protocol for MOM, supported by industry
    – Current version: 1.0 http://docs.oasis-open.org/amqp/core/v1.0/amqp-core-complete-v1.0.pdf
    – Approved in 2014 as ISO and IEC International Standard
  • Binary, application-level protocol
    – Based on TCP protocol with additional reliability mechanisms (at-most-once, at-least-once, exactly-once delivery)
  • Programmable protocol
    – Several entities and routing schemes are primarily defined by apps
  • Implementations
    – Apache ActiveMQ, RabbitMQ, Apache Qpid, Azure Event Hubs, Pika (Python implementation), …

AMQP: model

  • The AMQP architecture involves three main actors:
    – Publishers, subscribers, and brokers
  • AMQP entities (within the broker): queues, exchanges and bindings
    – Messages are published to exchanges (like post offices or mailboxes)
    – Exchanges then distribute message copies to queues using rules called bindings
    – Then AMQP brokers either push messages to consumers subscribed to queues, or consumers pull messages from queues on demand

https://bit.ly/2oP683F

AMQP: routing

  • Bindings, by exchange type:
    – Direct exchange: delivers messages to queues based on the message routing key
    – Fanout exchange: delivers messages to all of the queues that are bound to it
    – Topic exchange: delivers messages to one or many queues based on topic matching
      • Often used to implement various publish/subscribe pattern variations
      • Commonly used for the multicast routing of messages
      • Example use: distributing data relevant to specific geographic location (e.g., points of sale)
    – Headers exchange: delivers messages based on multiple attributes expressed as headers
      • To route on multiple attributes that are more easily expressed as message headers than a routing key
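
The direct, fanout, and topic rules can be sketched with a toy exchange whose queues are plain Python lists. The Exchange class and the routing keys are illustrative assumptions; the topic matching follows the AMQP 0-9-1 convention used by RabbitMQ, where keys are dot-separated words, "*" matches exactly one word and "#" matches zero or more words. Headers exchanges are left out.

```python
def topic_matches(pattern, routing_key):
    """AMQP 0-9-1 style matching: '*' = exactly one word, '#' = zero or more words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))


class Exchange:
    """Toy exchange: copies each published message to the queues selected by its bindings."""

    def __init__(self, kind):
        self.kind = kind                  # "direct", "fanout" or "topic"
        self.bindings = []                # list of (binding_key, queue)

    def bind(self, queue, binding_key=""):
        self.bindings.append((binding_key, queue))

    def publish(self, routing_key, message):
        for binding_key, queue in self.bindings:
            if (self.kind == "fanout"
                    or (self.kind == "direct" and binding_key == routing_key)
                    or (self.kind == "topic" and topic_matches(binding_key, routing_key))):
                queue.append(message)


# Topic exchange distributing points-of-sale data by location
rome, all_sales = [], []
sales = Exchange("topic")
sales.bind(rome, "sales.it.rome")
sales.bind(all_sales, "sales.#")
sales.publish("sales.it.rome", "r1")
sales.publish("sales.fr.paris", "p1")

# Fanout exchange: the routing key is ignored
q1, q2 = [], []
news = Exchange("fanout")
news.bind(q1)
news.bind(q2)
news.publish("ignored", "n1")
```

The publisher only names an exchange and a routing key; which queues receive a copy is decided entirely by the bindings.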

AMQP: messages

  • The AMQP protocol defines two types of messages:
    – Bare messages, that are supplied by the sender
    – Annotated messages, that are seen at the receiver: the bare message plus the annotations added by intermediaries during transit
  • The header conveys the delivery parameters
    – Including durability requirements, priority, time to live

[Figure: Annotated message]

AMQP and RabbitMQ

  • RabbitMQ https://www.rabbitmq.com
    – Implements AMQP and relies on a broker-based architecture
    – Also supports STOMP and MQTT
    – See https://www.rabbitmq.com/getstarted.html for implementations, with RabbitMQ and different languages, of the use cases listed under “Using message queues: use cases” (slide 25)

Multicast communication

  • Multicast communication: communication scheme in which data are sent to multiple recipients
    – Broadcast communication: special case of multicast, in which data are sent to all the recipients connected to the network
    – Examples of one-to-many multicast applications: distribution of audio/video resources, file distribution
    – Examples of many-to-many multicast applications: conferencing services, multiplayer games, interactive distributed simulations
  • Traditional unicast communication does not scale

[Figure: unicast of a video to 1000 users vs. multicast of a video to 1000 users]

Types of multicast

  • How to implement multicast?
    – Network-level multicast
    – Application-level multicast

Network-level multicast

  • Packet replication and routing are handled by routers
  • IP-level multicast (IPMC), based on groups
    – Generalizes UDP with one-to-many transmission
    – Group: set of hosts interested in the same multicast application, identified by the same IP address
      • An IP address from 224.0.0.0 to 239.255.255.255 is assigned to the group
    – IGMP (Internet Group Management Protocol) is used to join the group
  • Limited use, due to:
    – Lack of large-scale support (only ~5% of ASes)
    – The problem of keeping track of group membership
    – E.g., it is disabled on all Cloud platforms because of the broadcast storm problem (exponential growth of network traffic with possible saturation)

Application-level multicast

  • Packet replication and routing are handled by the end hosts
  • Basic idea:
    – Organize the nodes into an overlay network
    – Use the overlay network to disseminate information
  • Application-level multicast:
    – Structured
      • Explicit communication paths are created in the overlay network
    – Unstructured
      • Based on flooding
      • Based on gossiping

Structured application-level multicast

  • How to build the overlay network in a structured way?
    – Tree
      • A single path between each pair of nodes
    – Mesh
      • Many paths between each pair of nodes

Structured application-level multicast: tree

  • Example: building an application-level multicast tree in Scribe

– Scribe: a pub/sub system with a decentralized architecture, built on the Pastry DHT

  • 1. The node that starts the multicast session generates the multicast group identifier (mid)
  • 2. It looks up (through Pastry) the node responsible for mid
  • 3. That node becomes the root of the multicast tree
  • 4. A node P that wants to join the multicast tree identified by mid sends a join request
  • 5. When the join request reaches a node Q:

– If Q has never received a join request for mid: Q becomes a forwarder, P becomes a child of Q, and Q forwards the join request toward the root
– Otherwise, Q is already a forwarder for mid: P becomes a child of Q, and the join request does not need to be forwarded to the root

  • M. Castro et al., “Scribe: A large-scale and decentralised application-level multicast infrastructure”, IEEE JSAC, 2002.
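The join procedure of steps 4 and 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not Scribe's actual API: `join`, `parent`, `forwarders`, and the precomputed `route_to_root` list (a stand-in for Pastry routing toward the node responsible for mid) are hypothetical names.

```python
def join(parent, forwarders, route_to_root, p):
    """Handle a join request from node p for one multicast group.

    parent:        dict child -> parent in the multicast tree (updated in place)
    forwarders:    set of nodes already in the tree, root included (updated in place)
    route_to_root: nodes the request visits on its way to the root; a
                   hypothetical stand-in for Pastry routing toward mid
    """
    child = p
    for q in route_to_root:
        was_forwarder = q in forwarders
        forwarders.add(q)          # q is (or now becomes) a forwarder
        parent[child] = q          # the requester becomes a child of q
        if was_forwarder:
            return                 # q is already in the tree: stop forwarding
        child = q                  # q forwards the join on its own behalf


# The root R belongs to the tree from the start.
parent, forwarders = {}, {"R"}
join(parent, forwarders, ["Q", "S", "R"], "P1")   # builds P1 -> Q -> S -> R
join(parent, forwarders, ["T", "S", "R"], "P2")   # stops at S, already a forwarder
print(parent)
```

The second join never reaches the root: it is absorbed by S, which is what keeps the load on the root low as the group grows.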


Structured application-level multicast: tree (2)

[Figure: three snapshots of the multicast tree as join() requests are routed toward the root and the nodes they traverse become forwarders]


Cost metrics for application-level multicast

  • Link stress: how many times does an application-level multicast message cross the same physical link?

– Example: the message from A to D crosses <Ra,Rb> twice

  • Stretch: the ratio between the transfer time in the overlay network and the transfer time in the underlying network

– Example: messages from B to C follow a path of cost 71 at the application level but 47 at the network level, so stretch = 71/47 ≈ 1.51
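Both metrics are easy to compute once each overlay hop is mapped onto the physical links that carry it. The sketch below assumes a small made-up topology (the `routes` table is hypothetical and is not the one in the slide's figure); only the 71/47 values come from the example above.

```python
from collections import Counter

def link_stress(overlay_path, physical_route):
    """Count how often each physical link is crossed along an overlay path.

    physical_route maps an overlay hop (u, v) to the list of physical links
    that the underlying network uses to carry it.
    """
    stress = Counter()
    for hop in zip(overlay_path, overlay_path[1:]):
        for a, b in physical_route[hop]:
            stress[frozenset((a, b))] += 1     # links are undirected
    return stress

def stretch(overlay_cost, network_cost):
    """Ratio between the overlay path cost and the underlying network path cost."""
    return overlay_cost / network_cost

# Hypothetical topology: A relays through B to reach D.
routes = {
    ("A", "B"): [("A", "Ra"), ("Ra", "Rb"), ("Rb", "B")],
    ("B", "D"): [("B", "Rb"), ("Rb", "Ra"), ("Ra", "Rd"), ("Rd", "D")],
}
stress = link_stress(["A", "B", "D"], routes)
print(stress[frozenset(("Ra", "Rb"))])   # link <Ra,Rb> is crossed twice
print(round(stretch(71, 47), 2))         # the B-to-C example
```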


Unstructured application-level multicast

  • How can unstructured application-level multicast be implemented?

– Flooding: already examined

  • A node P sends the multicast message to all its neighbors
  • In turn, every neighbor that has not already seen the message forwards it to all its own neighbors (except P)

– Gossiping
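The flooding rule described above can be sketched as a round-by-round simulation. This is a minimal sketch under simple assumptions: the overlay is given as an adjacency dict, there is no TTL, and the function name `flood` is introduced here for illustration.

```python
def flood(neighbors, source):
    """Flood a message from source; return (nodes reached, messages sent)."""
    seen = {source}
    frontier = [(source, None)]            # (node, node it got the message from)
    sent = 0
    while frontier:
        next_frontier = []
        for node, sender in frontier:
            for nb in neighbors[node]:
                if nb == sender:
                    continue               # never send back to the sender
                sent += 1
                if nb not in seen:         # only first-time receivers forward
                    seen.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return seen, sent


# A ring of four nodes: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
reached, sent = flood(ring, 0)
print(len(reached), sent)
```

Even on this tiny ring, more messages are sent than there are nodes to reach, which is the redundancy that gossiping tries to reduce.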


Gossip-based protocols

  • Probabilistic protocols, also called gossiping or epidemic protocols

– They build on the theory of how gossip spreads in social networks and how epidemics spread

  • They allow information to spread rapidly in very large-scale networks by choosing the next recipients at random among those known to the sender

– Each node sends the message to a randomly chosen subset of the nodes in the network
– Each node that receives it forwards a copy to another subset, also chosen at random, and so on


The origins

  • Gossiping protocols were defined in 1987 by Demers et al. in a paper on consistency in databases replicated across hundreds of servers

  • Basic idea, assuming there are no write conflicts (i.e., updates are independent):

– Update operations are initially performed on one or a few replicas
– A replica passes its updated state to a limited number of neighbors
– Update propagation is lazy (not immediate)
– Eventually, every update should reach all replicas

  • A. Demers et al., “Epidemic Algorithms for Replicated Database Maintenance”, Proc. 6th Symp. on Principles of Distributed Computing, 1987.


Why gossiping in large scale DSs?

  • Gossip-based information dissemination has several attractive properties for large-scale distributed systems

– Simplicity of gossiping algorithms
– Lack of centralized control and bottlenecks
– Scalability: each peer sends only a limited number of messages, independently of the overall size of the system
– Reliability and robustness, thanks to message redundancy


Where is gossiping used today?

  • Some examples:

– “Amazon uses a gossip protocol to quickly spread information throughout the S3 system. This allows Amazon S3 to quickly route around failed or unreachable servers, among other things.” http://amzn.to/1MgDVsl
– Amazon’s Dynamo uses a gossip-based failure detection service
– The basic information exchange in BitTorrent is based on gossip


Propagation models

  • We consider two propagation models

– Pure gossiping and anti-entropy

  • Pure gossiping (rumor spreading): a peer that has just been updated (infected) contacts another peer chosen at random and sends it the update (infecting it in turn)

  • Anti-entropy: periodically, each peer picks another peer at random and the two exchange their updates, so that both end up in a similar state


Pure gossiping

  • A peer P that has just been updated contacts a peer Q chosen at random

  • If Q has already received the update (is already infected), P loses interest in spreading the gossip and, with probability 1/k, stops contacting other peers

  • If s is the fraction of peers not yet updated, it can be shown that $s = e^{-(k+1)(1-s)}$

  • To guarantee that a large number of peers is updated, pure gossiping must be combined with anti-entropy

As k grows, the probability that the update spreads increases
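The equation for s has no closed-form solution, but it can be solved numerically. The sketch below uses plain fixed-point iteration and reads s as the fraction of peers that remain without the update once the rumor has died out (an interpretation assumed here, following the usual epidemic analysis). The function name `residual_fraction` is introduced for illustration.

```python
import math

def residual_fraction(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) by fixed-point iteration.

    Starting from s = 0 avoids the trivial solution s = 1 (nobody is ever
    updated) and converges to the meaningful root.
    """
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s


for k in (1, 2, 3, 4):
    print(k, round(residual_fraction(k), 4))
```

With k = 1 roughly a fifth of the peers never receive the update, while with k = 4 the residual fraction drops below 1%. It never reaches zero, which is why pure gossiping is paired with anti-entropy.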


Anti-entropy

  • Goal: increase the similarity among peers, thereby increasing “order” (hence the name!)

  • A peer P picks another peer Q in the system at random; how does it update it?

  • Three update strategies:

– push: P only sends its updates to Q
– pull: P only retrieves updates from Q
– push-pull: P and Q exchange their updates with each other (after which they hold the same information)
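The three strategies differ only in which side ends up with the merged state. A minimal sketch, assuming that each peer's state is a set of conflict-free updates (so merging is a set union); the function name `anti_entropy` is hypothetical.

```python
def anti_entropy(p_updates, q_updates, strategy):
    """One exchange initiated by P toward Q; both sets are updated in place."""
    if strategy == "push":
        q_updates |= p_updates             # P sends its updates to Q
    elif strategy == "pull":
        p_updates |= q_updates             # P fetches Q's updates
    elif strategy == "push-pull":
        merged = p_updates | q_updates     # both directions
        p_updates |= merged
        q_updates |= merged
    else:
        raise ValueError(f"unknown strategy: {strategy}")


p, q = {"u1"}, {"u2"}
anti_entropy(p, q, "push-pull")
print(sorted(p), sorted(q))
```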


Anti-entropy: performance

  • Push-pull

– It is the fastest strategy
– It takes O(log N) rounds to propagate an update to the N peers in the system

  • Gossip round (or cycle): a time interval in which every peer has taken the initiative to exchange updates at least once
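The logarithmic behavior can be observed with a small simulation. This is a sketch under simplifying assumptions: a single update, every peer knows every other peer, and within a round the peers act one after another. The function name `rounds_to_spread` is introduced for illustration.

```python
import random

def rounds_to_spread(n, rng):
    """Count push-pull rounds until one update has reached all n peers."""
    updated = [False] * n
    updated[0] = True                      # the update starts at peer 0
    count, rounds = 1, 0
    while count < n:
        rounds += 1
        for p in range(n):                 # each peer takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                     # uniform choice among the other peers
            if updated[p] != updated[q]:   # push-pull: both end up updated
                updated[p] = updated[q] = True
                count += 1
    return rounds


rng = random.Random(42)
for n in (64, 1024, 16384):
    print(n, rounds_to_spread(n, rng))
```

Multiplying N by 16 at each step adds only a few rounds, consistent with O(log N) growth.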


General scheme of a gossiping protocol

  • Two peers P and Q, where P has chosen Q for the data exchange; P runs once per round (every Δ time units)

Active thread (peer P):             Passive thread (peer Q):
(1) selectPeer(&Q);                 (1)
(2) selectToSend(&bufs);            (2)
(3) sendTo(Q, bufs);        ----->  (3) receiveFromAny(&P, &bufr);
(4)                                 (4) selectToSend(&bufs);
(5) receiveFrom(Q, &bufr);  <-----  (5) sendTo(P, bufs);
(6) selectToKeep(cache, bufr);      (6) selectToKeep(cache, bufr);
(7) processData(cache);             (7) processData(cache);

  • What are the crucial aspects?

– Peer selection
– Selection of the data exchanged
– Processing of the received data

Reference: A.-M. Kermarrec, M. van Steen, “Gossiping in Distributed Systems”, ACM Operating Systems Review 41(5), Oct. 2007.
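The skeleton above can be mimicked in-process, without real networking, to show where the three design choices plug in. Everything here is an illustrative assumption: the `Peer` class, the policy of sending the whole cache, and the policy of keeping only the `cache_size` largest items are placeholders for protocol-specific choices.

```python
import random

class Peer:
    def __init__(self, name, items, cache_size=4):
        self.name = name
        self.cache = set(items)
        self.cache_size = cache_size

    def select_to_send(self):
        return set(self.cache)                     # policy: send the whole cache

    def select_to_keep(self, received):
        merged = sorted(self.cache | received)     # policy: keep the largest items
        self.cache = set(merged[-self.cache_size:])


def exchange(p, peers, rng):
    """One activation of the active thread at peer p (push-pull exchange)."""
    q = rng.choice([x for x in peers if x is not p])   # selectPeer
    buf_p = p.select_to_send()                         # selectToSend at P
    buf_q = q.select_to_send()                         # selectToSend at Q
    p.select_to_keep(buf_q)                            # P merges Q's reply
    q.select_to_keep(buf_p)                            # Q merges P's message
    return q


a, b = Peer("a", {1, 2}), Peer("b", {3, 4, 5})
exchange(a, [a, b], random.Random(0))
print(sorted(a.cache), sorted(b.cache))
```

Changing `select_to_send` or `select_to_keep` changes the protocol (dissemination, peer sampling, aggregation) without touching the exchange loop.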


Implementing a gossiping protocol

Which specific problems must be addressed when implementing a gossiping protocol?

  • Membership: how peers get to know each other, and how many acquaintances each should have
  • Network awareness: how to make the links between peers reflect the network topology, so as to obtain satisfactory performance
  • Buffer management: which information to discard when a peer's memory is full
  • Message filtering: how to take into account the peers' interest in a message and reduce the probability that they receive information they are not interested in


Gossiping vs flooding: example

  • Information dissemination is the classic and most popular application of gossiping in distributed systems

– A valid alternative to flooding

  • With flooding

– Every peer that receives the message sends it to all its neighbors (a degenerate case of gossiping)
– The message is discarded when TTL=0

[Figure: flooding over three rounds. Messages sent: 18; peers reached: 8 out of 9]


Gossiping vs flooding: example (2)

  • With simple gossiping

– The message is sent with a gossiping probability p

for each msg m
    if random(0,1) < p then send m

[Figure: gossiping with probability p over three rounds. Messages sent: 11; peers reached: 7 out of 9]
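The rule "send with probability p" can be simulated directly. A minimal sketch, assuming that the coin is tossed independently for every neighbor and that a peer forwards only the first time it receives the message; the function name `gossip` and the adjacency-dict overlay are assumptions made for illustration.

```python
import random

def gossip(neighbors, source, p, rng):
    """Probabilistic forwarding; return (nodes reached, messages sent)."""
    seen = {source}
    frontier = [source]
    sent = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in neighbors[node]:
                if rng.random() < p:       # forward with gossiping probability p
                    sent += 1
                    if nb not in seen:
                        seen.add(nb)
                        next_frontier.append(nb)
        frontier = next_frontier
    return seen, sent


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(7)
for p in (1.0, 0.7, 0.3):
    reached, sent = gossip(ring, 0, p, rng)
    print(p, len(reached), sent)
```

With p = 1 the procedure degenerates into flooding; lowering p reduces the number of messages at the price of possibly missing some peers.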


Gossiping vs flooding

  • Gossiping features

– Probabilistic
– Takes a localized decision but results in a global state
– Lightweight
– Fault tolerant

  • Flooding has some advantages

– Universal coverage and minimal state information
– … but it floods the network with redundant messages

  • Gossiping goals

– Reduce the number of redundant transmissions that occur with flooding while trying to retain its advantages
– … but due to its probabilistic nature, gossiping cannot guarantee that all the peers are reached and it requires more time to complete than flooding


Other applications of gossiping in distributed systems

  • Besides information dissemination…
  • Peer sampling

– To provide each peer with a list of peers to contact

  • Resource monitoring in large-scale distributed systems

  • Distributed computations for data aggregation, in particular in sensor networks

– Computing aggregate values (e.g., sum, average, maximum, quantiles)
– E.g., when computing the average

  • Let $x_{0,i}$ and $x_{0,j}$ be the values held by nodes i and j at time t=0
  • After i and j gossip using the push-pull strategy: $x_{1,i}, x_{1,j} \leftarrow (x_{0,i} + x_{0,j})/2$
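Repeating this pairwise averaging drives every node toward the global mean, because each exchange preserves the sum of the two values involved. A minimal sketch, assuming full membership knowledge and sequential exchanges; the function name `gossip_average` is hypothetical.

```python
import random

def gossip_average(values, rounds, rng):
    """Push-pull averaging: each node repeatedly averages with a random peer."""
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1                         # uniform choice of a peer j != i
            x[i] = x[j] = (x[i] + x[j]) / 2    # both keep the pairwise mean
    return x


readings = [0.0, 10.0, 20.0, 30.0]             # e.g., sensor readings; mean is 15
estimates = gossip_average(readings, rounds=50, rng=random.Random(3))
print([round(v, 6) for v in estimates])
```

No node ever sees all the readings, yet every local estimate converges to the average.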


Two gossiping protocols

  • We now examine two examples of gossiping protocols

– Blind counter rumor mongering
– Bimodal multicast


Blind counter rumor mongering

  • Why that name for this gossiping protocol?

– Rumor mongering (def: “the act of spreading rumours”, also known as gossip): a node with a “hot rumor” will periodically infect other nodes
– Blind: loses interest regardless of the recipient (why)
– Counter: loses interest after F contacts (when)

A node n initiates a broadcast by sending message m to B of its neighbors, chosen at random.
When node p receives a message m from node q:
    If p has received m no more than F times,
        p sends m to B uniformly randomly chosen neighbors that p knows have not yet seen m.

– Note that p knows that its neighbor r has already seen the message m only if p has previously sent it to r, or if p received the message from r
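The protocol above translates almost line by line into a small event-driven simulation. This is a sketch under stated assumptions: messages are delivered in FIFO order from a single queue, and the names `rumor_mongering`, `received`, and `known` are introduced here for illustration.

```python
import random
from collections import deque

def rumor_mongering(neighbors, source, B, F, rng):
    """Blind counter rumor mongering; return (nodes reached, messages sent)."""
    received = {n: 0 for n in neighbors}     # how many times each node got m
    known = {n: set() for n in neighbors}    # neighbors known to have seen m
    queue = deque()                          # messages in flight: (from, to)
    sent = 0

    def forward(p):
        nonlocal sent
        candidates = [r for r in neighbors[p] if r not in known[p]]
        for r in rng.sample(candidates, min(B, len(candidates))):
            known[p].add(r)                  # p sent m to r, so r has seen it
            queue.append((p, r))
            sent += 1

    forward(source)                          # node n initiates the broadcast
    while queue:
        q, p = queue.popleft()
        received[p] += 1
        known[p].add(q)                      # p received m from q
        if received[p] <= F:                 # counter: stop after F receipts
            forward(p)
    reached = {n for n in neighbors if received[n] > 0} | {source}
    return reached, sent


line = {0: [1], 1: [0, 2], 2: [1]}
print(rumor_mongering(line, 0, B=1, F=1, rng=random.Random(0)))
```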


Analysis of blind counter rumor mongering

  • It is difficult to obtain analytical expressions that describe the behavior of a gossiping protocol, even for relatively simple topologies, hence the use of simulation analysis

  • Assume a Barabási network topology:

– 1000 nodes with an average node degree of 6
– Rumor mongering vs flooding scalability (F=2, B=2)

Source: “The cost of application-level broadcast in a fully decentralized peer-to-peer network”


Bimodal multicast

  • Also called pbcast (probabilistic broadcast)
  • Composed of two phases:
  • 1. Message distribution phase: a process sends a multicast with no particular reliability guarantees

– IP multicast if available, otherwise some application-level multicast (e.g., Scribe trees)

  • 2. Gossip repair phase: after a process receives a message, it begins to gossip about the message to a set of peers

– Gossip occurs at regular intervals and offers the processes a chance to compare their states and fill any gaps in the message sequence

Source: K.P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal multicast. ACM Trans. Comput. Syst. 17, 2 (May 1999), 41-88.


Bimodal multicast: message distribution

  • Start by using unreliable multicast to rapidly distribute the message
  • But some messages may not get through, and some processes may be faulty
  • So the initial state involves partial distribution of the multicast(s)

[Figure: timeline of processes p1 to p6 sending messages, with failed messages marked]


Bimodal multicast: gossip repair

  • Periodically (e.g., every 100 ms) each process sends a digest describing its state to some randomly selected process
  • The digest identifies messages: it does not include them

[Figure: processes p1 to p6 sending digests]


Bimodal multicast: gossip repair (2)

  • The recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip

[Figure: processes p1 to p6 soliciting message copies]


Bimodal multicast: gossip repair (3)

  • Processes respond to solicitations received during a round of gossip by retransmitting the requested message
  • Various optimizations (not examined)

[Figure: processes p1 to p6 sending message copies]
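The three repair steps (send digest, solicit, retransmit) can be condensed into one function for a single pair of processes. A minimal sketch, assuming each process keeps its history as a dict from message id to payload; `gossip_repair` and the dict layout are assumptions, not pbcast's actual data structures.

```python
def gossip_repair(sender_log, recipient_log):
    """One gossip repair exchange; returns the ids that were repaired."""
    digest = set(sender_log)                    # ids only, no payloads
    missing = digest - set(recipient_log)       # recipient checks its history
    for msg_id in missing:                      # solicitation + retransmission
        recipient_log[msg_id] = sender_log[msg_id]
    return missing


p1 = {1: "m1", 2: "m2", 3: "m3"}                # p1 received everything
p4 = {1: "m1"}                                  # p4 lost m2 and m3
repaired = gossip_repair(p1, p4)
print(sorted(repaired), p4 == p1)
```

Only the digest travels on every gossip round; payloads are sent just for the gaps, which keeps the repair traffic small when the initial multicast mostly succeeds.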


Bimodal multicast: why “bimodal”?

  • Because there are two phases?
  • Nope; the name describes the dual “modes” of the result

[Figure: “Pbcast bimodal delivery distribution”. x-axis: number of processes to deliver pbcast; y-axis: p{#processes=k}, on a logarithmic scale. Annotations: “Either sender fails…” and “… or data gets through with high probability”]

1. pbcast is almost always delivered either to most processes or to few processes, and almost never to some intermediate number of processes. Atomicity = almost all or almost none
2. A second bimodal characteristic is due to delivery latencies, with one distribution of very low latencies (messages that arrive without loss in the first phase) and a second distribution with higher latencies (messages that had to be repaired in the second phase)