Algorithmic Approach to Distributed Systems (Approche Algorithmique des Systèmes Distribués, AASR)



SLIDE 1

Algorithmic Approach to Distributed Systems (Approche Algorithmique des Systèmes Distribués, AASR)

Guillaume Pierre

guillaume.pierre@irisa.fr

Based on a set of slides by Maarten van Steen, VU Amsterdam, Dept. Computer Science

04b: Communication (2/2)

SLIDE 2

Contents

Chapter
01: Introduction
02: Architectures
03: Processes
04: Communication
    04: Communication (1/2)
    04: Communication (2/2)
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security

SLIDE 3

Multicast communication

Application-level multicasting
Gossip-based data dissemination

SLIDE 4

Application-level multicasting

Essence: Organize nodes of a distributed system into an overlay network and use that network to disseminate data.

Chord-based tree building

1. Initiator generates a multicast identifier mid.
2. Look up succ(mid), the node responsible for mid.
3. Request is routed to succ(mid), which will become the root.
4. If P wants to join, it sends a join request to the root.
5. When the request arrives at Q:
   - Q has not seen a join request before ⇒ it becomes a forwarder; P becomes a child of Q. The join request continues to be forwarded.
   - Q knows about the tree ⇒ P becomes a child of Q. No need to forward the join request anymore.
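The join procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `join` function and the explicit `path` argument are hypothetical names, and the overlay route from the joining node to the root is supplied by the caller rather than computed by Chord. The sketch also assumes that, at each hop, the previous node on the route is the one recorded as a child.

```python
class Node:
    """Hypothetical overlay node; only the multicast-tree state is modelled."""

    def __init__(self, name):
        self.name = name
        self.in_tree = False   # True once the node is root, member or forwarder
        self.children = set()


def join(path):
    """Route a join request along `path` (joining node first, root last).

    Returns the number of hops the request travelled before it stopped.
    """
    hops = 0
    child = path[0]
    for q in path[1:]:
        hops += 1
        q.children.add(child.name)   # the previous hop becomes a child of Q
        if q.in_tree:
            break                    # Q already knows the tree: stop forwarding
        q.in_tree = True             # Q becomes a forwarder and passes the request on
        child = q
    path[0].in_tree = True
    return hops


# Assumed example: root is succ(mid); a and c both route through b.
a, b, c, root = Node("a"), Node("b"), Node("c"), Node("root")
root.in_tree = True
first = join([a, b, root])    # b becomes a forwarder, request reaches the root
second = join([c, b, root])   # b is already in the tree, request stops at b
```

The second join stops one hop earlier than the first, which is the point of letting forwarders absorb later requests.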

SLIDE 5

ALM: Some costs

[Figure: end hosts A, B, C, D attached through routers Ra, Rb, Rc, Rd, Re to the Internet, with an overlay network connecting the end hosts; physical links are labeled with their delays.]

Link stress: How often does an ALM message cross the same physical link? Example: a message from A to D needs to cross ⟨Ra, Rb⟩ twice.

Stretch: Ratio in delay between the ALM-level path and the network-level path. Example: messages from B to C follow a path of length 71 at ALM level, but 47 at network level ⇒ stretch = 71/47.
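Both metrics are simple to compute once the overlay hops are mapped to physical routes. The sketch below is a minimal illustration: the hosts X, Y, Z, the routers R1, R2 and the routes between them are made up and are not read off the figure; only the 71 and 47 delays come from the slide.

```python
from collections import Counter


def link_stress(overlay_hops, physical_route):
    """Count how often each physical link is crossed by one ALM message."""
    stress = Counter()
    for hop in overlay_hops:
        for link in physical_route[hop]:
            stress[frozenset(link)] += 1   # links are undirected
    return stress


def stretch(alm_delay, network_delay):
    """Ratio between the overlay-level delay and the network-level delay."""
    return alm_delay / network_delay


# Hypothetical topology: X and Z hang off R1, Y hangs off R2.
physical_route = {
    ("X", "Y"): [("X", "R1"), ("R1", "R2"), ("R2", "Y")],
    ("Y", "Z"): [("Y", "R2"), ("R2", "R1"), ("R1", "Z")],
}
stress = link_stress([("X", "Y"), ("Y", "Z")], physical_route)
b_to_c = stretch(71, 47)   # the B-to-C example from the slide
```

Here the message from X to Z is relayed by Y, so the R1-R2 link carries it twice.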

SLIDE 6

Epidemic Algorithms

General background
Update models
Removing objects

SLIDE 7

Principles

Basic idea
Assume there are no write–write conflicts:
- Update operations are performed at a single server
- A replica passes updated state to only a few neighbors
- Update propagation is lazy, i.e., not immediate
- Eventually, each update should reach every replica

Two forms of epidemics
- Anti-entropy: Each replica regularly chooses another replica at random and exchanges state differences, leading to identical states at both afterwards
- Gossiping: A replica which has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).

SLIDE 8

Anti-entropy

Principle operations
A node P selects another node Q from the system at random.
- Push: P only sends its updates to Q
- Pull: P only retrieves updates from Q
- Push-Pull: P and Q exchange mutual updates (after which they hold the same information).

Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (round = when every node has taken the initiative to start an exchange).
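A small simulation makes the three variants concrete. This is a sketch under assumptions that are not on the slide: every node knows all others, a single update starts at node 0, and exchanges within a round are applied one after the other. The function name `rounds_to_spread` is hypothetical.

```python
import random


def rounds_to_spread(n, mode, seed=0):
    """Rounds until all n nodes hold the update; mode is push, pull or push-pull."""
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True                 # single source of the update
    rounds = 0
    while not all(informed):
        rounds += 1
        for p in range(n):             # every node takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                 # uniform choice of Q different from P
            if mode in ("push", "push-pull") and informed[p]:
                informed[q] = True     # P sends its update to Q
            if mode in ("pull", "push-pull") and informed[q]:
                informed[p] = True     # P retrieves the update from Q
    return rounds


push = rounds_to_spread(200, "push")
pull = rounds_to_spread(200, "pull")
push_pull = rounds_to_spread(200, "push-pull")
```

Because the three runs share the same seed, they see the same sequence of partner choices, so push-pull can never need more rounds than either push or pull in this sketch.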

SLIDE 9

Anti-entropy: analysis (extra)

Basics
Consider a single source, propagating its update. Let $p_i$ be the probability that a node has not received the update after the $i$-th cycle.

Analysis: staying ignorant
- With pull, $p_{i+1} = (p_i)^2$: the node was not updated during the $i$-th cycle and should contact another ignorant node during the next cycle.
- With push, $p_{i+1} = p_i \left(1 - \frac{1}{N}\right)^{N(1 - p_i)} \approx p_i e^{-1}$ (for small $p_i$ and large $N$): the node was ignorant during the $i$-th cycle and no updated node chooses to contact it during the next cycle.
- With push-pull: $p_{i+1} = (p_i)^2 \cdot (p_i e^{-1})$
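Iterating these recurrences numerically shows how quickly the ignorant fraction shrinks in each model. The helper below is a sketch: the name `ignorant_fraction` is hypothetical, and the push case uses the $p_i e^{-1}$ approximation, so it is only meaningful once $p_i$ is already small and $N$ is large.

```python
import math


def ignorant_fraction(model, cycles, p0):
    """Apply the per-cycle recurrence `cycles` times, starting from p0."""
    p = p0
    for _ in range(cycles):
        if model == "pull":
            p = p * p                            # p_{i+1} = p_i^2
        elif model == "push":
            p = p * math.exp(-1)                 # p_{i+1} ~ p_i / e (approximation)
        elif model == "push-pull":
            p = (p * p) * (p * math.exp(-1))     # product of both factors
        else:
            raise ValueError(f"unknown model: {model}")
    return p


after_pull = ignorant_fraction("pull", 3, 0.5)
after_push = ignorant_fraction("push", 3, 0.5)
after_both = ignorant_fraction("push-pull", 3, 0.5)
```

Starting from the same $p_0$, pull squares the ignorant fraction every cycle while push only divides it by $e$, which is why pull finishes off the last ignorant nodes much faster.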

SLIDE 10

Push vs. push-pull

Let’s add 500 nodes, all with the same initial central neighbor.

Question: Why did we omit the pull protocols?

SLIDE 11

Anti-entropy in large-scale distributed systems

How can each node in the system randomly select one of its neighbors?
- Centralized list of nodes ⇒ not scalable (because of the query traffic)
- Fully-replicated list of nodes ⇒ not scalable (because of the update traffic)

Conclusion: let’s build a distributed system for that... :-)

SLIDE 12

Epidemic overlay management

Traditional
- Each node has a complete view of the network
- Nodes periodically exchange data with a randomly-selected node

Decentralized
- Each node has a partial view of the network (small, fixed size)
- Nodes periodically exchange links with a random node (from their partial view):
  - Randomly ⇒ random network
  - Methodically ⇒ structure

SLIDE 13

Randomized overlays

Each node’s view contains a set of (truly) random nodes from the network

Periodically refreshed

Important property: Random node from a view == random node from the network

So we can apply “traditional” gossip

SLIDE 14

CYCLON: one possible way to build a random overlay

Each node keeps a fixed-size list of neighbors (e.g., 20)

Each neighbor is tagged with the time at which it was last seen alive

Periodically, each node selects one node out of its neighbors

Pick the oldest peer out of its view

Exchange some references with this peer

And add a reference to itself
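The exchange described above can be sketched as follows. This is a simplified illustration, not the CYCLON reference implementation: views are modelled as dictionaries mapping a neighbor id to an age counter (an assumption standing in for the "last seen alive" tag), and the names `shuffle`, `_merge`, `swap` and `capacity` are hypothetical.

```python
import random


def _merge(view, received, sent, owner, capacity):
    """Install received references, reusing the slots of the ones sent away."""
    for n in sent:
        view.pop(n, None)
    for n, age in list(received.items()) + list(sent.items()):
        # Never store a self-reference or a duplicate, never exceed capacity.
        if n != owner and n not in view and len(view) < capacity:
            view[n] = age


def shuffle(views, p, rng, swap=3, capacity=20):
    """One exchange initiated by node p; views maps node id -> {neighbor: age}."""
    view_p = views[p]
    if not view_p:
        return
    for n in view_p:
        view_p[n] += 1                       # every known neighbor gets older
    q = max(view_p, key=view_p.get)          # pick the oldest peer
    del view_p[q]
    view_q = views[q]
    sent_p = dict(rng.sample(sorted(view_p.items()), min(swap - 1, len(view_p))))
    sent_p[p] = 0                            # add a fresh reference to itself
    sent_q = dict(rng.sample(sorted(view_q.items()), min(swap, len(view_q))))
    _merge(view_q, sent_p, sent_q, owner=q, capacity=capacity)
    _merge(view_p, sent_q, sent_p, owner=p, capacity=capacity)


# Assumed start: 10 nodes on a ring, each knowing its 3 successors.
rng = random.Random(42)
views = {i: {(i + k) % 10: 0 for k in (1, 2, 3)} for i in range(10)}
for _ in range(50):
    for p in range(10):
        shuffle(views, p, rng, swap=2, capacity=3)
```

Contacting the oldest peer first means stale references are either refreshed or dropped, which is how dead nodes eventually disappear from the views.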

SLIDE 15

Average path length

Path length = a good measure of the time and cost to flood the network

SLIDE 16

Clustering coefficient

Coefficient == probability that two neighbors of the same node are also neighbors of each other

A large clustering coefficient is:
- Bad for flooding (many redundant messages)
- Bad for self-healing (each strongly connected cluster has only few links to other clusters)
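For a single node, this probability can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the function name and the example graph are illustrative only.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s neighbors that are themselves neighbors."""
    pairs = list(combinations(sorted(adj[node]), 2))
    if not pairs:
        return 0.0                     # fewer than two neighbors: no pair to test
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)


# Assumed example: a triangle 0-1-2 with an extra node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
cc0 = clustering_coefficient(adj, 0)   # only the pair (1, 2) is linked
cc1 = clustering_coefficient(adj, 1)   # the pair (0, 2) is linked
```

Averaging this value over all nodes gives the clustering coefficient of the overlay as a whole.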

SLIDE 17

Clustering coefficient

SLIDE 18

In-degree distribution

The in-degree distribution affects:

- Robustness (because of weakly-connected nodes)
- Load balancing
- The way epidemics spread

SLIDE 19

Self-healing

SLIDE 20

Self-healing

SLIDE 21

Creating structure with unstructured overlays

“Unstructured” P2P overlays are very good at creating structure:

Define a global proximity function on nodes

Each node will link with nodes which minimize cost(self,node)

Each time two nodes gossip, they:

- Merge their entire views
- Keep only the “best” nodes according to the proximity function
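One gossip step of this kind can be sketched as below. The function name `gossip_step`, the fixed view `size` and the ring-distance `cost` function are assumptions chosen for illustration; any proximity function with the same signature would do.

```python
def gossip_step(view_a, view_b, a, b, cost, size):
    """Both nodes merge their views and keep the `size` closest candidates."""
    candidates = set(view_a) | set(view_b) | {a, b}
    # Ties on cost are broken by node id so the result is deterministic.
    new_a = sorted(candidates - {a}, key=lambda n: (cost(a, n), n))[:size]
    new_b = sorted(candidates - {b}, key=lambda n: (cost(b, n), n))[:size]
    return new_a, new_b


def ring_distance(x, y, ring=16):
    """Hypothetical proximity function: distance on a ring of 16 identifiers."""
    return min((x - y) % ring, (y - x) % ring)


# Nodes 0 and 8 gossip; each ends up with its two closest known nodes.
new_a, new_b = gossip_step([5, 9], [1, 15], a=0, b=8, cost=ring_distance, size=2)
```

After a single exchange node 0 already holds its two ring neighbors 1 and 15, even though it started with none of them in its view.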

SLIDE 22

Maintaining connectivity

Imagine an overlay where each node connects only to similar nodes

E.g., similar music taste

This may break connectivity

- How can new nodes find their place in the overlay?
- How can the structure evolve if I change my tastes?

SLIDE 23

Dual-layer gossiping

VICINITY aggressively searches for the best nodes

It also picks nodes from the underlying CYCLON layer

CYCLON keeps the overlay connected

It learns uniformly random nodes

SLIDE 24

Example: build a torus

SLIDE 25

Example: build a torus

SLIDE 26

Example: build a torus

SLIDE 27

Example: build a torus

SLIDE 28

Let’s use Vicinity to search for content

eDonkey2000 traces:

- 12,000 nodes and their list of files
- In total: 970,000 unique files

Each node initially knows 5 random others

Goal: connect each node with its 10 closest neighbors

Similarity metric: # of files shared by both A and B
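With file lists stored as sets, this metric is a set intersection, and the target neighborhood is a top-k selection. The sketch below computes that target centrally, which is what the gossip protocol is meant to approximate without global knowledge; the function names and the tiny `files` table are made up for illustration.

```python
import heapq


def similarity(files_a, files_b):
    """Number of files shared by both nodes."""
    return len(files_a & files_b)


def closest_neighbors(node, files, k=10):
    """The k nodes sharing the most files with `node` (ties broken by name)."""
    others = (n for n in files if n != node)
    return heapq.nlargest(
        k, others, key=lambda n: (similarity(files[node], files[n]), n)
    )


# Hypothetical file lists for four nodes.
files = {
    "A": {"x", "y", "z"},
    "B": {"x", "y"},
    "C": {"z"},
    "D": {"q"},
}
best = closest_neighbors("A", files, k=2)
```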

SLIDE 29

Vicinity performance

Using Vicinity but not Cyclon:

SLIDE 30

Vicinity performance

Let’s have Vicinity periodically exchange a few Vicinity links with a random neighbor (taken from Cyclon)

SLIDE 31

Vicinity performance

Let’s have Vicinity periodically exchange the best Vicinity links with a random neighbor

SLIDE 32

Vicinity performance

Let’s have Vicinity periodically exchange the best links (also from Cyclon) with a random neighbor

SLIDE 33

Aggregation

Aggregation is the collective name of a set of functions that provide statistical information about a system.

Useful in large-scale distributed systems:
- The average load of nodes in a cloud
- The sum of free space in a distributed storage system
- The total number of nodes in a P2P system

Solutions should be:
- Decentralized
- Robust to churn

SLIDE 34

Churn

All large-scale systems have churn: nodes join and leave all the time

SLIDE 35

Gossip-based aggregation

SLIDE 36

Example: average estimation

- Each node contains a state: a number representing the value to be averaged
- selectPeer(): random selection among current neighbors
- $\mathrm{update}(s_a, s_b) = \frac{s_a + s_b}{2}$

Observations:

1. After each exchange the system’s average does not change
2. The variance is reduced
3. Therefore: if the system is connected then each node will converge toward the global average.
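The averaging protocol is short enough to simulate directly. The sketch below assumes a fully connected system, so selectPeer() picks any other node uniformly at random, and it applies exchanges one after the other; `average_round` is a hypothetical name.

```python
import random


def average_round(values, rng):
    """One cycle: every node initiates one exchange with a random other node."""
    n = len(values)
    for a in range(n):
        b = rng.randrange(n - 1)
        if b >= a:
            b += 1                                  # selectPeer(): any node but a
        # update(s_a, s_b): both nodes adopt the mean of their two states.
        values[a] = values[b] = (values[a] + values[b]) / 2


rng = random.Random(1)
values = [float(i) for i in range(100)]             # true average is 49.5
for _ in range(50):
    average_round(values, rng)
```

Each exchange leaves the sum of the two states unchanged, so the global average is preserved while the spread between nodes shrinks.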

SLIDE 37

A run of the protocol

SLIDE 38

Exercises

1. How does the topology influence the convergence speed?
2. Which topology is optimal for convergence speed?
3. What are the effects of link failures on the protocol?
4. What are the effects of node failures on the protocol?
5. Devise a protocol that measures the (approximate) number of nodes in the system

SLIDE 39

Counting

The counting protocol is based on average calculation

- Initialization: one node starts with 1, all others with 0
- The average value will converge towards 1/N

Problem: how to select that “one node”?

- Concurrent instances of the counting protocol
- Each instance is led by a different node
- Messages are tagged with a unique identifier
- Nodes participate in all instances for a duration T
- Each node elects itself as a leader with probability $P = c / N_{est}$, where $N_{est}$ is the node's current estimate of N

c is the interval (in cycles) at which we want to estimate N
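A single instance of the counting protocol can be sketched by running the averaging exchange on the 1/0 initialization and inverting each node's state. The sketch assumes a fully connected system with no churn and one leader chosen in advance (node 0); leader election and concurrent instances are left out, and `estimate_size` is a hypothetical name.

```python
import random


def estimate_size(n, cycles, seed=0):
    """Return every node's estimate of N after `cycles` averaging cycles."""
    rng = random.Random(seed)
    values = [0.0] * n
    values[0] = 1.0                       # the leader starts with 1, all others with 0
    for _ in range(cycles):
        for a in range(n):
            b = rng.randrange(n - 1)
            if b >= a:
                b += 1                    # random peer different from a
            values[a] = values[b] = (values[a] + values[b]) / 2
    # Each state converges to 1/N, so its inverse estimates N.
    return [1 / v if v > 0 else float("inf") for v in values]


estimates = estimate_size(50, cycles=60)
```

A node that has not yet been reached by the leader's value still holds 0 and reports an infinite estimate, which is one reason the protocol needs enough cycles before its result is read.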
