Ken Birman, Cornell University. CS5410 Fall 2008. Gossip and Network Overlays - PowerPoint PPT Presentation



SLIDE 1

Ken Birman

Cornell University. CS5410 Fall 2008.

SLIDE 2

Gossip and Network Overlays

A topic that has received a lot of recent attention. Today we'll look at three representative approaches:

- Scribe, a topic-based pub-sub system that runs on the Pastry DHT (slides by Anne-Marie Kermarrec)
- Siena, a content-subscription overlay system (slides by Antonio Carzaniga)
- T-Man, a general-purpose system for building complex network overlays (slides by Ozalp Babaoglu)

SLIDE 3

Scribe

Research done by the Pastry team, at the MSR lab in Cambridge, England.

Basic idea is simple:

- Topic-based publish/subscribe
- Use topic as a key into a DHT
- Subscriber registers with the "key owner"
- Publisher routes messages through the DHT owner

Optimization to share load:

- If a subscriber is asked to forward a subscription, it doesn't do so and instead makes note of the subscription. Later, it will forward copies to its children.
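The topic-as-key rendezvous idea can be sketched in a few lines of Python (a toy illustration, not Scribe's actual code; the 160-bit key space and the `RendezvousNode` class are assumptions made for the sketch):

```python
import hashlib

def topic_key(topic: str) -> int:
    """Hash a topic name into a 160-bit DHT key (assumed key space)."""
    return int.from_bytes(hashlib.sha1(topic.encode()).digest(), "big")

class RendezvousNode:
    """Toy stand-in for the node whose id is closest to the topic key."""
    def __init__(self):
        self.subscribers = set()

    def subscribe(self, node_id: str):
        # Subscriber registers with the "key owner".
        self.subscribers.add(node_id)

    def publish(self, message: str) -> dict:
        # Publisher routes messages through the DHT owner; here we
        # simply deliver to every registered subscriber.
        return {s: message for s in self.subscribers}

root = RendezvousNode()
root.subscribe("node-65a1fc")
root.subscribe("node-d13da3")
deliveries = root.publish("hello")
```

Hashing means every publisher and subscriber independently maps the same topic to the same owner, which is all the rendezvous scheme requires.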

SLIDE 4

Architecture

SCRIBE: scalable communication service (subscription management, event notification)

PASTRY: P2P location and routing layer (DHT)

TCP/IP: Internet

20/12/2002

SLIDE 5

Design

Construction of a multicast tree based on the Pastry network

- Reverse path forwarding
- Tree used to disseminate events

Use of Pastry route to create and join groups

SLIDE 6

SCRIBE: Tree Management

Create: route to groupId

Join: route to groupId. Tree: union of Pastry routes from members to the root.

Multicast: from the root down to the leaves

- Low link stress
- Low delay

[Figure: join(groupId) messages routed toward the root; an interior node forwards two copies down the tree; Multicast(groupId) flows from the root]
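The "union of Pastry routes" rule above can be mimicked with a toy join procedure (a sketch under the assumption that each member's route to the root is given as an explicit hop list; real Pastry computes these hops by prefix routing):

```python
def build_tree(routes: dict, root: str) -> dict:
    """routes maps each member to its list of hops toward the root.
    Returns a children table: node -> set of children."""
    children = {root: set()}
    in_tree = {root}
    for member, path in routes.items():
        in_tree.add(member)
        prev = member
        for hop in path:
            children.setdefault(hop, set()).add(prev)
            if hop in in_tree:
                # Join is absorbed by a node already in the tree; the
                # rest of the path was built by an earlier member's join.
                break
            in_tree.add(hop)
            prev = hop
    return children

# Two members whose routes share the intermediate node "X":
tree = build_tree({"A": ["X", "R"], "B": ["X", "R"]}, root="R")
```

Because joins stop at the first node already in the tree, interior nodes aggregate subscriptions, which is what keeps link stress low.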

SLIDE 7

SCRIBE: Tree Management

[Figure: tree construction shown in both the proximity space and the name space; nodes 65a1fc, d13da3, 26b20d, and d471f1, with group root d467c4]

SLIDE 8

Concerns?

Pastry tries to exploit locality, but could these links send a message from Ithaca… to Kenya… to Japan…?

What if a relay node fails? Subscribers it serves will be cut off.

They refresh subscriptions, but it's unclear how often this has to happen to ensure that the quality will be good.

(Treat subscriptions as "leases" so that they evaporate if not refreshed… no need to unsubscribe…)
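The lease idea in the parenthetical can be sketched like this (the `LeaseTable` class, TTL value, and node names are hypothetical, not part of Scribe):

```python
class LeaseTable:
    """Subscriptions as leases: an entry evaporates unless refreshed."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.expiry = {}            # subscriber -> expiry time

    def refresh(self, subscriber: str, now: float):
        # Each periodic re-subscription simply extends the lease.
        self.expiry[subscriber] = now + self.ttl

    def live(self, now: float) -> set:
        # No explicit unsubscribe: stale entries are just ignored.
        return {s for s, t in self.expiry.items() if t > now}

leases = LeaseTable(ttl=30.0)
leases.refresh("node-a", now=0.0)
leases.refresh("node-b", now=20.0)
survivors = leases.live(now=40.0)   # node-a's lease expired at t=30
```

The quality question on this slide is exactly the choice of `ttl` versus refresh period: too long and dead subscribers linger, too short and refresh traffic grows.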

SLIDE 9

SCRIBE: Failure Management

Reactive fault tolerance

- Tolerate root and node failures
- Tree repair: local impact

Fault detection: heartbeat messages

Local repair
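A minimal sketch of heartbeat-based fault detection (the timeout value and data layout are assumptions; the slide does not give Scribe's actual parameters):

```python
def overdue_parents(last_heartbeat: dict, now: float, timeout: float) -> set:
    """Return parents whose last heartbeat is older than the timeout.
    A child that flags its parent repairs locally by re-routing a
    join(groupId) through Pastry toward the group's root."""
    return {p for p, t in last_heartbeat.items() if now - t > timeout}

failed = overdue_parents({"p1": 5.0, "p2": 9.5}, now=10.0, timeout=2.0)
```

Repair is local because only the children of the failed node must re-join; the rest of the tree is untouched.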

SLIDE 10

Scribe: performance

1500 groups, 100,000 nodes, 1 msg/group

- Low delay penalty
- Good partitioning and load balancing
- Number of groups hosted per node: 2.4 (mean), 2 (median)
- Reasonable link stress:
  - Mean msg/link: 2.4 (0.7 for IP)
  - Maximum link stress: 4× IP

SLIDE 11

Topic distribution

[Figure: group size vs. topic rank; example topics include Windows Update, Stock Alert, and Instant Messaging Alert]

SLIDE 12

Concern about this data set

Synthetic, may not be terribly realistic.

In fact we know that subscription patterns are usually power-law distributions, so that's reasonable.

But unlikely that the explanation corresponds to a clean Zipf-like distribution of this nature (indeed, totally implausible).

Unfortunately, this sort of issue is common when evaluating very big systems using simulations.

Alternative is to deploy and evaluate them in use… but only feasible if you own Google-scale resources!
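A synthetic Zipf-like workload of the kind this slide questions can be generated like this (parameters are hypothetical; as the slide notes, real subscription patterns need not be this clean):

```python
import random

def zipf_topic_choices(n_subs: int, n_topics: int, s: float = 1.0,
                       seed: int = 42) -> list:
    """Pick a topic rank for each subscription with P(rank r) ∝ 1/r**s."""
    rng = random.Random(seed)
    weights = [1.0 / r ** s for r in range(1, n_topics + 1)]
    return rng.choices(range(1, n_topics + 1), weights=weights, k=n_subs)

# 10,000 subscriptions spread over 1500 topics (the slide's group count):
ranks = zipf_topic_choices(10_000, 1500)
# The rank-1 topic attracts far more subscriptions than the rank-1500 one.
```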
SLIDE 13

Delay penalty

[Figure: cumulative number of topics (0–1500) vs. delay penalty relative to IP (0.5–4.5); Mean = 1.66, Median = 1.56]

SLIDE 14

Node stress: 1500 topics

[Figure: number of nodes vs. total number of children table entries; Mean = 6.2, Median = 2]

SLIDE 15

Scribe: Link stress

[Figure: number of links vs. link stress (1 to 10,000, log scale) for Scribe and IP Multicast; Scribe mean = 1.4, median = 0; maximum stress marked]

SLIDE 16

Anycast

Supports highly dynamic groups

Suitable for decentralized resource discovery (can add predicate during DFS)

Results (100k nodes / .5M network):

- Join: 4.1 msgs (empty group); avg 3.5 msgs (2,500 members)
- 1,000 anycasts: 4.1 msgs (empty group); avg 2.3 msgs (2,500 members)
- Locality: for >90% of anycasts, <7% of members were closer than the receiver
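The "predicate during DFS" point can be sketched as a depth-first walk of the group tree (the tree and member representation are assumptions for the sketch, not Scribe's code):

```python
def anycast(tree: dict, members: set, node: str, pred):
    """DFS from `node`; return the first group member satisfying `pred`,
    or None if no member does."""
    if node in members and pred(node):
        return node
    for child in tree.get(node, []):
        hit = anycast(tree, members, child, pred)
        if hit is not None:
            return hit
    return None

tree = {"root": ["a", "b"], "a": ["c"], "b": []}
# The predicate skips member "c", so the DFS continues to member "b".
target = anycast(tree, {"c", "b"}, "root", lambda n: n != "c")
```

Because the walk stops at the first acceptable member, nearby members tend to be found first, matching the locality result above.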

SLIDE 17

Fireflies

Fireflies.ppt

SLIDE 18

T-Man