FeedEx: Collaborative Exchange of News Feeds, by Seung Jun and Mustaque Ahamad (PowerPoint presentation)



SLIDE 1

FeedEx: Collaborative Exchange of News Feeds

Seung Jun and Mustaque Ahamad
Georgia Tech Information Security Center
Georgia Institute of Technology

SLIDE 2

Motivation

  • RSS/Atom feeds have become increasingly popular

– Published by most traditional media and blogs

  • Scalability of feed servers

– Frequent pull requests create high load
– Infrequent requests increase latency and may lead to missed items

  • Our Approach

– Use resources at peer nodes to deliver feed items
– Scalable growth in resources with service demand

  • Challenges

– Peers may not fully cooperate and execute the agreed protocols

SLIDE 3

FeedEx Overview

  • Feeds have different update and usage patterns.

– A new hybrid transport mechanism
– Pull from servers
– Push among peer nodes

  • Peers in FeedEx

– Form a distribution mesh,
– Fetch feeds from web servers occasionally, and
– Exchange new entries with each other
– Peer incentives for exchanging entries

SLIDE 4

RSS/Atom Primer

  • Feed format

<feed>
  <title>NYT Technology</title>
  <!-- other elements -->
  <entry>
    <title>Basics: Going Wireless on ...</title>
    <link>http://www.nytimes.com/2006/05/18/...</link>
    <summary>Wi-Fi has revolutionized the...</summary>
    <!-- other elements -->
  </entry>
  <!-- more entries -->
</feed>

  • Current ways of reading feeds

– Stand-alone applications (e.g., Mozilla Thunderbird)
– Web-based services (e.g., Bloglines and My Yahoo!)
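
As a rough illustration of how a reader might consume the simplified feed format shown on this slide, the sketch below parses it with Python's standard `xml.etree.ElementTree`. The function name `parse_feed` and the `SAMPLE` constant are hypothetical. Real Atom documents declare an XML namespace and carry the link in an `href` attribute, so this sketch only handles the abbreviated form from the slide.

```python
import xml.etree.ElementTree as ET

# Abbreviated feed from the slide (elided URLs and text kept as-is).
SAMPLE = """<feed>
  <title>NYT Technology</title>
  <entry>
    <title>Basics: Going Wireless on ...</title>
    <link>http://www.nytimes.com/2006/05/18/...</link>
    <summary>Wi-Fi has revolutionized the...</summary>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return (feed_title, entries) for the simplified, namespace-free format."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        entries.append({
            "title": entry.findtext("title", default=""),
            "link": entry.findtext("link", default=""),
            "summary": entry.findtext("summary", default=""),
        })
    return root.findtext("title", default=""), entries


if __name__ == "__main__":
    feed_title, feed_entries = parse_feed(SAMPLE)
    print(feed_title, len(feed_entries))
```

The prototype listed later in the deck uses Universal Feed Parser, which copes with real-world RSS and Atom variants; the standard-library parser is used here only to keep the sketch dependency-free.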

SLIDE 5

Analysis of Feed Publishing

  • Purpose

– Interesting by itself and helpful in designing FeedEx

  • Methodology

– 245 popular feeds monitored for 10 days
– Feeds fetched every 2 minutes

SLIDE 6

Publishing Rate by Rank

[Figure: entries published per day (log scale) versus feed rank (log scale). Labeled feeds: Reuters, CNN, Yahoo(T), Yahoo(E), BBC(U), Fark.com, Yahoo(M), ABC, BBC(W).]

SLIDE 7

Entry Count

[Figure, panel 1: number of feeds versus mean of entry count. Labeled feeds: Rotten Tomatoes, MSDN, EurekAlert, Techbargains.com, Slate.]

[Figure, panel 2: number of feeds versus range of entry count. Labeled feeds: Techbargains.com, EurekAlert, Washington Post, MSNBC, MacInTouch.]

SLIDE 8

Publishing Rate by Time

[Figure: entries published per hour versus time (days 1 to 7, with Saturday and Sunday marked). Panels: Reuters, Yahoo(M), Motley Fool, NPR.]

SLIDE 9

Entry Lifetime

[Figure: cumulative probability versus entry lifetime (hours). Series: CNN, FOX News, Techbargains.com, Beta News.]

SLIDE 10

Architecture of FeedEx

[Diagram labels: To News Feed Servers; To Neighbors; Neighbor; Server; RPC; From Neighbors; To List Server; Connector; Feed Fetch; Scheduler.]

SLIDE 11

Bootstrapping

  • Obtain a list of peers

– Dedicated list server (Gnutella and BitTorrent)
– Embedding (Pseudoserving [Kong and Ghosal 1999] and CoopNet [Padmanabhan and Sripanidkulchai 2002])
– Local cache

  • Connect to peers

1. Establish connection
2. Exchange subscription sets: { (url,hop),...}

SLIDE 12

Neighbor Selection

  • Metrics for good neighbors

– Subscription set match
– Topological proximity
– Duration of relationship

$$u(Q) = \sum_{i \in S_P \cap S_Q} w_i \, d^{-h_i}$$
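
The neighbor utility on this slide can be illustrated with a short sketch. It assumes one reading of the formula: a sum, over the feeds that both peers subscribe to, of a per-feed weight discounted by a base `d` raised to the negative hop count. The function name `neighbor_utility`, the default `d=2.0`, the default weight of 1.0, and the choice to use the neighbor's hop count are all assumptions for illustration, not details confirmed by the slides.

```python
def neighbor_utility(own_subs, peer_subs, weights, d=2.0):
    """Score neighbor Q from node P's point of view.

    own_subs, peer_subs: dicts mapping feed URL -> hop count, mirroring the
    {(url, hop), ...} subscription sets exchanged at connection time.
    weights: dict mapping feed URL -> weight w_i (hypothetical; default 1.0).
    d: discount base (hypothetical value).
    """
    common = own_subs.keys() & peer_subs.keys()  # feeds in both subscription sets
    return sum(
        weights.get(url, 1.0) * d ** (-peer_subs[url])  # w_i * d^(-h_i)
        for url in common
    )


if __name__ == "__main__":
    mine = {"http://a.example/feed": 0, "http://b.example/feed": 1}
    theirs = {"http://b.example/feed": 1, "http://c.example/feed": 0}
    print(neighbor_utility(mine, theirs, {"http://b.example/feed": 2.0}))
```

Under this reading, a neighbor scores higher when it shares more feeds and sits fewer hops away from their sources. The other metrics on the slide, topological proximity and duration of relationship, are not modeled here.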

SLIDE 13

Adaptive Fetching from Servers

  • Coordinated fetching by peers

– High coordination overhead
– Lots of nodes with high churn rate

  • Solution: Adaptive fetching

– Freshness rate f: fraction of new entries in a fetched document
– Set a target freshness rate f_t
– Fetching interval is doubled or halved, bounded by T_min and T_max
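
The adaptive fetching rule above can be sketched as follows. The slide states only that the interval is doubled or halved within its bounds. The direction used here is an assumption: halve when the observed freshness rate exceeds the target (the feed is changing faster than we poll) and double when it falls below. The function names and the minute-based units are hypothetical.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in a fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new_count = sum(1 for entry_id in fetched_ids if entry_id not in seen_ids)
    return new_count / len(fetched_ids)


def next_interval(interval, freshness, target, t_min, t_max):
    """Halve or double the fetching interval, clamped to [t_min, t_max].

    Assumed direction: too many new entries -> fetch more often (halve);
    too few -> fetch less often (double); exactly on target -> unchanged.
    """
    if freshness > target:
        interval = interval / 2
    elif freshness < target:
        interval = interval * 2
    return max(t_min, min(t_max, interval))


if __name__ == "__main__":
    f = freshness_rate(["e1", "e2", "e3", "e4"], {"e1", "e2", "e3"})
    print(f, next_interval(60, f, target=0.5, t_min=15, t_max=240))
```

Because each node adjusts its own interval from local observations, no coordination among peers is required, which matches the slide's concern about coordination overhead under high churn.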

SLIDE 14

Entry Exchange Among Peers

  • New entries obtained

– By fetching from web servers
– From neighbors

  • Entry bundle

– A set of new entries
– Document identifier (did): assigned by SHA-1 digest
– Flooded to matching neighbors

  • Two-phase flooding

– check_did(did) call: 344 bytes including HTTP request header
– put_entries(bundle)
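
A minimal in-memory sketch of the two-phase flooding described above: a peer first asks a neighbor whether it already holds a bundle via `check_did`, and sends the full bundle via `put_entries` only if it does not. The `Peer` class, the `bundle_did` helper, and the exact bytes fed to SHA-1 are assumptions. The prototype makes these calls over XML-RPC, and the subscription matching that limits which neighbors receive a bundle is omitted here.

```python
import hashlib


def bundle_did(entries):
    """Hypothetical did: SHA-1 over the sorted entry strings."""
    digest = hashlib.sha1()
    for entry in sorted(entries):
        digest.update(entry.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


class Peer:
    def __init__(self, name):
        self.name = name
        self.seen = set()      # dids of bundles already received
        self.entries = []      # all entries received so far
        self.neighbors = []    # directly connected peers

    def check_did(self, did):
        """Phase 1: cheap probe. True means 'I do not have this bundle yet'."""
        return did not in self.seen

    def put_entries(self, did, bundle, sender=None):
        """Phase 2: accept the bundle once, then flood it onward."""
        if did in self.seen:
            return
        self.seen.add(did)
        self.entries.extend(bundle)
        for neighbor in self.neighbors:
            if neighbor is sender:
                continue
            if neighbor.check_did(did):  # send the bundle only when wanted
                neighbor.put_entries(did, bundle, sender=self)


if __name__ == "__main__":
    a, b, c = Peer("a"), Peer("b"), Peer("c")
    a.neighbors, b.neighbors, c.neighbors = [b, c], [a, c], [a, b]
    new_entries = ["entry-1", "entry-2"]
    a.put_entries(bundle_did(new_entries), new_entries)
    print(len(b.entries), len(c.entries))
```

The small probe keeps duplicate deliveries cheap: a neighbor that already holds the bundle costs one short `check_did` call instead of a full bundle transfer.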

SLIDE 15

Incentive Mechanism

  • Pairwise fairness is simple and effective

– Uses local information only
– Easy to implement and enforce the mechanism

  • Contribution metric $c_{j,i}$:

$$c_{j,i} \mathrel{+}= \frac{w_f}{h_f}$$

  • Deficit of contribution $d_{i,j}$:

$$d_{i,j} = c_{i,j} - c_{j,i}$$

  • Node i ensures $d_{i,j} < D$ for every neighbor j and a parameter D.
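
The pairwise fairness check can be sketched as a small per-node ledger. It assumes the deficit is the contribution node i has made to neighbor j minus the contribution j has made to i, and that i sends only while the deficit stays below D. The class name `FairnessLedger` and its methods are hypothetical. The size of each contribution increment is left to the caller because the slide's weighting is not modeled here.

```python
from collections import defaultdict


class FairnessLedger:
    """Per-node bookkeeping of contributions exchanged with each neighbor."""

    def __init__(self, max_deficit):
        self.max_deficit = max_deficit        # the parameter D
        self.given = defaultdict(float)       # c_{i,j}: what this node gave j
        self.received = defaultdict(float)    # c_{j,i}: what j gave this node

    def deficit(self, neighbor):
        """d_{i,j}, assumed to be c_{i,j} - c_{j,i}."""
        return self.given[neighbor] - self.received[neighbor]

    def may_send(self, neighbor, amount):
        """True if sending would keep the deficit strictly below D."""
        return self.deficit(neighbor) + amount < self.max_deficit

    def record_sent(self, neighbor, amount):
        self.given[neighbor] += amount

    def record_received(self, neighbor, amount):
        self.received[neighbor] += amount


if __name__ == "__main__":
    ledger = FairnessLedger(max_deficit=3.0)
    ledger.record_sent("peer-j", 2.0)
    print(ledger.may_send("peer-j", 1.0))   # deficit would reach D, so refuse
    ledger.record_received("peer-j", 1.0)
    print(ledger.may_send("peer-j", 1.0))   # neighbor reciprocated, so allow
```

The ledger uses only information the node observes directly, which is the property the slide highlights: no global reputation or third-party reports are needed to enforce the bound.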

SLIDE 16

Prototype Implementation

  • Python: python.org
  • XML-RPC: xmlrpc.com/spec
  • Twisted: twistedmatrix.com
  • SQLite: sqlite.org
  • Universal Feed Parser: feedparser.org
SLIDE 17

Experimental Setup

  • Two modes

– Stand-alone applications: sln
– FeedEx: xch

  • Metrics

– Time lag
– Missing entries
– Communication cost

  • Experiments

– Use 189 PlanetLab nodes
– Run for 22 hours on a weekday
– Primary factor: 6 fetching intervals
– Let each node subscribe to 20 out of 70 feeds

SLIDE 18

Results: Time Lag

[Figure: time lag (hours) versus fetching interval (hours). Series: SLN, XCH.]

SLIDE 19

Results: Missing Entries

[Figure: missing entries (%) versus fetching interval (hours; 0.5, 1, 2, 4, 8, 16). Series: XCH miss, XCH gain, SLN miss, SLN gain.]

SLIDE 20

Results: Communication Cost

[Figure: received calls per minute versus fetching interval (hours; 0.5, 1, 2, 4, 8, 16). Series: check_did, put_entries.]

SLIDE 21

Advantages

  • Server scalability
  • Archivability
  • Controllability
  • Filtering and recommendation
  • Privacy
SLIDE 22

Related Work

  • News feed delivery

– Corona (Cornell)
– FeedTree (Rice)

  • Web caching and CDN [Freedman et al. 2004, Wang et al. 2004]

  • Gossip-based protocols [Birman et al. 1999, Ganesh et al. 2003, Eugster et al. 2003]

SLIDE 23

Conclusions

  • A new transport mechanism for news feeds

– Peers pull from servers and exchange entries among themselves

  • FeedEx encourages cooperation by enforcing pairwise fairness, while achieving

– Reduced feed server load
– Low latency
– High coverage
– Low communication overhead