FeedEx: Collaborative Exchange of News Feeds



  1. FeedEx: Collaborative Exchange of News Feeds
     Seung Jun and Mustaque Ahamad
     Georgia Tech Information Security Center, Georgia Institute of Technology

  2. Motivation
     • RSS/Atom feeds have become increasingly popular
       – Published by most traditional media and blogs
     • Scalability of feed servers
       – Frequent pull requests create high load
       – Infrequent requests increase latency and may lead to missed items
     • Our approach
       – Use resources at peer nodes to deliver feed items
       – Scalable growth in resources with service demand
     • Challenges
       – Peers may not fully cooperate and execute the agreed protocols

  3. FeedEx Overview
     • Feeds have different update and usage patterns
       – A new hybrid transport mechanism
       – Pull from servers
       – Push among peer nodes
     • Peers in FeedEx
       – Form a distribution mesh
       – Fetch feeds from web servers occasionally
       – Exchange new entries among each other
       – Peer incentives for exchanging entries

  4. RSS/Atom Primer
     • Feed format
         <feed>
           <title>NYT Technology</title>
           <!-- other elements -->
           <entry>
             <title>Basics: Going Wireless on ...</title>
             <link>http://www.nytimes.com/2006/05/18/...</link>
             <summary>Wi-Fi has revolutionized the...</summary>
             <!-- other elements -->
           </entry>
           <!-- more entries -->
         </feed>
     • Current way of reading feeds
       – Stand-alone applications (e.g., Mozilla Thunderbird)
       – Web-based services (e.g., Bloglines and My Yahoo!)
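
A minimal sketch of how a reader might pull entries out of a feed shaped like the one on this slide, using only the Python standard library. The sample document copies the slide's simplified markup (including its elided "..." text); the function name `parse_feed` is hypothetical. Real Atom feeds declare an XML namespace and carry links in an `href` attribute, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Simplified feed taken from the slide; the "..." are part of the slide text.
FEED = """<feed>
  <title>NYT Technology</title>
  <!-- other elements -->
  <entry>
    <title>Basics: Going Wireless on ...</title>
    <link>http://www.nytimes.com/2006/05/18/...</link>
    <summary>Wi-Fi has revolutionized the...</summary>
  </entry>
  <!-- more entries -->
</feed>"""


def parse_feed(xml_text):
    """Return (feed title, list of entries), each entry a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    entries = [
        {child.tag: (child.text or "").strip() for child in entry}
        for entry in root.findall("entry")
    ]
    return root.findtext("title"), entries


if __name__ == "__main__":
    title, entries = parse_feed(FEED)
    print(title)
    for entry in entries:
        print(entry["title"], entry["link"])
```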

  5. Analysis of Feed Publishing
     • Purpose
       – Interesting by itself and helpful in designing FeedEx
     • Methodology
       – 245 popular feeds monitored for 10 days
       – Feeds fetched every 2 minutes

  6. Publishing Rate by Rank
     [Figure: entries published per day (log scale) versus rank (log scale). Labeled feeds include BBC(U), ABC, BBC(W), Reuters, CNN, Fark.com, Yahoo(T), Yahoo(E), and Yahoo(M).]

  7. Entry Count
     [Figure: number of feeds versus mean of entry count, and number of feeds versus range of entry count. Labeled feeds include Rotten Tomatoes, Techbargains.com, MSDN, EurekAlert, Washington Post, MSNBC, Slate, and MacInTouch.]

  8. Publishing Rate by Time
     [Figure: entries published per hour versus time (days 0 to 7, with Sat and Sun marked) for Reuters, Yahoo(M), Motley Fool, and NPR.]

  9. Entry Lifetime
     [Figure: cumulative probability versus lifetime (hours) for CNN, Beta News, FOX News, and Techbargains.com.]

  10. Architecture of FeedEx
      [Diagram: components labeled Feed Fetch Scheduler, RPC Server, and Neighbor Connector, with connections labeled To News Feed Servers, To List Server, From Neighbors, and To Neighbors.]

  11. Bootstrapping
      • Obtain a list of peers
        – Dedicated list server (Gnutella and BitTorrent)
        – Embedding (Pseudoserving [Kong and Ghosal 1999] and CoopNet [Padmanabhan and Sripanidkulchai 2002])
        – Local cache
      • Connect to peers
        1. Establish connection
        2. Exchange subscription sets: {(url, hop), ...}

  12. Neighbor Selection
      • Metrics for good neighbors
        – Subscription set match: u(Q) = \sum_{i \in (S_P \cap S'_Q)} w_i d^{-h_i}
        – Topological proximity
        – Duration of relationship
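
One plausible reading of the subscription set match metric is a weighted sum over the feeds two peers share, discounted by hop count. The sketch below follows that reading only; the function names, the default weight of 1.0, the base `d=2.0`, and the choice to use the neighbor's hop count are all assumptions made for illustration, not values from the slides.

```python
def match_score(own, neighbor, weights, d=2.0):
    """Score a candidate neighbor by shared subscriptions.

    own, neighbor: dicts mapping feed URL -> hop count, mirroring the
    {(url, hop), ...} subscription sets exchanged at connection time.
    weights: dict mapping feed URL -> weight (assumed default 1.0).
    Each shared feed contributes weight * d ** (-hop), so feeds the
    neighbor receives over fewer hops count for more (assumption).
    """
    shared = own.keys() & neighbor.keys()
    return sum(weights.get(url, 1.0) * d ** (-neighbor[url]) for url in shared)


def rank_neighbors(own, candidates, weights, d=2.0):
    """Return candidate names sorted from best to worst match."""
    return sorted(
        candidates,
        key=lambda name: match_score(own, candidates[name], weights, d),
        reverse=True,
    )


if __name__ == "__main__":
    mine = {"feed-a": 0, "feed-b": 0}
    peers = {
        "q1": {"feed-a": 0, "feed-b": 1, "feed-c": 0},
        "q2": {"feed-b": 2},
    }
    print(rank_neighbors(mine, peers, {"feed-a": 1.0, "feed-b": 2.0}))
```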

  13. Adaptive Fetching from Servers
      • Coordinated fetching by peers
        – High coordination overhead
        – Lots of nodes with high churn rate
      • Solution: adaptive fetching
        – Freshness rate f: fraction of new entries in a fetched document
        – Set a target freshness rate f_t
        – Fetching interval is doubled or halved, bounded by T_min and T_max
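
The adaptive fetching rule can be sketched as follows. The slide states only that the interval is doubled or halved within fixed bounds; the direction of each adjustment (halve when the document is fresher than the target, double when it is staler), the behavior when the rate equals the target, and all names and numbers below are assumptions.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in a fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new = [entry_id for entry_id in fetched_ids if entry_id not in seen_ids]
    return len(new) / len(fetched_ids)


def next_interval(interval, freshness, target, t_min, t_max):
    """Double or halve the fetching interval, clamped to [t_min, t_max].

    Assumed policy: many new entries (freshness above target) means we are
    fetching too rarely, so halve; few new entries means we are fetching
    too often, so double. At exactly the target the interval is unchanged.
    """
    if freshness > target:
        interval = interval / 2
    elif freshness < target:
        interval = interval * 2
    return max(t_min, min(t_max, interval))


if __name__ == "__main__":
    seen = {"e1", "e2"}
    f = freshness_rate(["e1", "e2", "e3", "e4"], seen)
    # Intervals in minutes; the bounds are illustrative only.
    print(f, next_interval(60, f, target=0.25, t_min=15, t_max=240))
```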

  14. Entry Exchange Among Peers
      • New entries obtained
        – By fetching from web servers
        – From neighbors
      • Entry bundle
        – A set of new entries
        – Document identifier (did): assigned by SHA-1 digest
        – Flooded to matching neighbors
      • Two-phase flooding
        – check_did(did) call: 344 bytes including HTTP request header
        – put_entries(bundle)
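
A minimal in-memory sketch of two-phase flooding, with no networking or XML-RPC. The call names `check_did` and `put_entries` come from the slide; everything else is assumed for illustration, including what the SHA-1 digest is computed over, the bundle layout, the boolean meaning of `check_did`, and the `Peer` class itself.

```python
import hashlib
import json


def make_bundle(url, entries):
    """Build a bundle whose did is the SHA-1 of its serialized content (assumed)."""
    payload = json.dumps([url, sorted(entries)]).encode("utf-8")
    return {"did": hashlib.sha1(payload).hexdigest(), "url": url, "entries": entries}


class Peer:
    def __init__(self, name, subscriptions):
        self.name = name
        self.subscriptions = set(subscriptions)
        self.neighbors = []
        self.seen_dids = set()
        self.entries = []

    def check_did(self, did):
        """Phase 1: cheap probe. True means the bundle is new to this peer."""
        return did not in self.seen_dids

    def put_entries(self, bundle):
        """Phase 2: accept the full bundle, then flood it onward."""
        if bundle["did"] in self.seen_dids:
            return
        self.seen_dids.add(bundle["did"])
        self.entries.extend(bundle["entries"])
        self.flood(bundle)

    def flood(self, bundle):
        """Offer the bundle to neighbors subscribed to the same feed."""
        for peer in self.neighbors:
            if bundle["url"] in peer.subscriptions and peer.check_did(bundle["did"]):
                peer.put_entries(bundle)


if __name__ == "__main__":
    a, b, c = (Peer(n, {"feed-a"}) for n in "abc")
    a.neighbors, b.neighbors, c.neighbors = [b, c], [a, c], [a, b]
    a.put_entries(make_bundle("feed-a", ["entry 1", "entry 2"]))
    print([len(p.entries) for p in (a, b, c)])
```

Sending only the identifier first keeps duplicate deliveries cheap: a peer that already holds the bundle answers the probe and never receives the full payload again.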

  15. Incentive Mechanism
      • Pairwise fairness is simple and effective
        – Uses local information only
        – Easy to implement and enforce the mechanism
      • Contribution metric c_{j,i}: c_{j,i} += w_f^{h_f}
      • Deficit of contribution d_{i,j}: d_{i,j} = c_{i,j} - c_{j,i}
      • Node i ensures d_{i,j} < D for every neighbor j and a parameter D.
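
Pairwise fairness can be sketched with a per-neighbor ledger kept from local information only. This assumes the deficit is what node i has contributed to neighbor j minus what j has contributed to i, and that node i enforces the bound by refusing any send that would bring the deficit to D or more. The class name, method names, and the `amount` credited per exchange are hypothetical; the slide's own contribution increment is not reproduced here.

```python
from collections import defaultdict


class FairnessLedger:
    """Tracks contributions between this node (i) and each neighbor (j)."""

    def __init__(self, max_deficit):
        self.max_deficit = max_deficit       # the parameter D
        self.given = defaultdict(float)      # c_{i,j}: what we gave to j
        self.received = defaultdict(float)   # c_{j,i}: what j gave to us

    def deficit(self, j):
        """d_{i,j}: how far ahead we are in contributions to neighbor j."""
        return self.given[j] - self.received[j]

    def record_received(self, j, amount):
        """Credit neighbor j for entries it delivered to us."""
        self.received[j] += amount

    def try_send(self, j, amount):
        """Send only if the deficit stays strictly below D afterwards."""
        if self.deficit(j) + amount >= self.max_deficit:
            return False
        self.given[j] += amount
        return True


if __name__ == "__main__":
    ledger = FairnessLedger(max_deficit=3.0)
    print([ledger.try_send("j", 1.0) for _ in range(3)])
    ledger.record_received("j", 1.0)
    print(ledger.try_send("j", 1.0), ledger.deficit("j"))
```

A neighbor that never reciprocates stops receiving entries once the bound is reached, and service resumes as soon as it contributes again.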

  16. Prototype Implementation
      • Python: python.org
      • XML-RPC: xmlrpc.com/spec
      • Twisted: twistedmatrix.com
      • SQLite: sqlite.org
      • Universal Feed Parser: feedparser.org

  17. Experimental Setup
      • Two modes
        – Stand-alone applications: sln
        – FeedEx: xch
      • Metrics
        – Time lag
        – Missing entries
        – Communication cost
      • Experiments
        – Use 189 PlanetLab nodes
        – Run for 22 hours on a weekday
        – Primary factor: 6 fetching intervals
        – Each node subscribes to 20 out of 70 feeds

  18. Results: Time Lag
      [Figure: time lag (hours) versus fetching interval (hours) for SLN and XCH.]

  19. Results: Missing Entries
      [Figure: missing entries (%) versus fetching interval (hours, 0.5 to 16). Series: XCH miss, XCH gain, SLN miss, SLN gain.]

  20. Results: Communication Cost
      [Figure: received calls per minute versus fetching interval (hours, 0.5 to 16) for check_did and put_entries.]

  21. Advantages
      • Server scalability
      • Archivability
      • Controllability
      • Filtering and recommendation
      • Privacy

  22. Related Work
      • News feed delivery
        – Corona (Cornell)
        – FeedTree (Rice)
      • Web caching and CDN [Freedman et al. 2004, Wang et al. 2004]
      • Gossip-based protocols [Birman et al. 1999, Ganesh et al. 2003, Eugster et al. 2003]

  23. Conclusions
      • A new transport mechanism for news feeds
        – Peers pull from servers and exchange entries among themselves
      • FeedEx encourages cooperation by enforcing pairwise fairness, while achieving
        – Reduced feed server load
        – Low latency
        – High coverage
        – Low communication overhead
