SLIDE 1

Comparing Hybrid Peer-to-Peer Systems

Beverly Yang and Hector Garcia-Molina

Presented by Marco Barreno
November 3, 2003
CS 294-4: Peer-to-peer systems

Hybrid peer-to-peer systems

Pure peer-to-peer systems are hard to scale

Gnutella

Look at hybrids between p2p and server-client

Servers will index files, clients download from each other directly

Searching can be done more efficiently on a server
Napster (but Napster had its own problems...)
Several other architectures

Questions for hybrid systems

Best way to organize servers?
Index replication policy?
What queries are submitted often?
How do we deal with churn?
How do query patterns affect performance?

Contributions of this paper

Presents several architectures for hybrid systems
Presents and evaluates a probabilistic model for queries
Compares architectures quantitatively, based on their models and the music sharing domain
Compares strategies in non-music-sharing domains (a bit)

SLIDE 2

General concepts: basic actions

Login

A client connects to a server and uploads metadata about the files it offers
It is a local user to that server, a remote user to others

Query

A list of words to search on
Satisfied if a preset maximum number of results is found

Download

Contact peer directly after getting info from server

Goal

The goal of this study is to maximize UsersPerServer
What do you think of this goal?

Batch vs. incremental logins

Batch: on login/logout, user’s entire metadata set is added/removed

Allows index to remain small, but login/logout is expensive

Incremental: metadata kept in index at all times, and only deltas are sent at login

Saves much effort on login/logout
Queries become more expensive, as server must filter for online users
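The trade-off between the two login policies can be sketched in a few lines. This is a minimal illustration, not the paper's or OpenNap's implementation: the class names `BatchIndex` and `IncrementalIndex`, the in-memory dictionaries, and the single-word matching rule are all assumptions made for the example.

```python
class BatchIndex:
    """Hypothetical index that holds metadata only for logged-in users."""

    def __init__(self):
        self.files = {}  # user -> set of file titles

    def login(self, user, titles):
        # The entire metadata set is uploaded on every login.
        self.files[user] = set(titles)

    def logout(self, user):
        # The entire metadata set is removed on every logout.
        self.files.pop(user, None)

    def query(self, word):
        return sorted((u, t) for u, ts in self.files.items()
                      for t in ts if word in t.lower().split())


class IncrementalIndex:
    """Hypothetical index that keeps metadata for all users, online or not."""

    def __init__(self):
        self.files = {}      # user -> set of file titles, kept across sessions
        self.online = set()  # users currently logged in

    def login(self, user, added=(), removed=()):
        # Only the changes since the last session are sent.
        titles = self.files.setdefault(user, set())
        titles.update(added)
        titles.difference_update(removed)
        self.online.add(user)

    def logout(self, user):
        # Cheap: the metadata stays in the index.
        self.online.discard(user)

    def query(self, word):
        # Extra work per query: entries of offline users must be filtered out.
        return sorted((u, t) for u, ts in self.files.items()
                      if u in self.online
                      for t in ts if word in t.lower().split())
```

In this sketch the batch index shrinks on logout, while the incremental index pays for its cheap logins with the `u in self.online` check on every query.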

Architectures (1)

Chained architecture

Servers are arranged in a linear chain (ring?)
Each server keeps metadata for local users
Unsatisfied queries sent along chain
Logins and downloads scalable; queries potentially expensive
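A rough sketch of how a query might travel along the chain is given here. The function name `chained_query`, the list-of-lists server representation, and the wrap-around from the last server to the first are assumptions for illustration; the slides leave open whether the chain is a ring.

```python
def chained_query(servers, start, word, max_results):
    """Forward a one-word query along the chain until it is satisfied.

    servers: list of local indexes, each a list of file titles (hypothetical).
    start: position of the server the querying user is logged in to.
    Returns (results, number_of_servers_visited).
    """
    results = []
    visited = 0
    k = len(servers)
    for hop in range(k):
        local = servers[(start + hop) % k]  # assumed wrap-around (ring)
        visited += 1
        results.extend(t for t in local if word in t.lower().split())
        if len(results) >= max_results:
            break  # satisfied: no need to bother further servers
    return results[:max_results], visited
```

A popular query is answered by the first server or two, while a rare one visits every server, which is why query cost depends so strongly on the query distribution.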

SLIDE 3

Architectures (2)

Full replication architecture

Each server keeps metadata about all users
Logins expensive
Queries cheap

Architectures (3)

Hash architecture

Metadata words hashed so a particular server is responsible for a particular subset of them
Queries sent to relevant servers
On login, metadata sent to all relevant servers
Limited number of servers need to see each query, but sending the lists may be expensive
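The word-to-server mapping can be illustrated as follows. This is only a sketch under assumed details: the helper names (`server_for`, `login`, `query`), the choice of SHA-256, and the intersect-all-words matching rule are not taken from the paper.

```python
import hashlib


def server_for(word, num_servers):
    """Map a word to the server responsible for it (hash choice is an assumption)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def login(indexes, user, titles):
    """Send each metadata word to its responsible server.

    indexes: one dict per server, mapping word -> set of (user, title).
    """
    for title in titles:
        for word in title.lower().split():
            target = indexes[server_for(word, len(indexes))]
            target.setdefault(word, set()).add((user, title))


def query(indexes, words):
    """Return (entries matching all words, set of servers contacted)."""
    contacted = set()
    lists = []
    for word in words:
        s = server_for(word, len(indexes))
        contacted.add(s)
        lists.append(indexes[s].get(word.lower(), set()))
    matches = set.intersection(*lists) if lists else set()
    return matches, contacted
```

A query touches at most one server per word, but the per-word result lists have to be brought together for the intersection, which is the potentially expensive step.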

Architectures (4)

Unchained architecture

Servers are independent and don’t communicate
A user can only search files on the server he/she connects to
Napster
Disadvantage: users’ views are limited
Advantage: scales very well (as servers, users increase together)

Query model

Universe of queries: q1, q2, q3, ...; densities f, g
g(i) is probability that a submitted query is query qi (query popularity)
f(i) is probability that any given file will match query qi (selection power)
g tells us what queries users like to submit, while f tells us which files users like to store
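To make the roles of g and f concrete, the sketch below draws a query according to g and computes how many of n files are expected to match a randomly submitted query. The function names and the example numbers are hypothetical; the formula n * sum(g(i) * f(i)) follows from linearity of expectation under the definitions above.

```python
import random


def sample_query(g, rng):
    """Draw a query index i with probability g[i] (query popularity)."""
    return rng.choices(range(len(g)), weights=g)[0]


def expected_results(g, f, n):
    """Expected number of matching files among n files for a random query.

    Each file matches query i with probability f[i] (selection power),
    and query i is submitted with probability g[i].
    """
    return n * sum(gi * fi for gi, fi in zip(g, f))


# Made-up three-query universe, for illustration only.
g = [0.5, 0.3, 0.2]
f = [0.1, 0.05, 0.01]
rng = random.Random(0)
i = sample_query(g, rng)
avg = expected_results(g, f, 1000)
```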

SLIDE 4

Expected results for chained

ExServ = Expected number of servers needed to obtain R results (MaxResults)

If P(s) is the probability that exactly s servers are needed to return R or more results, we have:

ExServ = sum over s of s * P(s)

ExLocalResults based on (UsersPerServer * FilesPerUser) files
ExTotalResults based on (ExLocalResults * k) files

Expected values for others

ExServ trivially 1 for full replication and unchained
ExServ is equivalent to balls-in-bins for hash
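The expected number of servers can be worked out numerically under simplifying assumptions. The sketch below is not the paper's derivation: it fixes a single query with match probability p, assumes files match independently (so result counts are binomial), caps the chained walk at k servers, and assumes query words hash uniformly and independently for the hash case. All function names are hypothetical, and the direct binomial sum is only practical for small inputs.

```python
from math import comb


def prob_fewer_than(r, m, p):
    """P(Binomial(m, p) < r): fewer than r matches among m files."""
    return sum(comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(min(r, m + 1)))


def ex_serv_chained(k, files_per_server, p, r):
    """Expected servers visited in a chain of k servers for one query.

    Uses E[S] = sum over s >= 0 of P(S > s); more than s servers are needed
    exactly when the first s servers hold fewer than r matches.
    """
    return sum(prob_fewer_than(r, s * files_per_server, p) for s in range(k))


def ex_serv_hash(k, num_words):
    """Expected distinct servers hit by num_words words (balls in bins)."""
    return k * (1 - (1 - 1 / k) ** num_words)
```

With one file per server, p = 0.5 and r = 1, the chained estimate for three servers is 1 + 0.5 + 0.25 = 1.75, matching a direct calculation over the three possible stopping points.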

Distributions for f() and g()

Exponential distributions work well for music domain:

Monotonically decreasing

Popularity and selection power are correlated

Most popular has highest selection power, and so on

Validation of query model

M(n) = expected # results from n files
Q(n) = probability we don’t get R results
These data gathered from OpenNap

SLIDE 5

Performance model

CPU cycles

Cost estimates based on examination and guesswork, plus some experiments
Matched OpenNap relatively well for batch logins

Inter-server bandwidth

Varies among architectures

Server-client bandwidth

Napster protocol: Login, AddFile, RemoveFile

Take min over resources (iterative estimation)
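One possible reading of "take min over resources" with iterative estimation is sketched below: per-user load may itself depend on the number of users (for example, query cost grows with index size), so the largest feasible user count is found by search instead of a single division. The function name `max_users`, the binary search, the monotonicity assumption, and every number in the example are hypothetical and not taken from the paper.

```python
def max_users(capacity, load, upper=10**7):
    """Largest user count whose demand fits within every resource.

    capacity: dict resource -> available amount.
    load: function users -> dict resource -> demand (assumed non-decreasing).
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        demand = load(mid)
        if all(demand[r] <= capacity[r] for r in capacity):
            lo = mid       # every resource still fits: try more users
        else:
            hi = mid - 1   # some resource is exhausted: back off
    return lo


# Made-up cost model: CPU grows linearly, bandwidth grows faster than linearly.
capacity = {"cpu": 1000, "bandwidth": 3000}


def load(users):
    return {"cpu": 2 * users, "bandwidth": users * (10 + users // 50)}


limit = max_users(capacity, load)
```

In this made-up model CPU alone would allow 500 users, but bandwidth runs out first, so the binding resource determines the answer.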

Evaluation

Metric: max users per server (throughput, not latency)

Memory requirements

Beyond music

f() and g() could be different

May be no or negative correlation
e.g. Adding “price > 0” to a query makes it less popular but doesn't change size of result set
e.g. Archive system will return more results from farther in the past (queries presumably rarer)

No or negative correlation can be modeled by adjusting the ratio of the parameters to f and g

No: r = 1
Negative: r >> 1

SLIDE 6

CPU performance vs. r

Conclusion

Chained is the best architecture for music domain
Full replication might be good with lots of cheap memory and stable network connections
Incremental logins do best when there is negative correlation between f and g, and they perform best in short, bandwidth-limited sessions