Modeling Peer-Peer File Sharing Systems
Zihui Ge, Daniel R. Figueiredo, Sharad Jaiswal, Jim Kurose, Don Towsley
INFOCOM 2003
2
Outline
P2P file sharing architectures
  Napster (centralized)
  Gnutella (flooding)
  Chord (routing)
General model framework for P2P file sharing
  closed queuing network
  model parameters
  model solution
Apply model to study performance
  scalability, freeloaders, etc.
Summary
3
User behavior in P2P file sharing
[Figure: timeline of one user request. Search phase: user generates query, query response. File transfer phase: file request, file transfer.]
4
Centralized Indexing Architecture (CIA)
❏ Central server (cluster) stores global index
❏ Napster
[Figure: peers N1 to N5 and a central index (DB) connected over the Internet. Search phase: user generates query Lookup(“LetItBe”), query response. File transfer phase: file request to the file owner, file transfer.]
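To make the search phase concrete, here is a minimal sketch of a global index held by one server. The class name `CentralIndex`, its methods, and the file name "Yesterday" are hypothetical illustrations, not part of the presentation or of Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Toy global index held by a central server (hypothetical class)."""

    def __init__(self):
        # file name -> set of peer ids that hold a copy
        self._owners = defaultdict(set)

    def register(self, peer, files):
        """A peer announces the files it is willing to share."""
        for name in files:
            self._owners[name].add(peer)

    def lookup(self, name):
        """Search phase: one request to the server returns all known owners."""
        return sorted(self._owners.get(name, set()))


index = CentralIndex()
index.register("N4", ["LetItBe"])
index.register("N2", ["LetItBe", "Yesterday"])  # "Yesterday" is an invented example
owners = index.lookup("LetItBe")
```

Every query goes through the same server, which is why the server's query capacity does not grow with the number of peers in this architecture.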
5
Distributed Indexing with Flooded queries Architecture (DIFA)
❏ Limited-scope flooding to locate files
❏ Gnutella
[Figure: peers N1 to N5 connected over the Internet. Search phase: user generates query Lookup(“LetItBe”), query response. File transfer phase: file request to the file owner, file transfer.]
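Limited-scope flooding can be sketched as a breadth-first search that stops forwarding after a fixed number of hops. The overlay topology, the `ttl` parameter name, and the function name below are assumptions made for illustration; the presentation does not specify them.

```python
from collections import deque


def flood_lookup(neighbors, holdings, start, name, ttl):
    """Flood a query from `start`, forwarding it at most `ttl` hops.

    neighbors: dict peer -> list of neighboring peers (hypothetical overlay)
    holdings:  dict peer -> set of file names stored at that peer
    Returns the peers within `ttl` hops that hold `name`.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if name in holdings.get(peer, ()):
            hits.append(peer)
        if hops == ttl:
            continue  # scope limit reached: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits


# Hypothetical overlay: N1 - N2 - N5 - N4, plus N1 - N3.
neighbors = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N5"],
    "N3": ["N1"],
    "N5": ["N2", "N4"],
    "N4": ["N5"],
}
holdings = {"N4": {"LetItBe"}}
near = flood_lookup(neighbors, holdings, "N1", "LetItBe", ttl=2)
far = flood_lookup(neighbors, holdings, "N1", "LetItBe", ttl=3)
```

In this overlay the file owner is three hops from N1, so the query with the smaller scope misses it. This is the trade-off of limited-scope flooding: a smaller scope costs fewer messages but can fail to find files that exist.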
6
Distributed Indexing with Hash-directed queries Architecture (DIHA)
❏ Hash-directed query
❏ Tapestry, Chord, CAN
❏ Only handles exact query
[Figure: peers N1 to N5, each labeled RT, connected over the Internet. Search phase: Lookup(“LetItBe”), Hash(“LetItBe”)=N4, query response. File transfer phase: file request to the file owner.]
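A hash-directed query maps the exact file name to the node responsible for it. The sketch below uses a Chord-style successor rule on a small identifier ring; the ring size, the use of SHA-1, and the function names are assumptions, and the sketch omits the hop-by-hop routing through per-node tables. The node it selects for "LetItBe" depends on the hash and need not be N4 as in the slide.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed identifier-space size, for illustration only


def ring_id(text):
    """Hash a node name or file name onto the identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def responsible_node(nodes, key):
    """Return the first node whose id is >= the key's id, wrapping around."""
    ring = sorted((ring_id(n), n) for n in nodes)
    ids = [node_id for node_id, _ in ring]
    pos = bisect_left(ids, ring_id(key)) % len(ring)
    return ring[pos][1]


nodes = ["N1", "N2", "N3", "N4", "N5"]
target = responsible_node(nodes, "LetItBe")
```

Because the lookup hashes the full name, a query for a slightly different string lands on an unrelated node, which is why this architecture only handles exact queries.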
7
File transfers
❏ File transfers directly between file owner and receiver
[Figure: peers N1 to N5 connected over the Internet. After the search phase (user generates query, query response), the user issues download(“LetItBe”): file request to the file owner, file transfer.]
8
Modeling P2P file sharing systems: challenges
Unique workload/service model:
  peers generate workload (queries, downloads)
  but also add service capacity (file sharing, query processing)
Complex peer behavior:
  transient: off-line, on-line (inactive), on-line (active: query, download)
  different classes of peers:
    ■ freeloaders
    ■ service capacity
9
A general model
Closed loop, fixed population of peers
No structural dependency on architecture
[Figure: closed queueing network. Labels: Off-Line, On-Line, Thinking, Query Processing, File download (queues 1 to M); routing probabilities p_off, q, p_1 ... p_M.]
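The closed loop can be illustrated by routing a single peer between the stations named in the diagram. The exact routing is not recoverable from the slide text, so the rules below are an assumption: after thinking, a peer goes off-line with probability p_off or issues a query; a query fails with probability q; otherwise file j is downloaded with probability p_j. All function names are hypothetical.

```python
import random


def next_state(state, p_off, q, p, rng):
    """One routing step for a single peer (assumed routing, not the authors')."""
    if state == "off-line":
        return "thinking"                 # peer comes back on-line
    if state == "thinking":
        return "off-line" if rng.random() < p_off else "query"
    if state == "query":
        if rng.random() < q:
            return "thinking"             # query failed, no download
        j = rng.choices(range(1, len(p) + 1), weights=p)[0]
        return ("download", j)            # join the queue of file j
    return "thinking"                     # download finished


def walk(steps, p_off, q, p, seed=0):
    """Follow one peer for `steps` transitions, starting off-line."""
    rng = random.Random(seed)
    state, visited = "off-line", []
    for _ in range(steps):
        state = next_state(state, p_off, q, p, rng)
        visited.append(state)
    return visited


# Degenerate case: never goes off-line, queries never fail, one file.
path = walk(4, p_off=0.0, q=0.0, p=[1.0])
```

No peer ever enters or leaves the network in this loop; going off-line is just another station, which is what keeps the population fixed.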
10
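A general model w/ multiple classes
Different classes of peers have different behaviors
[Figure: closed queueing network as in the general model, with peers of multiple classes. Labels: Off-Line, On-Line, Thinking, Query Processing, File download (queues 1 to M); routing probabilities p_off, q, p_1 ... p_M; legend entry "- f peers".]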
11
Modeling query processing
Modeled by a single server queue
Service rate of queue is a function of # peers on-line: $\mu_q(N_a)$
Query failure prob. ($q$) associated with each file request
[Figure: closed queueing network diagram, as in the general model, with the Query Processing station in focus.]
12
Modeling file downloading
Associate each unique file in system with a “service capacity”
  modeled by single server queue
Requests chosen w/ probability $p_j$: rank $j$ (by req. popularity)
Service capacity $\mu_f(N_a, i)$ is a function of:
  file availability: rank $i$ (by # of replicas)
  # peers on-line
[Figure: closed queueing network diagram, as in the general model, with the File download queues 1 to M in focus.]
13
Model parameters
Capacity for downloading file w/ availability rank $i$:
  $C_0 N_a / i$
  with exponent $\alpha$: $C_0 N_a / i^{\alpha}$
  summed over peer classes $c$: $C_0 \sum_c W^{(c)} N_a^{(c)} / i$
Capacity for proc. queries in CIA: $C_1$
Capacity for proc. queries in DIHA: $C_3 N_a / \log N_a$
[Figure: closed queueing network diagram, as in the general model.]
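The capacity expressions on this slide translate directly into small functions. This is a sketch under stated assumptions: the function and argument names are hypothetical, the logarithm is taken as natural (the slide does not give its base), and $W^{(c)}$ is treated as a per-class weight on the on-line population of class $c$.

```python
import math


def download_capacity(c0, n_online, i, alpha=1.0):
    """C0 * Na / i**alpha; alpha=1 gives the plain C0 * Na / i form."""
    return c0 * n_online / i ** alpha


def download_capacity_multiclass(c0, weights, n_online_by_class, i):
    """C0 * sum_c W(c) * Na(c) / i, with one weight per peer class (assumed meaning)."""
    weighted = sum(w * n for w, n in zip(weights, n_online_by_class))
    return c0 * weighted / i


def query_capacity_cia(c1):
    """Central server: constant capacity, independent of the population."""
    return c1


def query_capacity_diha(c3, n_online):
    """C3 * Na / log(Na); natural log assumed, requires n_online > 1."""
    return c3 * n_online / math.log(n_online)


# A class with weight 0 (for example, peers that share nothing) adds no capacity.
cap = download_capacity_multiclass(1.0, [1.0, 0.0], [30, 70], i=3)
```

The contrast between `query_capacity_cia` and `query_capacity_diha` is the core of the scalability comparison: only the latter grows with the number of on-line peers.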
14
Model solutions
Performance metric: expected system throughput (# files downloaded per unit of time)
Approximate numerical solution
  bottleneck analysis with multiple classes of peers
  set of non-linear equations, solved via fixed-point
  mostly independent of service rate functions
    ■ flexibility to use other functions
Simulation in more general cases
  approximations validated
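The slides do not list the non-linear equations, so the sketch below is not the authors' system. It only illustrates fixed-point iteration on one hypothetical equation: the number of on-line peers depends on the session length, which in turn depends on a query delay that grows with the number of on-line peers (an assumed saturated central server of capacity `C1`). The 12-hour off-line time, 30-minute idle period, and 5 downloads per session are borrowed from the workload slide; `N`, `C1`, and all function names are invented for the example.

```python
def solve_fixed_point(g, x0, tol=1e-9, max_iter=10_000, damping=1.0):
    """Iterate x <- (1 - damping) * x + damping * g(x) until successive values agree."""
    x = x0
    for _ in range(max_iter):
        x_new = (1.0 - damping) * x + damping * g(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


N = 1000           # total peers (hypothetical)
T_OFF = 12 * 60    # minutes spent off-line per cycle
THINK = 30.0       # minutes of idle time before each download
DOWNLOADS = 5      # downloads per on-line session
C1 = 100.0         # hypothetical central query capacity, queries per minute


def online_peers(n_a):
    """Toy equation: on-line peers implied by a session length that depends on n_a."""
    session = DOWNLOADS * (THINK + n_a / C1)   # assumed query delay n_a / C1
    return N * session / (T_OFF + session)


n_a = solve_fixed_point(online_peers, x0=0.0)
```

Because the iteration only calls `g`, swapping in a different service rate function means changing `online_peers` and nothing else, which mirrors the flexibility noted on the slide.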
15
Scalability with Population
System throughput scales with population size in distributed indexing architectures
[Figure: system throughput T versus total population N for CIA, DIFA, and DIHA.]
Workload:
  1000 files
  12 hours off-line
  30 minutes on-line idle period
  average of 5 downloads per active period
16
Impact of Freeloaders
Freeloaders:
  Do not share files
  Support query processing
  More aggressive
    shorter think times
    double file downloads
[Figure: system throughput versus number of freeloaders (in thousands) for CIA, DIFA, and DIHA; 100,000 non-freeloaders.]
P2P can support large number of freeloaders
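The freeloader description above amounts to a second peer class with different parameters. A minimal sketch of that parameterization follows; the `PeerClass` type, its field names, the 15-minute freeloader think time, and the count of 40,000 freeloaders are hypothetical. The baseline of 30 minutes and 5 downloads is taken from the workload slide, on the assumption that the idle period is the think time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerClass:
    name: str
    shares_files: bool
    think_time_min: float
    downloads_per_session: int


def make_freeloader(base, think_time_min):
    """Freeloader: shares nothing, thinks for less time, downloads twice as much."""
    return PeerClass(
        name="freeloader",
        shares_files=False,
        think_time_min=think_time_min,  # "shorter" per the slide; value is assumed
        downloads_per_session=2 * base.downloads_per_session,
    )


def sharing_population(population):
    """Number of peers that contribute file-download capacity."""
    return sum(n for cls, n in population.items() if cls.shares_files)


regular = PeerClass("non-freeloader", True, 30.0, 5)
freeloader = make_freeloader(regular, think_time_min=15.0)
population = {regular: 100_000, freeloader: 40_000}  # freeloader count is illustrative
sharers = sharing_population(population)
```

Adding freeloaders raises the total population and the request load while leaving the sharing population unchanged.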
17
Impact of Freeloaders (Cont’d)
Freeloaders impact non-freeloaders
  marginal effect when system not saturated
[Figure: throughput of non-freeloaders versus number of freeloaders (in thousands) for CIA, DIFA, and DIHA; 100,000 non-freeloaders.]
P2P can support large number of freeloaders
18
Mismatch between file availability and request popularity
What if service capacity doesn’t match popularity?
Each file ranked by
  request popularity (j)
  # replicas available (i)
Randomly match ranks within a window
  i-w < j < i+w
[Figure: system throughput versus rank permutation window w for CIA, DIFA, and DIHA; 500,000 peers.]
Small mismatches have little effect; large mismatches do
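The slide does not say how ranks are matched inside the window, so the following is one possible construction rather than the authors' procedure. Each availability rank i gets a random sort key in [i, i+w); sorting by key yields a permutation in which every popularity rank j satisfies i-w < j < i+w. The function name is hypothetical.

```python
import random


def windowed_rank_match(n, w, rng):
    """Map each availability rank i (1..n) to a popularity rank j with |i - j| < w.

    Each i is jittered by a random offset in [0, w) and the ranks are re-sorted.
    Any rank k <= i - w always sorts before i, and any rank k >= i + w always
    sorts after it, so i can move fewer than w positions in either direction.
    """
    keys = [(i + rng.random() * w, i) for i in range(1, n + 1)]
    order = [i for _, i in sorted(keys)]  # ties broken by the original rank
    return {i: pos + 1 for pos, i in enumerate(order)}


rng = random.Random(42)
match = windowed_rank_match(200, 5, rng)
```

With w = 1 the jitter is too small to reorder anything, so availability and popularity ranks coincide; larger windows allow progressively larger mismatches.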
19
Supernodes (Kazaa)
Kazaa: 2-level hierarchy
  top-level
    well provisioned supernodes
      ■ higher capacity
    Gnutella-like
  bottom-level
    connect to single supernode
[Figure: system throughput versus total population N, one curve per ratio # supernodes : # total nodes = 1:1, 1:6, 1:10, 1:12, 1:11, 1:20.]
Hierarchical design improves system throughput
20