Modeling Peer-Peer File Sharing Systems Zihui Ge, Daniel R. - - PowerPoint PPT Presentation

SLIDE 1

Modeling Peer-Peer File Sharing Systems

Zihui Ge, Daniel R. Figueiredo, Sharad Jaiswal, Jim Kurose, Don Towsley INFOCOM 2003

SLIDE 2

Outline

• P2P file sharing architectures
  – Napster (centralized)
  – Gnutella (flooding)
  – Chord (routing)
• General model framework for P2P file sharing
  – closed queuing network
  – model parameters
  – model solution
• Apply model to study performance
  – scalability, freeloaders, etc.
• Summary

SLIDE 3

User behavior in P2P file sharing

[Figure: a user alternates between a search phase (the user generates a query and receives a query response) and a file transfer phase (the user issues a file request and the file is transferred).]

SLIDE 4

Centralized Indexing Architecture (CIA)

❏ Central server (cluster) stores global index
❏ Napster

[Figure: in the search phase the user sends Lookup("LetItBe") over the Internet to the central DB server, and the query response identifies the file owner among peers N1 to N5; the file request and the file transfer phase then proceed between the user and the file owner.]
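A minimal sketch of the central index in CIA, assuming the server simply maps each file name to the set of on-line peers that share it. The class and method names (`CentralIndex`, `publish`, `withdraw`, `lookup`) are hypothetical and are not Napster's protocol. The search phase costs one query to the server, whatever the population size.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style global index: file name -> peers holding a copy."""

    def __init__(self):
        self._index = defaultdict(set)

    def publish(self, peer, filename):
        # A peer coming on-line registers each file it shares.
        self._index[filename].add(peer)

    def withdraw(self, peer):
        # A peer going off-line is removed from every index entry.
        for owners in self._index.values():
            owners.discard(peer)

    def lookup(self, filename):
        # Search phase: a single query answered from the global index.
        # .get() avoids inserting empty entries for unknown files.
        return sorted(self._index.get(filename, set()))


index = CentralIndex()
index.publish("N4", "LetItBe")
index.publish("N2", "LetItBe")
print(index.lookup("LetItBe"))  # ['N2', 'N4']
index.withdraw("N2")
print(index.lookup("LetItBe"))  # ['N4']
```

The file transfer itself does not pass through the server. Only the lookup is centralized.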

SLIDE 5

Distributed Indexing with Flooded queries Architecture (DIFA)

❏ Limited-scope flooding to locate files
❏ Gnutella

[Figure: in the search phase the user's Lookup("LetItBe") is flooded across peers N1 to N5 over the Internet; the query response identifies the file owner, and the file request and file transfer phase follow.]
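Limited-scope flooding can be sketched as a breadth-first search that stops forwarding once a hop limit (TTL) is reached and drops duplicate copies of the query. The overlay graph, the `shared` map, and the function name are hypothetical illustrations, not Gnutella's wire protocol.

```python
from collections import deque


def flood_query(overlay, shared, source, filename, ttl):
    """Limited-scope flooding (sketch).

    overlay: peer -> list of neighbor peers (hypothetical overlay graph)
    shared:  peer -> set of file names the peer shares
    ttl:     maximum number of hops a query may travel
    Returns the sorted list of peers within `ttl` hops that hold `filename`.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if peer != source and filename in shared.get(peer, set()):
            hits.append(peer)
        if hops == ttl:
            continue  # scope limit reached: do not forward further
        for neighbor in overlay.get(peer, []):
            if neighbor not in seen:  # duplicate copies of the query are dropped
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return sorted(hits)


overlay = {"N1": ["N2", "N3"], "N2": ["N1", "N4"], "N3": ["N1", "N5"],
           "N4": ["N2"], "N5": ["N3"]}
shared = {"N4": {"LetItBe"}}
print(flood_query(overlay, shared, "N1", "LetItBe", ttl=1))  # []
print(flood_query(overlay, shared, "N1", "LetItBe", ttl=2))  # ['N4']
```

A query only finds a file that lies within the TTL radius. Raising the TTL widens the search, but every peer inside that radius must then process the query.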

SLIDE 6

Distributed Indexing with Hash-directed queries Architecture (DIHA)

❏ Hash-directed query
❏ Tapestry, Chord, CAN
❏ Only handles exact-match queries

[Figure: each peer N1 to N5 keeps a routing table (RT); in the search phase the user's Lookup("LetItBe") is routed toward N4 because Hash("LetItBe") = N4; the query response, file request, and file transfer phase follow.]
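The core of a hash-directed query is that hashing the key alone determines the responsible peer. The sketch below shows Chord's successor rule on a small identifier ring. The 16-bit ring size and the function names are assumptions. It omits the routing tables (RT) that let Chord reach that peer in O(log N) hops. The peer it returns depends on the hash and need not be N4 as in the slide's illustration. Because a peer is located only by hashing the exact key, partial or keyword queries cannot be routed this way.

```python
import bisect
import hashlib

M_BITS = 16  # hypothetical identifier-space size; Chord itself uses 160-bit SHA-1 ids


def ring_id(name: str) -> int:
    """Map a peer name or a file key onto the identifier ring [0, 2**M_BITS)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M_BITS)


def responsible_peer(peers, key: str) -> str:
    """Successor of hash(key): the first peer id >= hash(key), wrapping around the ring."""
    ring = sorted((ring_id(p), p) for p in peers)
    ids = [peer_id for peer_id, _ in ring]
    pos = bisect.bisect_left(ids, ring_id(key))
    return ring[pos % len(ring)][1]  # pos == len(ring) wraps to the smallest id


peers = ["N1", "N2", "N3", "N4", "N5"]
print(responsible_peer(peers, "LetItBe"))
```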

SLIDE 7

File transfers

❏ File transfers take place directly between the file owner and the receiver

[Figure: after the search phase, the user issues download("LetItBe") to the file owner among peers N1 to N5, and the file transfer phase runs directly between the two peers over the Internet.]

SLIDE 8

Modeling P2P file sharing systems: challenges

Unique workload/service model:
• peers generate workload (queries, downloads)
• but also add service capacity (file sharing, query processing)

Complex peer behavior:
• transient: off-line, on-line (inactive), on-line (active: query, download)
• different classes of peers
  ■ freeloaders
  ■ service capacity

SLIDE 9

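A general model

[Figure: closed queuing network with an On-Line part (Thinking, Query Processing, and File download queues 1 to M chosen with probabilities p1 to pM, with query failure probability q) and an Off-Line state entered with probability poff.]

• Closed loop, fixed population of peers
• No structural dependency on architecture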

SLIDE 10

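A general model w/ multiple classes

[Figure: the same closed queuing network (Thinking, Query Processing, File download queues 1 to M chosen with probabilities p1 to pM, query failure probability q, Off-Line entered with probability poff), populated by more than one class of peers.]

• Different classes of peers have different behaviors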

SLIDE 11

Modeling query processing

[Figure: the general model with the Query Processing queue highlighted.]

• Modeled by a single server queue
• Service rate of the queue is a function of the # of peers on-line: $\mu_q(N_a)$
• Query failure prob. ($q$) associated with each file request

SLIDE 12

Modeling file downloading

[Figure: the general model with the File download queues 1 to M highlighted.]

• Associate each unique file in the system with a "service capacity"
  – modeled by a single server queue
• Requests chosen with probability $p_j$, where $j$ is the file's rank by request popularity
• Service capacity $\mu_f(N_a, i)$ is a function of the # of replicas:
  – file availability: rank $i$ (by # of replicas)
  – # peers on-line
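The request probabilities $p_j$ need a concrete popularity law to be computed. The sketch below assumes a Zipf-like law, $p_j \propto 1/j^{\alpha}$. The slide does not state this distribution, so treat it as an illustrative choice. It also shows how a request's rank $j$ can be drawn from $p_j$. The function names are hypothetical.

```python
import random


def zipf_popularity(num_files: int, alpha: float = 1.0) -> list:
    """p[j-1] = probability that a request is for the file of popularity rank j.

    Assumed Zipf-like law: p_j proportional to 1 / j**alpha.
    """
    weights = [1.0 / j ** alpha for j in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def draw_request(p, rng: random.Random) -> int:
    """Pick the popularity rank j of the next requested file with probability p[j-1]."""
    return rng.choices(range(1, len(p) + 1), weights=p, k=1)[0]


p = zipf_popularity(1000)  # 1000 files, as in the scalability workload
rng = random.Random(42)
print([draw_request(p, rng) for _ in range(10)])
```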

SLIDE 13

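Model parameters

[Figure: the general model annotated with the capacities listed below.]

• Capacity for downloading a file with availability rank $i$: $\mu_f(N_a, i) = C_0 N_a / i$
  – alternative forms: $C_0 N_a / i^{\alpha}$ and, with multiple peer classes, $C_0 \sum_c W(c)\, N_a^{(c)} / i$
• Capacity for processing queries in CIA: $\mu_q(N_a) = C_1$
• Capacity for processing queries in DIHA: $\mu_q(N_a) = C_3 N_a / \log N_a$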
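The service capacities above translate directly into rate functions. In the sketch below, the constants are placeholders and the base of the logarithm is assumed natural (a different base only rescales $C_3$). The class weights are hypothetical; for example, a weight of 0 models freeloaders, which share no files. Note how the CIA query capacity stays fixed while the DIHA capacity grows almost linearly in $N_a$.

```python
import math


def download_capacity(n_online: int, rank_i: int, c0: float, alpha: float = 1.0) -> float:
    """mu_f(Na, i) = C0 * Na / i**alpha; alpha = 1 gives C0 * Na / i."""
    return c0 * n_online / rank_i ** alpha


def download_capacity_multiclass(online_by_class: dict, weight: dict,
                                 rank_i: int, c0: float) -> float:
    """C0 * sum_c W(c) * Na^(c) / i, where Na^(c) = on-line peers of class c."""
    return c0 * sum(weight[c] * n for c, n in online_by_class.items()) / rank_i


def query_capacity_cia(n_online: int, c1: float) -> float:
    """CIA: central server with fixed capacity C1, independent of Na."""
    return c1


def query_capacity_diha(n_online: int, c3: float) -> float:
    """DIHA: C3 * Na / log Na (natural log assumed). Requires Na > 1."""
    return c3 * n_online / math.log(n_online)


for n in (10 ** 3, 10 ** 6):
    print(n, query_capacity_cia(n, 500.0), query_capacity_diha(n, 1.0))
```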

SLIDE 14

Model solutions

• Performance metric: expected system throughput (# files downloaded per unit of time)
• Approximate numerical solution
  – bottleneck analysis with multiple classes of peers
  – set of non-linear equations, solved via fixed-point iteration
  – mostly independent of service rate functions
    ■ flexibility to use other functions
• Simulation in more general cases
  – approximations validated
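For intuition about bottleneck analysis, the sketch below uses the standard asymptotic bounds for a single-class closed queuing network with fixed service demands. This is a simpler, swapped-in technique, not the multi-class fixed-point method described on the slide. Here $D_k$ is the service demand per completed download at station $k$, and $Z$ is the time spent at delay stations (thinking, off-line). In the P2P model the rates depend on $N_a$, so the demands themselves shift with the population. That dependence is why the full solution needs a fixed-point iteration.

```python
def throughput_bounds(n_peers: int, demands: list, think_time: float) -> tuple:
    """Asymptotic bounds on throughput X(N) of a single-class closed network.

    demands:    service demand D_k = visits_k / mu_k at each queueing station,
                per completed download
    think_time: Z, time per completed download spent at delay stations
    Returns (lower, upper) with
        N / (N*D + Z) <= X(N) <= min(N / (D + Z), 1 / D_max).
    """
    d_total = sum(demands)
    d_max = max(demands)
    upper = min(n_peers / (d_total + think_time), 1.0 / d_max)
    lower = n_peers / (n_peers * d_total + think_time)
    return lower, upper


for n in (1, 10, 100):
    print(n, throughput_bounds(n, demands=[0.25, 0.5], think_time=4.25))
```

Once $N$ is large, the upper bound is pinned at $1/D_{max}$. The station with the largest demand, whether the query server or the most popular file, then limits system throughput.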

SLIDE 15

Scalability with Population

System throughput scales with population size in distributed indexing architectures

[Figure: System Throughput T vs. Total Population N for CIA, DIFA, and DIHA, on logarithmic axes.]

Workload:
• 1000 files
• 12-hour off-line period
• 30-minute on-line idle period
• average of 5 downloads per active period

SLIDE 16

Impact of Freeloaders

Freeloaders:
• Do not share files
• Support query processing
• More aggressive
  – shorter think times
  – double file downloads

[Figure: System Throughput vs. Number of Freeloaders (in thousands) for CIA, DIFA, and DIHA, with 100,000 non-freeloaders.]

P2P systems can support a large number of freeloaders

SLIDE 17

Impact of Freeloaders (Cont'd)

• Freeloaders impact non-freeloaders
  – marginal effect when the system is not saturated

[Figure: Throughput of non-freeloaders vs. Number of Freeloaders (in thousands) for CIA, DIFA, and DIHA, with 100,000 non-freeloaders.]

P2P systems can support a large number of freeloaders

SLIDE 18

Mismatch between file availability and request popularity

• Each file ranked by
  – request popularity ($j$)
  – # replicas available ($i$)
• Randomly match ranks within a window
  – $i - w < j < i + w$

[Figure: System throughput vs. Rank permutation window w (logarithmic axis) for CIA, DIFA, and DIHA, with 500,000 peers.]

Small mismatches have little effect; large mismatches do

What if service capacity doesn't match popularity?
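The slide does not say how the random matching within the window is generated. One simple scheme, assumed here, adds uniform noise in $[0, w)$ to each availability rank $i$ and re-sorts the files. No file can then move by $w$ or more positions, so every pair satisfies $i - w < j < i + w$, and $w = 1$ leaves the ranks unchanged (no mismatch). The function name is hypothetical.

```python
import random


def windowed_rank_match(num_files: int, w: int, rng: random.Random) -> list:
    """Randomly pair availability ranks i with popularity ranks j so that i - w < j < i + w.

    Returns match where match[i - 1] is the popularity rank j assigned to the
    file with availability rank i (both ranks 1-based). Each rank is perturbed
    by uniform noise in [0, w) and the files are re-sorted; ties, if any, are
    broken by the original rank.
    """
    perturbed = sorted((i + rng.random() * w, i) for i in range(1, num_files + 1))
    match = [0] * num_files
    for j, (_, i) in enumerate(perturbed, start=1):
        match[i - 1] = j
    return match


rng = random.Random(7)
print(windowed_rank_match(10, 1, rng))  # w = 1: identity, no mismatch
print(windowed_rank_match(10, 3, rng))
```

This scheme keeps each file near its original rank, though it does not draw uniformly from all permutations that satisfy the window constraint.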

SLIDE 19

Supernodes (Kazaa)

Kazaa:
• 2-level hierarchy
  – top level: well-provisioned supernodes
    ■ higher capacity
    ■ Gnutella-like
  – bottom level: each node connects to a single supernode

[Figure: System Throughput vs. Total Population N for several ratios of # supernodes : # total nodes (1:1, 1:6, 1:10, 1:11, 1:12, 1:20).]

Hierarchical design improves system throughput

SLIDE 20

Summary

Simple models: insights into fundamental performance questions of P2P file sharing systems
• comparison of different architectures
• scalability with peer population
• impact of freeloaders
• impact of imbalance between file availability and request popularity

Model extensions:
• hierarchical peer structure
• off-line to on-line transition phase