
slide-1
SLIDE 1

Efficient and Highly Available Peer Discovery: A Case for Independent Trackers and Gossiping

György Dán, Ilias Chatzidrossos
Royal Institute of Technology (KTH), Stockholm, Sweden

Niklas Carlsson
Linköping University, Linköping, Sweden

  • Proc. IEEE P2P, Kyoto, Japan, Aug/Sept. 2011

slide-2
SLIDE 2

Background

BitTorrent

  • Arguably the biggest source of P2P traffic
  • Contents split into many small pieces
  • Pieces are downloaded from both leechers and seeds
  • Distribution paths are dynamically determined
      • Based on data availability
  • At least one overlay per content

slide-3
SLIDE 3

Background

Peer discovery in BitTorrent

  • Torrent file
      • “announce” URL
  • Tracker
      • Register torrent file
      • Maintain state information
  • Peers
      • Obtain torrent file
      • Announce
      • Report status
      • Peer exchange (PEX)
  • Issues
      • Central point of failure
      • Tracker load

Swarm = Torrent
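
The announce step can be illustrated with a minimal sketch of the HTTP request a peer sends to its tracker. The query parameter names follow the standard BitTorrent tracker protocol; the tracker URL, the info hash, and the peer id are made-up placeholders, and `build_announce_url` is a hypothetical helper, not part of any client.

```python
from urllib.parse import urlencode


def build_announce_url(announce_url, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0, event="started"):
    """Build the GET URL a peer uses to announce itself to a tracker."""
    params = {
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info dictionary
        "peer_id": peer_id,        # 20-byte identifier chosen by the peer
        "port": port,              # port the peer listens on
        "uploaded": uploaded,      # status report: bytes uploaded so far
        "downloaded": downloaded,  # status report: bytes downloaded so far
        "left": left,              # status report: bytes still missing
        "event": event,            # "started", "completed" or "stopped"
    }
    # urlencode percent-encodes the raw bytes of info_hash and peer_id.
    return announce_url + "?" + urlencode(params)


# Placeholder values, for illustration only.
url = build_announce_url(
    "http://tracker.example.org/announce",
    info_hash=b"\x12" * 20,
    peer_id=b"-XX0001-123456789012",
    port=6881,
    left=1024,
)
print(url)
```

Every peer of the torrent sends this request to the same tracker, which is why the tracker becomes both the load bottleneck and the central point of failure.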

slide-5
SLIDE 5

Background

Multi-tracked torrents

  • Torrent file
      • “announce-list” URLs
  • Trackers
      • Register torrent file
      • Maintain state information
  • Peers
      • Obtain torrent file
      • Choose one tracker at random
      • Announce
      • Report status
      • Peer exchange (PEX)
  • Issue
      • Multiple smaller swarms

Swarm ≠ Torrent
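
A small simulation sketch shows how choosing one tracker at random splits a torrent into several smaller swarms. The tracker names, the peer count, and the helper `assign_peers_to_trackers` are illustrative assumptions.

```python
import random
from collections import Counter


def assign_peers_to_trackers(num_peers, trackers, seed=0):
    """Each arriving peer picks one tracker uniformly at random.

    Returns a Counter mapping tracker -> number of peers registered there,
    i.e. the size of the swarm each tracker sees.
    """
    rng = random.Random(seed)
    return Counter(rng.choice(trackers) for _ in range(num_peers))


# Hypothetical announce-list with three independent trackers.
trackers = ["tracker-a", "tracker-b", "tracker-c"]
swarms = assign_peers_to_trackers(90, trackers)

for tracker, size in sorted(swarms.items()):
    print(tracker, size)
```

The torrent has 90 peers in total, but no tracker (and hence no peer that relies on a single tracker) sees all of them.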

slide-11
SLIDE 11

Scalable … Why an issue??

BitTorrent efficiency vs. swarm size

Early analytical model

$$\eta = 1 - \left(\frac{\log N}{N}\right)^{k}$$

where $\eta$ is the efficiency, $N$ the number of pieces, and $k$ the number of neighboring peers.

[Figure: $1/\eta$ versus number of neighboring peers]

  • D. Qiu, R. Srikant, “Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks”, Proc. ACM SIGCOMM, 2004

Early measurements

[Figure: measured time to transmit 1 KB, based on 500 torrents]

  • X. Yang, G. de Veciana, “Service Capacity of Peer to Peer Networks”, Proc. IEEE INFOCOM 2004
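
To get a feel for the analytical model, the sketch below evaluates the efficiency expression 1 - (log N / N)^k for a few neighborhood sizes. It assumes the natural logarithm and an arbitrary piece count of N = 100; both are choices made for the example, not values from the slides.

```python
import math


def efficiency(num_pieces, num_neighbors):
    """Efficiency 1 - (log N / N)^k with N pieces and k neighboring peers."""
    return 1.0 - (math.log(num_pieces) / num_pieces) ** num_neighbors


# Illustrative piece count; real torrents often have many more pieces.
N = 100
for k in (1, 2, 5, 10):
    print(f"k={k:2d}  efficiency={efficiency(N, k):.6f}")
```

Under this model the efficiency approaches 1 after only a handful of neighbors, which is why very small swarms are the ones that pay a penalty.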

slide-14
SLIDE 14

Measurements

Two basic datasets

  • Screen scrapes of www.mininova.org
      • Popular torrent search engine
      • 1,690 trackers (721 unique)
  • Tracker scrapes of known trackers (Oct. 10-17, 2008)
      • 2.86 million unique torrents
      • Roughly 20-60 M concurrent peers (depending on day)
      • 330,000 swarms overlap with screen scrape

slide-18
SLIDE 18

Throughput vs. swarm size

  • Throughput estimation

$$\frac{F \cdot D}{L \cdot T}$$

where $F$ is the file size, $D$ the number of downloads (during the period), $T$ the time period, and $L$ the number of leechers.

[Figure: estimated swarm throughput per leecher [KB/s] versus number of peers in swarm ($x_{t,r}$), logarithmic x-axis from $10^0$ to $10^5$; curves for S/L ≥ 4, 1 ≤ S/L < 4, and S/L < 1; 0700 UTC, 11-12 Oct. 2008]

The performance of small swarms is worse.
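
The throughput estimate can be sketched in a few lines, reading the slide's expression as file size times downloads, divided by number of leechers times the length of the measurement period. The function name, the units, and the sample numbers are assumptions made for the example.

```python
def throughput_per_leecher(file_size_kb, downloads, period_s, leechers):
    """Estimate swarm throughput per leecher in KB/s.

    file_size_kb: size of the shared file (F), in KB
    downloads:    completed downloads during the period (D)
    period_s:     length of the measurement period (T), in seconds
    leechers:     number of leechers in the swarm (L)
    """
    if period_s <= 0 or leechers <= 0:
        raise ValueError("period and leecher count must be positive")
    return file_size_kb * downloads / (leechers * period_s)


# Made-up scrape values: a 1000 KB file, 6 downloads in 60 s, 10 leechers.
print(throughput_per_leecher(1000, 6, 60, 10))
```

Only quantities that a tracker scrape exposes are needed, which is what makes the estimate usable across a very large number of swarms.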

slide-19
SLIDE 19

Dynamic Swarm Management

Improving BitTorrent performance

  • Trade-off in multi-tracking
      • Load sharing and increased availability
      • Smaller swarm sizes → lower throughput
  • Goals of dynamic swarm management
      • Efficient peer discovery
          • Avoid swarm partitioning (performance penalty)
      • High availability
          • Independent trackers
          • Load balancing (for large torrents)
      • Small overhead
          • Management traffic (at trackers and peers)

slide-20
SLIDE 20

Candidate approaches

  • Tracker-based protocol
      • Requires trackers to be modified (e.g., DSM)
  • Torrent-wide DHT
      • Consistency and stale routing tables under churn
      • Overhead
  • Peer-based protocols
      • Independent trackers and gossiping
      • Transparent to the trackers
      • Constant overhead independent of torrent size

G. Dán, N. Carlsson, “Dynamic Swarm Management for Improved BitTorrent Performance”, Proc. of IPTPS 2009
slide-22
SLIDE 22

What have we learned so far?

  • Good peer discovery mechanisms are important
      • Small torrents bad ...
  • Centralized peer discovery (single central tracker)
      • Single point of failure
      • No load balancing opportunities
  • Multi-tracker approach
      • Connect with all trackers => High overhead
      • Connect with one tracker => Disjoint sets (smaller swarms)

slide-23
SLIDE 23

Main question addressed

Is it possible to achieve highly available and efficient peer discovery, which avoids the formation of disjoint swarms, at low overhead, by employing independent trackers and relying only on a gossip protocol?

slide-24
SLIDE 24

Two protocols

  • Random Peer Migration (RPM)
  • Random Multi-Tracking (RMT)

slide-25
SLIDE 25

Random Peer Migration (RPM)

  • Slightly modified BitTorrent peer behavior
  • Component 1: Peer migration
      • Randomly chosen peer changes swarm
      • Intensity of migration () [non-trivial]
  • Component 2: Peer EXchange Protocol (PEX)
      • Peers exchange neighborhood info using gossiping
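
The interplay of the two components can be sketched with a toy neighbor table: once a single peer has migrated, gossip-style exchanges spread knowledge of the other swarm. This is a simplified illustration with unbounded peer memory; the peer names and the helper `pex_exchange` are hypothetical and do not model the actual PEX message format.

```python
def pex_exchange(neighbors, a, b):
    """Peers a and b tell each other about every peer they currently know."""
    merged = neighbors[a] | neighbors[b] | {a, b}
    neighbors[a] = merged - {a}   # a peer does not list itself
    neighbors[b] = merged - {b}


# Two disjoint swarms: {p1, p2} at one tracker, {q1, q2} at another.
neighbors = {
    "p1": {"p2"}, "p2": {"p1"},
    "q1": {"q2"}, "q2": {"q1"},
}

# p1 migrates to the second swarm and connects to q1 ...
pex_exchange(neighbors, "p1", "q1")
# ... and q1 later gossips with q2, which has never contacted p1 or p2.
pex_exchange(neighbors, "q1", "q2")

print(sorted(neighbors["q2"]))
```

After the second exchange q2 knows peers from both swarms even though it only ever talked to its own tracker and to q1.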

slide-26
SLIDE 26

Random Multi-Tracking (RMT)

  • Slightly modified BitTorrent peer behavior
  • Component 1: Multi-tracked peers
      • Random arriving peer connects to k trackers
      • Intensity of multi-tracking () [non-trivial]
  • Component 2: Peer EXchange Protocol (PEX)
      • Multi-tracked peers exchange neighborhood info using gossiping
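
The tracker selection of an arriving peer under RMT might look like the sketch below. The slides do not give the rule for setting the multi-tracking intensity, so `multi_track_prob` is a hypothetical stand-in for it, and the tracker names are made up.

```python
import random


def trackers_for_arriving_peer(trackers, k, multi_track_prob, rng):
    """Pick the trackers an arriving peer registers with.

    With probability multi_track_prob (an assumed parameter) the peer becomes
    multi-tracked and registers with k distinct trackers chosen at random;
    otherwise it behaves like a regular peer and uses a single tracker.
    """
    if rng.random() < multi_track_prob:
        return rng.sample(trackers, min(k, len(trackers)))
    return [rng.choice(trackers)]


rng = random.Random(1)
trackers = ["tracker-a", "tracker-b", "tracker-c", "tracker-d"]
for _ in range(5):
    print(trackers_for_arriving_peer(trackers, k=2, multi_track_prob=0.2, rng=rng))
```

Only the multi-tracked peers generate extra announce traffic; the remaining peers learn about the other swarms through PEX.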

slide-36
SLIDE 36

Peer migration (using RPM)

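How to pick a good migration rule??

  • Make choice after downloading a fraction $\frac{1}{\alpha\,(|R(t)|-1)}$ of the file
      • Parameter: scrape intensity (overhead) ~ $\alpha$ (the parameter is denoted $\alpha$ here)
  • Migration probability: $\frac{1}{x_{t,r}}$

Rate out of a swarm r

$$x_{t,r} \cdot v_{t,r}\,\alpha\,(|R(t)|-1) \cdot \frac{1}{x_{t,r}}$$

where $x_{t,r}$ is the number of peers in the swarm, $v_{t,r}$ the download rate, and $\frac{1}{x_{t,r}}$ the migration probability.

Note: Independent of swarm size

Similarly, average in rate from trackers $r' \in R(t)\setminus\{r\}$

Destination chosen uniformly at random

$$\sum_{r' \in R(t)\setminus\{r\}} v_{t,r'}\,\alpha\,(|R(t)|-1) \cdot \frac{1}{|R(t)|-1} \;=\; \alpha \sum_{r' \in R(t)\setminus\{r\}} v_{t,r'}$$

Note: Rates equal when download rates in swarms are equal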
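
The balance argument can be checked numerically. The sketch assumes the following reading of the slide: a peer makes a migration choice alpha * (|R(t)| - 1) times per downloaded file, where `alpha` is an assumed name for the scrape intensity, migrates with probability 1/x_r, and picks its destination uniformly among the other trackers. All function names are illustrative.

```python
import math


def rate_out(alpha, v_r, num_trackers, x_r):
    """Migration rate out of swarm r (x_r peers, download rate v_r)."""
    choices_per_peer = alpha * (num_trackers - 1) * v_r  # choices per unit time
    return x_r * choices_per_peer * (1.0 / x_r)          # x_r cancels out


def rate_in(alpha, v_others, num_trackers):
    """Average migration rate into swarm r from the other swarms.

    v_others holds the download rates in the other swarms; each migrating
    peer picks swarm r with probability 1 / (num_trackers - 1).
    """
    pick_r = 1.0 / (num_trackers - 1)
    return sum(alpha * (num_trackers - 1) * v * pick_r for v in v_others)


# Three trackers with equal download rates: inflow and outflow match,
# regardless of how many peers swarm r currently has.
out_small = rate_out(alpha=2.0, v_r=0.5, num_trackers=3, x_r=10)
out_large = rate_out(alpha=2.0, v_r=0.5, num_trackers=3, x_r=1000)
inflow = rate_in(alpha=2.0, v_others=[0.5, 0.5], num_trackers=3)
print(out_small, out_large, inflow)
print(math.isclose(out_small, inflow))
```

Because the outflow does not depend on the swarm size, the rule never drains a small swarm faster than a large one.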

slide-37
SLIDE 37

Mixing Performance

Virtual swarm size

Fraction of internal and external ($y_{t,r}$) peers known in swarm

$$M_{t,r} = \frac{x_{t,r} + \sum_{r' \in R(t)\setminus\{r\}} y_{t,r'}}{x_t}$$

Average virtual swarm size

$$M_t = \frac{1}{x_t} \sum_{r \in R(t)} x_{t,r}\, M_{t,r}$$

Without swarm management

$$M_t = \sum_{r \in R(t)} \left(\frac{x_{t,r}}{x_t}\right)^{2} \;\ge\; \frac{1}{|R(t)|}$$

with equality under load balancing.
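
A short numerical sketch of the case without swarm management, assuming that a peer then knows only the peers of its own swarm, so that the average virtual swarm size reduces to the sum of squared swarm shares. The function name and the example swarm sizes are illustrative.

```python
def virtual_swarm_size_unmanaged(swarm_sizes):
    """Average virtual swarm size M_t when peers know only their own swarm.

    swarm_sizes: number of peers registered at each tracker of the torrent.
    """
    total = sum(swarm_sizes)
    return sum((x / total) ** 2 for x in swarm_sizes)


print(virtual_swarm_size_unmanaged([50, 50]))   # two balanced swarms
print(virtual_swarm_size_unmanaged([75, 25]))   # imbalanced swarms
print(virtual_swarm_size_unmanaged([100, 0]))   # everything at one tracker
```

Perfect load balancing across two trackers gives the smallest value, 1/2, which is the sense in which balancing load and keeping swarms large pull in opposite directions.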

slide-38
SLIDE 38

RPM Protocol Performance

Lower bound under exponential assumption (holding, migration)

  • A share η of peers implements RPM; look at tracker r
  • External peers known time z after last migration: $y_{t,r}(z)$, decaying exponentially in z from at most p
  • Renewal-reward process $\{(J_i, R_i) : i \ge 0\}$

[Equations: lower bound on $y_{t,r}$ obtained from the expected reward $E[R_i]$ and the expected renewal interval $E[J_{i+1} - J_i]$]

slide-42
SLIDE 42

Mixing efficiency (RPM)

Swarm imbalance

|R(t)|=2, η=1,=ν,p=∞

$$\frac{x_{t,r_1}}{x_{t,r_1} + x_{t,r_2}}$$

[Figure: virtual swarm size ($M_t$) versus swarm imbalance; one curve per parameter value 8, 4, 2, 1, 0 (willingness to mix)]

Note: Diminishing returns

slide-44
SLIDE 44

Mixing efficiency (RPM)

Swarm imbalance (limited peer memory)

|R(t)|=2, η=4,=ν,p=50 (model and simulations ‘x’)

[Figure: virtual swarm size ($M_t$) versus swarm imbalance for torrent sizes $x_t$ = 50, 100, 150, 200; model curves with simulation results marked ‘x’]

Note: Balanced no longer worst case

slide-45
SLIDE 45

Mixing Efficiency

RPM vs torrent size (analytic + simulations)

|R(t)|=2, η=1,=ν,p=50

[Figure: mean virtual swarm size ($M_t$) versus torrent size ($x_t$) for parameter values 8 and 1; model compared with simulations under Exp. lifetime / Exp. download, Exp. lifetime / Normal download, Pareto lifetime / Exp. download, and Pareto lifetime / Normal download]

Download time distribution matters (at parameter value 1)

slide-46
SLIDE 46

|R(t)|=2, η=1,=ν,p=50

Mixing Efficiency

RPM vs torrent size (experiments rTorrent)

slide-47
SLIDE 47

Case study

BitTorrent measurements

  • Most swarms are small
      • Power-law: long tail of moderately popular files
      • 99% of swarms smaller than 200 peers; half of the peers
  • Many torrents consist of several swarms
      • ~350,000 (small) multi-tracked torrents

G. Dán, N. Carlsson, “Power-law revisited: A large scale measurement study of P2P content popularity”, Proc. of IPTPS 2010

slide-48
SLIDE 48

Throughput improvement

RPM/RMT with parameters (p, η, /ν)

  • Substantial improvement
  • Close to upper bound
  • Decreasing marginal gain in 

slide-50
SLIDE 50

Throughput improvement

RPM/RMT with parameters (p, η, /ν)

  • Torrents with <300 peers
  • Average throughput gain similar across days

slide-51
SLIDE 51

Summary of Contributions

  • Two distributed protocols for swarm management
      • Independent trackers
      • Gossip protocol
      • Constant overhead, independent of swarm size
  • Analytical model (based on renewal theory)
      • Simulations and experiments validate the model
  • Large-scale measurement evaluations
      • The performance of small swarms is worse
      • Most swarms are small
      • Many torrents consist of several swarms
      • Assess potential throughput gains

slide-52
SLIDE 52

Thank you!

Niklas Carlsson (niklas.carlsson@liu.se)

Efficient and Highly Available Peer Discovery: A Case for Independent Trackers and Gossiping