Power-law revisited: A large scale measurement study of P2P content popularity


SLIDE 1

Power-law revisited: A large scale measurement study of P2P content popularity

György Dán

School of Electrical Engineering KTH, Royal Institute of Technology Stockholm, Sweden

Niklas Carlsson

Department of Computer Science University of Calgary Calgary, Canada

27 April 2010, IPTPS, San Jose, CA

SLIDE 2

P2P Content Popularity

  • Instantaneous popularity
      • Concurrent number of peers
      • Effectiveness of locality awareness
      • Little data available
      • Power-law?
  • Download popularity
      • Number of peers that downloaded content
      • Effectiveness of caching
      • Several measurements
      • Power-law but flattened head (Mandelbrot-Zipf)
  • Measurements limited in time and coverage
      • How accurate are they?
      • How accurate can they be?
  • L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang, "Measurement, Analysis, and Modeling of BitTorrent-like Systems," in Proc. ACM IMC, Oct. 2005.
  • M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Trans. on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

SLIDE 3

Measuring P2P Content Popularity

  • Overlay structure → Measurement methodology
  • Tracker based (BitTorrent)
      • Peer harvesting
      • Tracker query - scrape
      • Deep packet inspection
  • Unstructured (Gnutella, Ares, FastTrack)
      • Monitoring queries and replies
      • Deep packet inspection
  • Measurement = sample of population-wide popularity
      • Probability sampling - difficult
      • Opportunity sampling
      • Inference can be misleading
SLIDE 4

Measurement Methodology

  • Screen scrape of Mininova.org
      • Largest torrent search engine
      • 31 Aug. 2008, 15 Oct. 2008, 31 Aug. 2009
      • Scrape URL of 1690 BitTorrent trackers
  • Scrape of 721 BitTorrent trackers (S,L,D)
      • 15 Sept. 2008 to 17 Aug. 2009
      • weekly, daily at 8pm GMT
      • Almost instantaneous (< 30 mins)

[Figure: timeline showing when the Mininova scrapes and the tracker scrapes took place]
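The methodology depends on knowing each tracker's scrape URL. As a minimal sketch, assuming the widely used BitTorrent tracker scrape convention (a tracker supports scrape when the last path component of its announce URL starts with `announce`, and the scrape URL is obtained by substituting `scrape` there), the conversion could look like the following. The function name and the example host are hypothetical, and this is not necessarily how the authors' tooling worked.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def scrape_url(announce_url: str) -> Optional[str]:
    """Derive a scrape URL from an announce URL, or None if unsupported."""
    parts = urlsplit(announce_url)
    # Split the path at its last "/" so only the final component is touched.
    head, sep, last = parts.path.rpartition("/")
    if not last.startswith("announce"):
        # By convention such trackers do not support scraping.
        return None
    new_path = head + sep + "scrape" + last[len("announce"):]
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # Hypothetical tracker host, used only for illustration.
    print(scrape_url("http://tracker.example.org:6969/announce"))
```

Anything after the word `announce` in the last component (a file extension, or a query string in the URL) is carried over unchanged.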

SLIDE 5

Zipf’s Law and Beyond

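  • Zipf’s Law
      • Heavy tail
  • Mandelbrot-Zipf Law
      • Flattened head
  • Generalized Zipf Law
      • Light tail
      • Flattened head
      • Power-law trunk

\[ f_{\mathrm{Zipf}(f_1,\gamma)}(r) = f_1 \frac{1}{r^{\gamma}} \]

\[ f_{\mathrm{MZipf}(f_1,q,\gamma)}(r) = f_1 \frac{1}{(r+q)^{\gamma}} \]

\[ f_{\mathrm{GZipf}(f_1,\lambda_1,\lambda_2,\gamma)}(r) = f_1 \frac{1}{\left(1 - \lambda_1/\lambda_2 + (\lambda_1/\lambda_2)\, e^{\lambda_2 r (1/\gamma)}\right)^{\gamma}} \]

\[ \lim_{r \to \infty} e^{ar} f_{\mathrm{Zipf}(f_1,\gamma)}(r) = \infty, \quad \forall a > 0 \]

\[ \lim_{r \to \infty} e^{ar} f_{\mathrm{GZipf}(f_1,\lambda_1,\lambda_2,\gamma)}(r) < \infty, \quad \exists a > 0 \]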
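The three laws differ only in how they treat the head and the tail of the rank-popularity curve. As a rough sketch, assuming the functional forms f1/r^γ (Zipf), f1/(r+q)^γ (Mandelbrot-Zipf) and f1/(1 - λ1/λ2 + (λ1/λ2)·e^(λ2·r/γ))^γ (generalized Zipf), and reading the plot legends in this deck as (f1, γ), (f1, q, γ) and (f1, λ1, λ2, γ), the curves could be evaluated as below. The function and parameter names are illustrative choices, not taken from the slides.

```python
import math


def zipf(r, f1, gamma):
    """Pure power law: heavy tail, no flattening."""
    return f1 / r ** gamma


def mzipf(r, f1, q, gamma):
    """Mandelbrot-Zipf: the plateau parameter q flattens the head."""
    return f1 / (r + q) ** gamma


def gzipf(r, f1, lam1, lam2, gamma):
    """Assumed generalized Zipf form: flat head, power-law trunk, light tail."""
    try:
        # 1 - a + a*exp(x) is rewritten as 1 + a*expm1(x) to avoid
        # cancellation when a = lam1/lam2 is large and x is small.
        growth = math.expm1(lam2 * r / gamma)
        return f1 / (1.0 + (lam1 / lam2) * growth) ** gamma
    except OverflowError:
        # Far into the exponential tail the value is numerically zero.
        return 0.0


if __name__ == "__main__":
    for rank in (1, 1000, 10 ** 6):
        print(rank,
              zipf(rank, 1e7, 1),
              mzipf(rank, 1e7, 50, 1),
              gzipf(rank, 2e5, 0.02, 1e-5, 1))
```

With these example parameters the Mandelbrot-Zipf and generalized Zipf curves stay close to each other in the head and trunk, while the generalized Zipf curve falls well below the other two at very high ranks.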

  



    

SLIDE 6

Zipf’s Law and Beyond - Example

[Figure: log-log plot of popularity versus rank for Zipf(1e+007,1), MZipf(1e+007,50,1) and GZipf(2e+005,0.02,1e-005,1), with the head, trunk and tail regions marked]

SLIDE 7

What we measured (I)

  • Instantaneous popularity

[Figure: log-log plot of number of peers versus torrent rank (r) for 15.09.2008, 22.09.2008, 23.09.2008 and 17.08.2009. Annotations: Power-law?; Max peers; Active swarms; Total peers: 42 million]

SLIDE 8

What we measured (II)

  • Download popularity

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the date in the legend) versus torrent rank (r). Curves: 22 Sept. 2008 (1 week), 13 Oct. 2008 (4 weeks), 16 Mar. 2009 (26 weeks), 17 Aug. 2009 (48 weeks). Annotations: Power-law? (twice); Max downloads: 50 million; Active swarms; Total downloads: 8.3 billion]

SLIDE 9

Instantaneous Popularity

  • Instantaneous popularity 15 Sept 2008, 8pm GMT
  • Max: 1.6x10^5, Total: 4.23x10^7, Active: 2.93x10^6

[Figure: log-log plot of number of peers (leechers, seeds) versus torrent rank (r). Curves: Peers, Leechers, Seeds]

SLIDE 10

Power-law or Double-power-law?

  • Instantaneous popularity 15 Sept 2008, 8pm GMT
  • Max: 1.6x10^5, Total: 4.23x10^7, Active: 2.93x10^6

[Figure: log-log plot of number of peers (leechers, seeds) versus torrent rank (r). Curves: Peers, Leechers, Seeds, Zipf(1.6e+05, 0.60), Zipf(1e+06, 0.86), GZipf(1.5e+05, 0.08, 1e-06, 0.86)]

Power-law trunk hypothesis:

  • Max: 10^6
  • Total: 6.1x10^7
  • Active: 9.5x10^6

Sampling artifact?
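The hypothesis figures can be checked by extrapolating the trunk fit Zipf(1e+06, 0.86) over all ranks. The sketch below assumes that Zipf(f1, γ) means f1/r^γ, that "Active" is the rank at which the extrapolated curve drops to one peer, and that "Total" is the sum of the curve up to that rank. The sum is approximated by an integral plus an endpoint correction, so the results are approximate. The function names are illustrative.

```python
def active_swarms(f1: float, gamma: float) -> float:
    """Rank at which f1 / r**gamma has dropped to a single peer."""
    return f1 ** (1.0 / gamma)


def total_popularity(f1: float, gamma: float, n: float) -> float:
    """Approximate sum of f1 / r**gamma for r = 1..n (requires gamma != 1)."""
    # Integral of x**-gamma from 1 to n ...
    integral = (n ** (1.0 - gamma) - 1.0) / (1.0 - gamma)
    # ... plus half of the first and last terms (trapezoidal correction).
    correction = 0.5 * (1.0 + n ** -gamma)
    return f1 * (integral + correction)


if __name__ == "__main__":
    n = active_swarms(1e6, 0.86)
    print(f"active: {n:.3g}, total: {total_popularity(1e6, 0.86, n):.3g}")
```

Under these assumptions the extrapolation gives roughly 9.5x10^6 active swarms and 6.1x10^7 peers, in line with the values on the slide and well above the measured 2.93x10^6 and 4.23x10^7.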

SLIDE 11

Sampling and Exponential cutoff

  • Instantaneous popularity 15 Sept 2008, 8pm GMT
  • 2.93x10^6 samples from Double-Zipf in two ways
      • PropTor (discover torrent proportional to its popularity)
      • UnifTor (discover torrent uniformly at random)

[Figure: log-log plot of number of peers versus torrent rank (r). Curves: Double-Zipf fit, Measured, PropTor, UnifTor. Annotations: Total: 4.23x10^7; Total: 4.02x10^7; PMCC=0.99]

PropTor sampling introduces exponential cutoff
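The two discovery models can be tried out on a small synthetic population. The sketch below is a toy version under several assumptions: the double-Zipf population is taken as the lower of the two fitted Zipf curves, the population is cut to 20,000 torrents, and torrents are drawn without replacement. PropTor is implemented with exponential clocks (each torrent gets an exponential waiting time with rate equal to its popularity, and the first k to fire are kept), which is one standard way to sample proportionally to weight. None of these choices is confirmed by the slides.

```python
import random


def double_zipf(r, head=(1.6e5, 0.60), trunk=(1e6, 0.86)):
    """Assumed double power law: the lower of two Zipf curves at rank r."""
    return min(head[0] / r ** head[1], trunk[0] / r ** trunk[1])


def prop_tor(population, k, rng):
    """Draw k torrents without replacement, proportional to popularity."""
    # Popular torrents have high rates, so their clocks tend to fire first.
    clocks = sorted((rng.expovariate(1.0) / size, size) for size in population)
    return sorted((size for _, size in clocks[:k]), reverse=True)


def unif_tor(population, k, rng):
    """Draw k torrents without replacement, uniformly at random."""
    return sorted(rng.sample(population, k), reverse=True)


rng = random.Random(1)
population = [double_zipf(r) for r in range(1, 20001)]
prop = prop_tor(population, 5000, rng)
unif = unif_tor(population, 5000, rng)

if __name__ == "__main__":
    print("median size, PropTor:", prop[len(prop) // 2])
    print("median size, UnifTor:", unif[len(unif) // 2])
```

Plotting `prop` and `unif` against their ranks on log-log axes makes it possible to compare the tails of the two samples with the full population.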

SLIDE 12

Download Popularity

  • Download popularity over 4 and 48 weeks
  • Active: 2.29x10^6 and 7.17x10^6 torrents

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the date in the legend) versus torrent rank (r). Curves: Measured 13 Oct. 2008, Measured 17 Aug. 2009]
SLIDE 13

Power-law vs. Exponential cutoff

  • Download popularity over 4 and 48 weeks
  • Active: 2.29x10^6 and 7.17x10^6 torrents

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the date in the legend) versus torrent rank (r). Curves for 13 Oct. 2008: Measured, Zipf(1.3e+07, 0.35), Zipf(5e+08, 1.20), GZipf(8.4e+06,0.033,1.5e-06,1.20). Curves for 17 Aug. 2009: Measured, Zipf(5e+07, 0.50), Zipf(5e+08, 0.95), GZipf(3.4e+07,0.06,1.1e-06,0.95)]

4 weeks, double power-law hypothesis:

  • Active: 1.77x10^7

48 weeks, double power-law hypothesis:

  • Active: 1.43x10^9
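A Zipf(f1, γ) curve is a straight line on log-log axes, with slope -γ and intercept log f1. The slides do not say how the fitted parameters were obtained. One common approach, shown here purely as an assumption, is ordinary least squares on log rank versus log popularity over a chosen rank range. The function name is illustrative, and the result is sensitive to which ranks are included.

```python
import math


def fit_zipf(ranks, counts):
    """Least-squares fit of counts ~ f1 / rank**gamma in log-log space."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Slope is -gamma, intercept is log(f1).
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic trunk that follows an exact power law, for illustration only.
    ranks = range(1000, 100000, 1000)
    counts = [5e8 / r ** 1.20 for r in ranks]
    print(fit_zipf(ranks, counts))
```

Fitting the head and the trunk separately, each over its own rank range, would give the two Zipf curves of a double power-law description.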
SLIDE 14

Sampling and Exponential cutoff

  • Download popularity over 4 weeks (15 Sept. 2008 to 13 Oct. 2008)
  • 2.29x10^6 samples from Double-Zipf in two ways
      • PropTor (discover torrent proportional to its popularity)
      • UnifTor (discover torrent uniformly at random)

[Figure: log-log plot of number of downloads versus torrent rank (r). Curves: Double-Zipf fit, Measured, PropTor, UnifTor. Annotations: Total: 1.31x10^9; Total: 1.21x10^9; PMCC=0.99]

PropTor sampling introduces exponential cutoff
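PMCC stands for Pearson's product-moment correlation coefficient. The slide does not state which two series were compared, or whether the values were log-transformed first, so the sketch below only shows the coefficient itself. It would be applied to two equally long series, for example the measured popularity and a sampled curve at the same ranks. The function name is illustrative.

```python
import math


def pmcc(xs, ys):
    """Pearson product-moment correlation of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant.
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value close to 1 means that the two series are close to linearly related. It does not by itself show that their tails agree.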

SLIDE 15

Impact of Sampling

  • Instantaneous popularity 15 Sept 2008, 8pm GMT
  • 2.93x10^6 active torrents, 4.23x10^7 total peers
  • Sampled in 5 ways
      • Mininova: 9.7x10^5 torrents
      • PirateBay, PropTor, UnifTor: 6.55x10^5 torrents
      • PropPeer: 4.23x10^5 peers (1% of total)

[Figure: log-log plot of number of peers versus torrent rank (r). Curves: Original, Mininova, PirateBay, PropTor, UnifTor, PropPeer. Annotations: Heavy-tailed; Large torrents overrepresented]
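PropPeer samples peers rather than torrents. A minimal sketch follows, assuming that every peer is observed independently with a fixed probability (Bernoulli sampling, so the sample size is only 1% on average) and that a torrent is discovered when at least one of its peers is observed. The synthetic swarm sizes and the function name are hypothetical.

```python
import random


def prop_peer(swarm_sizes, fraction, rng):
    """Observe each peer with probability `fraction`; return observed sizes."""
    observed = []
    for size in swarm_sizes:
        seen = sum(1 for _ in range(size) if rng.random() < fraction)
        if seen:
            # Torrents with no sampled peer stay undiscovered.
            observed.append(seen)
    return sorted(observed, reverse=True)


rng = random.Random(7)
# Synthetic Zipf-like swarm sizes, for illustration only.
swarms = [max(1, round(1e4 / r ** 0.86)) for r in range(1, 2001)]
sample = prop_peer(swarms, 0.01, rng)

if __name__ == "__main__":
    print("torrents:", len(swarms), "discovered:", len(sample))
    print("peers:", sum(swarms), "sampled:", sum(sample))
```

In this toy setting small swarms are often missed altogether, while large swarms are almost always discovered.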

SLIDE 16

Impact of Sampling

  • Download popularity over 4 weeks
  • 2.29x10^6 active torrents, 1.31x10^9 total downloads
  • Sampled in 5 ways
      • Mininova: 4.95x10^5 active torrents
      • PirateBay, PropTor, UnifTor: 1.69x10^6 torrents
      • PropPeer: 1.31x10^6 peers (0.1% of total)

[Figure: log-log plot of number of downloads versus torrent rank (r). Curves: Original, Mininova, PirateBay, PropTor, UnifTor, PropPeer. Annotations: Heavy-tailed; Large torrents overrepresented]

SLIDE 17

Summary

  • Large measurement study of P2P content popularity
      • Instantaneous popularity
      • Download popularity
  • Instantaneous popularity
      • Power-law head?, power-law trunk
      • Tail may be power-law
  • Download popularity
      • Flat head, power-law trunk
      • Tail may be power-law for short periods
      • Not power-law for long periods
  • Sampling and measured characteristics
      • Infer with care