Evidence for long-tailed distributions in the Internet, Allen B. Downey

SLIDE 1

Evidence for long-tailed distributions in the Internet

Allen B. Downey Wellesley College

SLIDE 2

Self-Similarity

No shortage of explanations...

  • ON/OFF model
  • M/G/∞ model
  • Protocol models

SLIDE 3

ccdf test

[Figure: "ccdf test". Prob {X > x} (log scale) versus log2(x); curves: lognormal sample, Pareto sample.]

  • Samples (n=10,000) from Pareto and lognormal distributions with similar tail behavior.
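
The comparison on this slide can be sketched roughly as follows, using only the Python standard library. The distribution parameters are hypothetical and chosen only for illustration (the slide does not give them), and `empirical_ccdf` is a helper name introduced here, not something from the talk.

```python
import math
import random

def empirical_ccdf(sample):
    """Return (xs, ps): sorted values and the fraction of the sample above each one."""
    xs = sorted(sample)
    n = len(xs)
    # With the values sorted, n - 1 - i of them lie above xs[i] (ties aside).
    ps = [(n - 1 - i) / n for i in range(n)]
    return xs, ps

random.seed(0)
n = 10_000
# Hypothetical parameters, chosen only for illustration.
pareto_sample = [random.paretovariate(1.0) for _ in range(n)]
lognormal_sample = [random.lognormvariate(0.0, 2.0) for _ in range(n)]

for name, sample in [("pareto", pareto_sample), ("lognormal", lognormal_sample)]:
    xs, ps = empirical_ccdf(sample)
    # Print a few points on the slide's axes: log2(x) against Prob {X > x}.
    for i in (n // 2, 9 * n // 10, 99 * n // 100, 999 * n // 1000):
        print(f"{name:9s} log2(x) = {math.log2(xs[i]):6.2f}  Prob = {ps[i]:.4f}")
```

Plotting `ps` against `xs` on log-log axes gives the kind of curve shown in the figure. On those axes a Pareto tail is a straight line, while a lognormal tail bends downward.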

SLIDE 4

File sizes on a Web server

[Figure: "File Sizes from Calgary dataset". Prob {file size > x} versus file size (bytes), log-log axes; curves: lognormal model, Pareto model, actual ccdf.]

  • Sizes of 15,160 files at the University of Calgary.
  • By conventional goodness of fit, Pareto wins.
  • Tail behavior is not long-tailed.

SLIDE 5

Numerical differentiation

[Figure: "Estimated derivative of ccdf". Inverse slope versus P(X > x).]

  • Numerical derivatives are noisy.
  • Testing for trends is robust.
  • Tail curvature = 0.141, p-value 0.001.
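
One possible reading of these bullets, as a minimal sketch: estimate the slope of the log-log ccdf by finite differences between geometrically spaced probability levels, then fit a line to those slopes and use its trend as a curvature measure. This is an assumption about what "tail curvature" means here, not necessarily the author's definition. The helper names and the sample parameters are hypothetical, and the p-value computation is not reproduced.

```python
import math
import random

def tail_points(sample, levels):
    """Return (log x, log p) pairs where the empirical ccdf equals each level p."""
    xs = sorted(sample)
    n = len(xs)
    points = []
    for p in levels:
        # About a fraction p of the sample lies above xs[i].
        i = min(n - 1, max(0, int(round(n * (1.0 - p))) - 1))
        points.append((math.log(xs[i]), math.log(p)))
    return points

def slopes(points):
    """Finite-difference slopes d(log p)/d(log x), keyed by the midpoint log p."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        out.append(((y0 + y1) / 2, (y1 - y0) / (x1 - x0)))
    return out

def fit_line(pairs):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

random.seed(1)
levels = [2.0 ** -k for k in range(4, 11)]  # 1/16, 1/32, ..., 1/1024
# Hypothetical distribution parameters, for illustration only.
generators = [("pareto", lambda: random.paretovariate(1.0)),
              ("lognormal", lambda: random.lognormvariate(0.0, 2.0))]
for name, draw in generators:
    sample = [draw() for _ in range(10_000)]
    trend, _ = fit_line(slopes(tail_points(sample, levels)))
    print(f"{name}: trend in estimated slope = {trend:.3f}")
```

For a Pareto sample the slope of the log-log ccdf should stay roughly constant, so the fitted trend should be near zero. A lognormal tail steepens as the probability falls, which should show up as a nonzero trend even though the individual slope estimates are noisy.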

SLIDE 6

File sizes on another Web server

[Figure: "File Sizes from Saskatchewan dataset". Prob {file size > x} versus file size (bytes), log-log axes; curves: lognormal model, actual ccdf.]

  • File sizes from the University of Saskatchewan.
  • Pareto model fits well.
  • Two-mode lognormal model fits well.
  • Tail curvature test is no help.

SLIDE 7

Interarrival times, TCP packets

[Figure: "TCP packet interarrival times". Prob {time > x} versus x (seconds), log-log axes; curves: Pareto model, lognormal model, actual cdf.]

  • 4 million interarrivals from LBL and DEC datasets.
  • Very consistent between datasets.
  • Some signs of straightness.
  • Extreme tail hard to characterize.

SLIDE 8

Interarrival times, TCP connections

[Figure: "TCP connection interarrival times". Prob {time > x} versus x (seconds), log-log axes; curves: Pareto model, lognormal model, Weibull model, actual cdf.]

  • 782,000 connections in LBL CONN-7.
  • Feldmann reports that Weibull fits the bulk.
  • Fits the tail well, too.

SLIDE 9

Interarrival times, web requests

[Figure: "Web request interarrival times". Prob {time > x} versus x (seconds), log-log axes; curves: Pareto model, lognormal model, actual cdf.]

  • 135,000 requests from instrumented browsers at BU.
  • Hard to characterize tail behavior.

SLIDE 10

http transfer times

[Figure: "Web request transfer times". Prob {time > x} versus x (seconds), log-log axes; curves: Pareto model, lognormal model, actual cdf.]

  • 135,000 transfers.
  • Lognormal model fits the extreme tail.

SLIDE 11

Throughput

[Figure: "Throughputs, BU dataset". Prob {throughput > x} (linear scale, 0 to 1) versus x (bytes/second, log scale); curves: lognormal model, actual ccdf.]

  • For each transfer, divide size by transfer time.
  • Across paths and time, throughput is roughly lognormal.
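
The throughput calculation described above can be sketched as follows. The transfer records are hypothetical, and fitting the lognormal model from the mean and standard deviation of the log throughputs is one simple choice assumed here, not necessarily the fitting method used in the talk.

```python
import math
import statistics

# Hypothetical (size in bytes, transfer time in seconds) records, for illustration only.
transfers = [(2_048, 0.5), (10_000, 1.25), (500_000, 20.0), (1_200, 0.125), (64_000, 4.0)]

def throughputs(records):
    """Bytes per second for each transfer, skipping zero-duration records."""
    return [size / seconds for size, seconds in records if seconds > 0]

def lognormal_params(values):
    """Estimate mu and sigma of a lognormal model from the logs of the values."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)

rates = throughputs(transfers)
mu, sigma = lognormal_params(rates)
print("throughputs (bytes/second):", rates)
print(f"lognormal fit: mu = {mu:.3f}, sigma = {sigma:.3f}")
```

With real traces, the fitted model's ccdf can then be compared with the empirical ccdf of `rates`, as in the figure.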

SLIDE 12

ftp transfer times

[Figure: "FTP transfer times". Prob {time > x} versus x (seconds), log-log axes; curves: Pareto model, lognormal model, actual ccdf.]

  • 105,000 transfers in LBL CONN-7.
  • Not so clear that this is lognormal.
  • Paxson used two-stage Pareto model.

SLIDE 13

ftp throughput

[Figure: "Throughputs, LBL dataset". Prob {throughput > x} (linear scale, 0 to 1) versus x (bytes/second, log scale); curves: lognormal model, actual ccdf.]

  • Again, roughly lognormal.
  • Top end compressed by hardware limitations.

SLIDE 14

ftp burst sizes

[Figure: "ftp burst sizes". Prob {size > x} versus x (bytes), log-log axes; curves: lognormal model, Pareto model, actual cdf.]

  • Two or more transfers with 4s between.
  • 56,000 bursts.
  • Fairly convincing straight line.
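
A minimal sketch of how transfers might be grouped into bursts. It assumes the 4-second figure is a maximum gap between the end of one transfer and the start of the next (the comparison appears to have been lost from the slide text), and that transfers arrive sorted by start time. The function name, record layout and sample data are hypothetical.

```python
def group_bursts(transfers, max_gap=4.0):
    """Group (start, end, size) transfers, sorted by start time, into bursts.

    A transfer joins the current burst when it starts no more than max_gap
    seconds after the latest end time seen so far in that burst (an assumed
    reading of the rule).
    """
    bursts = []
    for start, end, size in transfers:
        if bursts and start - bursts[-1]["end"] <= max_gap:
            burst = bursts[-1]
            burst["end"] = max(burst["end"], end)
            burst["bytes"] += size
            burst["length"] += 1
        else:
            bursts.append({"end": end, "bytes": size, "length": 1})
    return bursts

# Hypothetical transfers: (start time in s, end time in s, size in bytes).
transfers = [(0.0, 1.0, 500), (2.0, 3.0, 1_500), (10.0, 12.0, 4_000), (30.0, 31.0, 100)]
bursts = group_bursts(transfers)
print([(b["length"], b["bytes"]) for b in bursts])
```

The `bytes` field corresponds to burst size and the `length` field to the number of transfers per burst. Keeping only bursts with `length >= 2` would match the "two or more transfers" wording.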

SLIDE 15

ftp burst lengths

[Figure: "ftp burst lengths". Prob {length > x} versus x (# transfers), log-log axes; curves: lognormal model, Pareto model, actual cdf.]

  • How many transfers in a burst?
  • 85% are singletons.
  • Lognormal? Pareto?

SLIDE 16

http burst lengths

[Figure: "http burst length, Trace A". Prob {length > x} versus x (# transfers), log-log axes; curves: lognormal model, Pareto model, actual cdf.]

  • 456,000 http connections from Charzinski trace.
  • 70% of connections make a single request.
  • Tail behavior hard to characterize.

SLIDE 17

More http burst lengths

[Figure: "http burst length, Trace B". Prob {length > x} versus x (# transfers), log-log axes; curves: lognormal model, Pareto model, actual cdf.]

  • 739,000 connections recorded by Charzinski.
  • Pretty clearly lognormal.
