Understanding the Long-Term Self-Similarity of Internet Traffic




SLIDE 1

Understanding the Long-Term Self-Similarity of Internet Traffic

Steve Uhlig and Olivier Bonaventure

InfoNet group, University of Namur, Belgium
E-mail : {suhlig,obonaventure}@info.fundp.ac.be
URL : http://www.infonet.fundp.ac.be/

QOFIS’2001 © S. Uhlig (University of Namur, Belgium)

SLIDE 2

The Traffic Trace

Measurement study

  • Collect Netflow records for a Belgian ISP during [...] days.
  • Netflow : total volume between start and end (for layer-4 flows)
  • All the incoming traffic (interdomain).

Studied ISP

  • BELNET : research and government ISP (http://www.belnet.be)
    – high bandwidth links to two transit ISPs, E3 link to SURFNET/AMS-IX, OC-3 link to BNIX, 1.5 DS3 links to TEN-155
    – Main user : University attached to E3 backbone

SLIDE 3

Total Traffic

Total Traffic

  • Granularity of the traffic records : 1 minute
  • Represents [...] Tbytes of traffic
  • Average incoming traffic : 32 Mbps [97.5% TCP]
  • 42 million flows

[Figure: Total traffic. Traffic volume [Mbps] versus time [days].]

SLIDE 4

Self-Similarity

Definition : Let $X = \{X_i,\ i \geq 1\}$ be a stationary sequence (our sample).

Define the m-aggregated sequence :

$$X^{(m)}_k = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i, \qquad k = 1, 2, 3, \dots$$

Then, with $X^{(m)} = \{X^{(m)}_k,\ k = 1, 2, 3, \dots\}$, the sequence $X$ is said to be asymptotically self-similar with Hurst parameter $H$ if

$$X \stackrel{d}{=} m^{1-H} X^{(m)} \quad \text{as } m \to \infty$$
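The m-aggregation in the definition can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `aggregate` and the choice to drop an incomplete trailing block are assumptions.

```python
def aggregate(x, m):
    """m-aggregated sequence: means of consecutive, non-overlapping blocks of length m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    n_blocks = len(x) // m  # assumption: an incomplete trailing block is dropped
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
aggregated = aggregate(series, 3)  # blocks (1,2,3), (4,5,6), (7,8,9)
```

With `m = 3` the ten-sample series yields three block means; the tenth sample does not fill a block and is ignored.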

SLIDE 5

Estimators for Self-Similarity

Used estimators

  • R/S statistic : log-log plot gives slope $H$
  • aggregated variance : log-log plot gives slope $2H - 2$
  • correlogram : log-log plot gives slope $2H - 2$
  • periodogram : log-log plot near origin gives slope $1 - 2H$

Self-similarity is asymptotic, so estimating $H$ is tricky.
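As an illustration of one of these estimators, the aggregated variance method can be sketched as follows: compute the variance of the m-aggregated series for several block sizes, fit a line to the log-log points, and read $H$ from the slope $2H - 2$. This is a hedged sketch, not the authors' tool; the function names, the block sizes and the ordinary least-squares fit are assumptions.

```python
import math
import random
import statistics


def aggregate(x, m):
    """Means of consecutive, non-overlapping blocks of length m."""
    n_blocks = len(x) // m  # an incomplete trailing block is dropped
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    denominator = sum((a - mean_x) ** 2 for a in xs)
    return numerator / denominator


def hurst_aggregated_variance(x, block_sizes):
    """Estimate H from the slope 2H - 2 of log Var(X^(m)) against log m."""
    log_m, log_var = [], []
    for m in block_sizes:
        blocks = aggregate(x, m)
        if len(blocks) < 2:
            continue  # not enough blocks to compute a variance
        variance = statistics.pvariance(blocks)
        if variance > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(variance))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    slope = least_squares_slope(log_m, log_var)
    return 1.0 + slope / 2.0


# Uncorrelated Gaussian noise has no long-range dependence, so H should be near 0.5.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_estimate = hurst_aggregated_variance(white_noise, [1, 2, 4, 8, 16, 32, 64])
```

Because the relation only holds asymptotically, the estimate depends on which block sizes enter the fit, which is one reason estimating $H$ is delicate.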

SLIDE 6

Total Traffic Self-Similarity

[Figure, four panels:
R/S plot: R/S statistic versus k (log-log scale), with reference slopes 1 and 0.5.
Aggregated variance plot: variance of k-aggregated series versus k, with reference slopes 0 and -1.
Correlogram plot: autocorrelation function versus time lag, for active IP sources, total traffic, mean traffic per IP and maximum.
Periodogram plot for total traffic: periodogram versus frequency, with reference slopes -0.5 and -1.]

SLIDE 7

Who’s who ?

Several factors can explain total traffic self-similarity :

  • heavy tails in flow sizes (or lengths) : proved to be able to generate self-similarity (Crovella and Bestavros 1996)
  • number of IP sources sending traffic : possible factor (proof via results in stochastic processes).
  • ...

Heavy tails are often considered in the literature as THE factor for self-similarity.

SLIDE 8

Heavy-Tails

Heavy-tailed distribution (persistence of large values) :

$$P[X > x] \sim x^{-\alpha} \quad \text{as } x \to \infty$$

[Figure: Probability mass of one-minute traffic values. P(X = value) versus one-minute traffic value [unit = 100 bytes], log-log scale, with reference power tails for alpha = 1 and alpha = 2.]
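The slide compares the data against reference power tails by eye. One common way to put a number on the tail index $\alpha$ is the Hill estimator, shown below as a hedged sketch: the slides do not use it, and the function name, the choice of `k` and the synthetic Pareto sample are assumptions made for illustration.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest positive values."""
    ordered = sorted((v for v in values if v > 0), reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < number of positive values")
    threshold = ordered[k]  # the (k + 1)-th largest value
    # Mean log-excess of the k largest values over the threshold.
    mean_log_excess = sum(math.log(v / threshold) for v in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Synthetic check: a Pareto sample with alpha = 1.5 should give an estimate near 1.5.
rng = random.Random(1)
pareto_sample = [rng.paretovariate(1.5) for _ in range(20000)]
alpha_estimate = hill_tail_index(pareto_sample, k=1000)
```

The estimate is sensitive to `k`, and heavily tied data (for example values rounded to 100-byte units) can make the mean log-excess zero, so real traces need more care than this sketch takes.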

SLIDE 9

Dynamics of Traffic Sources

Looking at the evolution of the number of IP addresses active per 1-minute interval (same for prefixes and ASs) during the week...

[Figure, two panels: Aggregated variance plot (variance versus k) and Correlogram plot (autocorrelation function versus time lag), each for total traffic, maximum, mean traffic per IP and active IP sources.]

Damn, it’s self-similar too !
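The two per-minute series studied here (total volume and number of active IP sources) can be derived from flow records roughly as sketched below. The record layout `(timestamp_seconds, source_ip, byte_count)` is hypothetical, and assigning each record entirely to the minute of its timestamp is a simplifying assumption, since a real Netflow record covers an interval between a start and an end time.

```python
from collections import defaultdict


def per_minute_series(records):
    """Return (minute, total_bytes, active_sources) tuples, one per observed minute."""
    volume = defaultdict(int)    # minute -> bytes seen in that minute
    sources = defaultdict(set)   # minute -> distinct source addresses
    for timestamp, source_ip, byte_count in records:
        minute = int(timestamp // 60)
        volume[minute] += byte_count
        sources[minute].add(source_ip)
    return [(m, volume[m], len(sources[m])) for m in sorted(volume)]


# Hypothetical records using documentation addresses (192.0.2.0/24).
records = [
    (0, "192.0.2.1", 1500),
    (30, "192.0.2.2", 500),
    (59, "192.0.2.1", 1000),
    (61, "192.0.2.3", 4000),
]
series = per_minute_series(records)
```

The same grouping works for prefixes or ASs by replacing the source address with the corresponding key.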

SLIDE 10

The Role of Heavy-Tails

So the question is : to what extent are those heavy-tails important ?

  • sufficient condition for self-similarity
  • but to what extent are those large traffic volumes important ?

Let’s try the following experiment (or “get rid of these bursts !”):

  • Determine the total amount of traffic $T_i$ (in bytes) for minute $i$ and the number $N_i$ of IP addresses sending traffic during that minute.
  • For each minute $i$, generate an approximation of the exponential distribution with mean $T_i / N_i$, so that the simulated traffic corresponds to a total of about $T_i$ bytes and to about $N_i$ points, by relying on the exponential distribution formula $f(x) = \lambda e^{-\lambda x}$ with $\lambda = N_i / T_i$.

SLIDE 11

Experiment (1)

Principle : For each minute $i$ of the week, generate a (discrete) exponential distribution with $N_i$ values (IP sources) for a total of $T_i$ bytes (1-minute traffic volume). We can do that because exponential distributions are cool : their mean ($T_i / N_i$) gives it all...

foreach minute i
    foreach value = 1 to max_value
        // Attributing to value its frequency of occurrence
        frequency(value) = N_i * P(X = value), with X exponential of mean T_i / N_i
        // Attributing to value its traffic volume
        volume(value) = value * frequency(value)
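A runnable sketch of this principle for a single minute is given below. It is not the authors' implementation: the way the exponential is discretized (probability mass of each unit-wide cell), the cut-off `max_value`, the function names and the example figures are all assumptions. Traffic is counted in arbitrary units (for instance 100 bytes, the unit used on the plots).

```python
import math


def discretized_exponential(total_units, n_sources, max_value):
    """Frequency and volume per traffic value for one minute.

    Assumed discretization: value v receives the probability that an
    exponential variable with mean total_units / n_sources falls in (v - 1, v].
    Values above max_value are cut off.
    """
    mean = total_units / n_sources
    frequency = {}
    volume = {}
    for value in range(1, max_value + 1):
        probability = math.exp(-(value - 1) / mean) - math.exp(-value / mean)
        frequency[value] = n_sources * probability   # expected number of sources
        volume[value] = value * frequency[value]     # traffic carried by them
    return frequency, volume


def relative_deviation(target, simulated):
    """Relative difference between a measured quantity and its simulated counterpart."""
    return abs(target - simulated) / target


# Hypothetical minute: 2,400,000 units of traffic sent by 2000 sources.
frequency, volume = discretized_exponential(
    total_units=2_400_000, n_sources=2000, max_value=20000
)
source_deviation = relative_deviation(2000, sum(frequency.values()))
total_deviation = relative_deviation(2_400_000, sum(volume.values()))
```

With these figures both deviations stay small: the source count only loses the mass beyond `max_value`, while the total is slightly overestimated because every source in a cell is credited with the cell's upper bound.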

SLIDE 12

Experiment (2)

Approximations due to discrete distribution:

  • cutting the tail of the 1-minute distribution : [...]
  • deviation for total traffic : $\left| \frac{T_i - \sum_{value} volume(value)}{T_i} \right|$
  • deviation for IP sources : $\left| \frac{N_i - \sum_{value} frequency(value)}{N_i} \right|$

where $T_i$ is the measured traffic volume and $N_i$ the measured number of IP sources for minute $i$.

SLIDE 13

Experiment (3)

Evolution of the discretization error:

[Figure, two panels: relative difference for the simulated total traffic sample path, and relative difference for the simulated IP sources sample path. Percentage (log scale) versus time [days].]

Difference in total traffic [...] % and in number of IP sources [...] % on average.

Better precision would reduce this already small error.

SLIDE 14

Experiment (4)

Simulated traffic values for IP sources vs. original values for IP sources :

[Figure, two panels: probability mass of the simulated exponential, and probability mass of the original one-minute traffic values with reference power tails for alpha = 1 and alpha = 2. P(X = value) versus one-minute traffic value [unit = 100 bytes], log-log scale.]

We hence managed to prevent the large bursts from occurring, while self-similarity has not changed at all.

Heavy tails therefore have a limited role in long-term Internet traffic self-similarity.

SLIDE 15

Conclusions

  • Interdomain traffic is self-similar in the long term.
  • The number of IP sources (also prefixes and ASs) sending traffic per 1-minute interval is self-similar too.
  • Changing relative traffic volume (limiting bursts) without changing source dynamics leaves self-similarity unchanged.
  • Heavy tails in the volume sent by IP hosts are not THE most important aspect of the traffic self-similarity.
  • The problem is probably stochastic, with traffic sources driving the long-term self-similarity.
