Internet Traffic Analysis: Mohammed Alasmar, Coseners, 2019



SLIDE 1

Internet Traffic Analysis

Mohammed Alasmar

Coseners, 2019

https://ieeexplore.ieee.org/document/8737483

SLIDE 2

Motivations

  • Reliable traffic modelling is important for network planning, deployment and management, e.g. (1) network dimensioning and (2) traffic billing.
  • Historically, network traffic has been widely assumed to follow a Gaussian distribution.
  • Deciding whether Internet flows could be heavy-tailed became important, as this implies significant departures from Gaussianity.

SLIDE 3

Traffic volumes at different T

  • $X_i$: the amount of traffic seen in the time period $[iT, (i+1)T)$
  • Aggregation at different sampling times (T)

Internet trace.pcap
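The aggregation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the packet timestamps (in seconds) and sizes (in bytes) have already been extracted from the capture file, and the function names `aggregate_traffic` and `to_mbps` are hypothetical.

```python
import math


def aggregate_traffic(timestamps, sizes_bytes, T):
    """Return [X_0, X_1, ...], where X_i is the number of bytes seen in [iT, (i+1)T).

    Time is measured from the first packet, so interval 0 starts at min(timestamps).
    """
    if T <= 0:
        raise ValueError("T must be positive")
    if not timestamps:
        return []
    t0 = min(timestamps)
    n_bins = int(math.floor((max(timestamps) - t0) / T)) + 1
    volumes = [0] * n_bins
    for t, size in zip(timestamps, sizes_bytes):
        i = int(math.floor((t - t0) / T))  # index of the interval containing t
        volumes[i] += size
    return volumes


def to_mbps(volumes_bytes, T):
    """Convert per-interval byte counts into average data rates in Mbps."""
    return [v * 8 / T / 1e6 for v in volumes_bytes]
```

Calling `aggregate_traffic` on the same packets with several values of T (for example 5 s, 1 s and 10 ms) gives the series whose distributions are compared in the rest of the talk.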

SLIDE 4

Traffic volumes at different T

[Figure: data rate (Mbps) versus time (seconds) at T = 5 sec, T = 1 sec and T = 10 msec.]

SLIDE 5

Goal

  • Investigating the distribution of the amount of traffic per unit time using a robust statistical approach.

[Figure: PDF of the data rate (Mbps) at T = 10 ms, T = 1 sec and T = 5 sec.]


SLIDE 8

Datasets

  • We study a large number of traffic traces (230) from many different networks, captured from 2009 to 2018.

Dataset         #Traces
Twente [1]      40
MAWI [2]        107
Auckland [3]    25
Waikato [4]     30
CAIDA [5]       27

[1] https://www.simpleweb.org/wiki/index.php/Traces , 2009.
[2] http://mawi.wide.ad.jp/mawi/ , 2016-2018.
[3] https://wand.net.nz/wits/auck/9/ , 2009.
[4] https://wand.net.nz/wits/waikato/8/ , 2010-2011.
[5] http://www.caida.org/data/overview/ , 2016.

SLIDE 9

Power-law test

  • Our analysis is based on the framework proposed by Clauset et al. in “Power-law distributions in empirical data” [Ref].
  • The framework combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov statistic and likelihood ratios.

SLIDE 10

Power-law test

Power-law distribution:

$$p(x) \propto x^{-\alpha}$$

Introducing the lower bound $x_{\min}$ and normalising gives:

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}$$

α: scaling exponent

[Figure: PDF of the data rate (bps, ×10^6), with the lower bound xmin marked.]
SLIDE 11

Power-law test
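The fitting step of the power-law test can be sketched as follows: a maximum-likelihood estimate of the scaling exponent for the tail x ≥ xmin, a Kolmogorov-Smirnov distance between the empirical and fitted tail CDFs, and a scan that keeps the xmin with the smallest distance. This is a minimal sketch under the standard continuous power-law formulas, not the authors' implementation; the function names and the `min_tail` parameter are hypothetical.

```python
import math


def fit_alpha(data, xmin):
    """MLE of the scaling exponent of a continuous power law on the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("tail must contain values larger than xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical tail CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare with the empirical CDF just before and just after x.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def fit_power_law(data, min_tail=50):
    """Scan candidate xmin values; return (alpha, xmin, ks, ntail) with the smallest KS."""
    best = None
    for xmin in sorted(set(data)):
        ntail = sum(1 for x in data if x >= xmin)
        if ntail < min_tail:  # assumption: stop when the tail is too short to fit
            break
        alpha = fit_alpha(data, xmin)
        d = ks_distance(data, xmin, alpha)
        if best is None or d < best[2]:
            best = (alpha, xmin, d, ntail)
    return best
```

The scan is quadratic in the number of samples, which is acceptable for a sketch but would need a smarter search on long traces.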

SLIDE 12

Likelihood ratio:

$$\frac{L_1}{L_2} = \frac{\prod_{i=1}^{n} p_1(x_i)}{\prod_{i=1}^{n} p_2(x_i)}$$

where $L_1$ is the power-law likelihood function and $L_2$ is the alternative likelihood function.

Log-likelihood ratio:

$$\mathcal{R} = \ln \frac{L_1}{L_2} = \sum_{i=1}^{n} \left[ \ln p_1(x_i) - \ln p_2(x_i) \right]$$

ℛ, p = fit.distribution_compare(power_law, alternative)

  • If ℛ > 0, then the power-law is favoured.
  • If ℛ < 0, then the alternative is favoured.
  • If p < 0.1, then the value of ℛ can be trusted.

Alternative distributions:

  • Weibull
  • Lognormal
  • Exponential
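The log-likelihood ratio test can be sketched as follows. The ratio is the sum of pointwise log-density differences between the two fitted models, and its p-value follows the Vuong-style normal approximation used in the Clauset et al. framework, p = erfc(|ℛ| / (√(2n) σ)). This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the exponential alternative is assumed to be shifted so that it starts at xmin.

```python
import math


def power_law_logpdf(x, alpha, xmin):
    """Log-density of the continuous power law on x >= xmin."""
    return math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)


def shifted_exponential_logpdf(x, rate, xmin):
    """Log-density of an exponential distribution starting at xmin (assumed alternative)."""
    return math.log(rate) - rate * (x - xmin)


def log_likelihood_ratio(logp1, logp2):
    """Return (R, p) for two lists of pointwise log-densities on the same data.

    R > 0 favours model 1 and R < 0 favours model 2; a small p means that the
    sign of R is unlikely to be a chance fluctuation.
    """
    diffs = [a - b for a, b in zip(logp1, logp2)]
    n = len(diffs)
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    if sigma == 0.0:
        return R, (1.0 if R == 0.0 else 0.0)
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p
```

Both models must be evaluated on the same tail samples (x ≥ xmin), otherwise the pointwise differences are not comparable.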

SLIDE 13

Normalised Log-Likelihood Ratio (LLR), T = 100 msec

[Figure: normalised LLR (ℛ) versus rank of trace for the Weibull, lognormal and exponential alternatives. Circled points: p > 0.1.]

The log-normal is the best fit for the vast majority of traces.

SLIDE 14

[Figure: normalised LLR versus rank of trace for the Weibull, lognormal and exponential alternatives. Panels: Twente traces, Waikato traces, MAWI traces, Auckland traces.]

The log-normal distribution is the best fit for the vast majority of traces.

Anomalous traces

The log-normal distribution is not the best fit for:

  • 1 out of 25 Auckland traces
  • 9 out of 107 MAWI traces
  • 1 out of 27 CAIDA traces
  • 2 out of 30 Waikato traces
  • 5 out of 40 Twente traces
SLIDE 15

Anomalous traces

  • Anomalous traces are a poor fit for all distributions tried.
  • This is often due to traffic outages or links that hit maximum capacity.

[Figure: PDF of the data rate (Mbps). Panels: log-normal trace, anomalous trace.]

SLIDE 16

Normalised Log-Likelihood Ratio (LLR) test results for all studied traces and log-normal distribution at different timescales

[Figure: normalised LLR (ℛ) versus rank of trace for the CAIDA traces at different sampling times: T = 5 sec, T = 1 sec, T = 100 msec and T = 5 msec.]

ℛ < 0, i.e., log-normal is favoured.

SLIDE 17

The correlation coefficient test

[Figure: correlation coefficient versus rank of traces for the CAIDA traces at T = 5 sec, T = 1 sec, T = 100 msec and T = 5 msec. Panels: Gaussian, Log-normal.]

  • Strong goodness-of-fit (GOF) is assumed to exist when the value of γ is greater than 0.95.
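One way to compute such a coefficient is the linear correlation of a quantile-quantile plot: the sorted samples are paired with the quantiles of the candidate distribution, and the Pearson correlation of the pairs is taken as γ. The slides do not spell out the exact formula, so the sketch below is an assumption; the function names and the plotting positions (i + 0.5)/n are illustrative choices. Standard normal quantiles are enough because correlation is unaffected by location and scale.

```python
import math
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def qq_correlation(samples, log_transform=False):
    """Correlation between sorted samples and standard normal quantiles.

    log_transform=False tests the Gaussian model; log_transform=True tests the
    log-normal model by applying the same check to the logarithm of the samples.
    """
    values = sorted(math.log(v) if log_transform else v for v in samples)
    n = len(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(values, theoretical)
```

With this sketch, a trace would count as a strong fit for a model when the returned value exceeds 0.95.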

SLIDE 18

Use case 1: Bandwidth provisioning

  • Bandwidth provisioning supplies a link with the essential bandwidth that guarantees the required performance.
  • Overprovisioning: in conventional methods, bandwidth is allocated by upgrading the link bandwidth to 30% of the average traffic value.

SLIDE 19

Use case 1: Bandwidth provisioning

  • The following inequality (the ‘link transparency formula’) has been used for bandwidth provisioning:

$$P\left( \frac{A(T)}{T} \geq C \right) \leq \varepsilon$$

i.e., the probability that the captured traffic rate $A(T)/T$ over a specific aggregation timescale $T$ is larger than the link capacity $C$ has to be smaller than the value of a performance criterion $\varepsilon$.

  • ε has to be chosen carefully by the network provider in order to meet the specified SLA.
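Under a fitted distribution for the rate A(T)/T, the smallest capacity that satisfies the link transparency formula is the (1 − ε) quantile of that distribution. The sketch below shows this for the Gaussian and log-normal models, with parameters fitted by simple moment and log-moment estimates. It is an illustration under those assumptions, not the authors' procedure, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev


def gaussian_capacity(rates, eps):
    """Capacity C with P(rate >= C) = eps under a Gaussian fit to the rates."""
    return NormalDist(mean(rates), pstdev(rates)).inv_cdf(1.0 - eps)


def lognormal_capacity(rates, eps):
    """Capacity C with P(rate >= C) = eps under a log-normal fit to the rates."""
    logs = [math.log(r) for r in rates]
    return math.exp(NormalDist(mean(logs), pstdev(logs)).inv_cdf(1.0 - eps))


def exceedance_fraction(rates, capacity):
    """Empirical fraction of intervals whose rate reaches or exceeds the capacity."""
    return sum(1 for r in rates if r >= capacity) / len(rates)
```

Comparing `exceedance_fraction(rates, C)` with the target ε shows how well each model meets the performance criterion on a given trace.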

SLIDE 20

Use case 1: Bandwidth provisioning

$$P\left( \frac{A(T)}{T} \geq C \right) \leq \varepsilon$$

Example: ε = 0.01

[Figure: expected link capacity versus performance criterion ε for the MAWI traces under the Gaussian, Weibull and log-normal models.]

SLIDE 21

Bandwidth provisioning: Results

[Figure: results per dataset at T = 0.1 s, T = 0.5 s and T = 1 s, with the target ε = 0.5 marked. Panels: Log-normal, Weibull, Gaussian.]

M: MAWI, T: Twente, C: CAIDA, W: Waikato, A: Auckland

SLIDE 22

Use case 2: 95th percentile pricing

Burstable billing

  • Customers are not billed for brief spikes in network traffic.

[Figure: data rate (Mbps) versus time (sec), with the 5-minute intervals and the percentile marked.]
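A 95th percentile bill can be sketched as follows: the per-interval rates are sorted, the top 5% are discarded, and the highest remaining value is the billed rate. The slide only mentions 5-minute intervals and a percentile, so the nearest-rank convention used here is an assumption, as are the function names. The second function gives the corresponding prediction from a log-normal fit, exp(μ + σ z), where z is the 0.95 quantile of the standard normal distribution.

```python
import math
from statistics import NormalDist, mean, pstdev


def percentile_95(samples):
    """Nearest-rank 95th percentile of the per-interval rates."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def lognormal_percentile_95(samples):
    """95th percentile predicted by a log-normal fit to the samples."""
    logs = [math.log(s) for s in samples]
    return math.exp(NormalDist(mean(logs), pstdev(logs)).inv_cdf(0.95))
```

Plotting `lognormal_percentile_95` against `percentile_95` for each trace gives a predicted-versus-actual comparison of the kind used to judge the models.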

SLIDE 23

95th percentile pricing: Results

  • The log-normal model provides much more accurate predictions of the 95th percentile.

[Figure: predicted value (Mbps) versus actual value (Mbps) for the MAWI traces. The red reference line shows where perfect predictions would be located.]

SLIDE 24

Thanks! Questions?

More details ….

SLIDE 25

Summary

  • The distribution of traffic on Internet links is an important problem that has received relatively little attention.
  • We use a well-known, state-of-the-art statistical framework to investigate the problem using a large corpus of traces.
  • We investigated the distribution of the amount of traffic observed on a link in a given (small) aggregation period, which we varied from 5 msec to 5 sec.
  • The vast majority of traces fitted the lognormal assumption best, and this remained true at all timescales tried.
  • We investigated the impact of the distribution on two sample traffic engineering problems:
      1. Firstly, we looked at predicting the proportion of time a link will exceed a given capacity.
      2. Secondly, we looked at predicting the 95th percentile transit bill that an ISP might be given.
  • For both of these problems, the log-normal distribution gave a more accurate result than a heavy-tailed distribution or a Gaussian distribution.

SLIDE 26

Backup ……

SLIDE 27

Power-law test

Power-law distribution:

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}$$

Steps of the test:

  1. Estimating (α, xmin, ntail) using MLE and the KS test.
  2. Uncertainty in the fitted parameters (bootstrapping).
  3. Goodness-of-fit p-value, with H0: power-law is favoured.
      • p > 0.1: fail to reject H0.
      • p < 0.1: reject H0.
  4. Log-likelihood ratio (ℛ).
  5. p-value for ℛ.

Outcomes when H0 is not rejected:

  • ℛ > 0 and p < 0.1 for ℛ: power-law is favoured.
  • ℛ < 0 and p < 0.1 for ℛ: alternative is favoured.
  • p > 0.1 for ℛ: none is favoured.

Outcomes when H0 is rejected:

  • ℛ > 0: none is favoured.
  • ℛ < 0 and p < 0.1 for ℛ: alternative is favoured.
  • ℛ < 0 and p > 0.1 for ℛ: none is favoured.

[Ref] A. Clauset, C. R. Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” arXiv:0706.1062v2, 2009.

SLIDE 28

[Figure: predicted value versus actual value (Mbps or Gbps, depending on the panel) under the log-normal, Weibull and Gaussian models. Panels: Auckland traces, Twente traces, Waikato traces, CAIDA traces.]

SLIDE 29

Log-normal

Weibull

Meent