Internet Traffic Analysis

Mohammed Alasmar, 2019
https://ieeexplore.ieee.org/document/8737483

Motivations
• Reliable traffic modelling is important for network planning, deployment and management, e.g. (1) network dimensioning and (2) traffic billing.
• Historically, network traffic has been widely assumed to follow a Gaussian distribution.
• Deciding whether Internet flows could be heavy-tailed became important, as this would imply significant departures from Gaussianity.

Traffic volumes at different T
• $X_i$: the amount of traffic seen in the time period $[iT, (i+1)T)$.
• Input: an Internet trace (trace.pcap).
• The trace is aggregated at different sampling times T.
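Here is a minimal sketch of the aggregation step. It assumes the trace has already been parsed into (timestamp in seconds, packet size in bytes) pairs. Parsing the .pcap file is not shown, and the function names aggregate and to_mbps are hypothetical.

```python
from collections import defaultdict


def aggregate(packets, T):
    """Return [X_0, X_1, ...], where X_i is the number of bytes seen in [i*T, (i+1)*T).

    packets: iterable of (timestamp_seconds, size_bytes) pairs (assumed input
    format; parsing the .pcap file is not shown). Time is measured from the
    first packet.
    """
    packets = sorted(packets)
    if not packets:
        return []
    t0 = packets[0][0]
    bins = defaultdict(int)
    for ts, size in packets:
        bins[int((ts - t0) // T)] += size
    # Keep empty periods as zeros so every interval of length T is represented.
    return [bins[i] for i in range(max(bins) + 1)]


def to_mbps(volumes, T):
    """Convert per-interval byte counts X_i into average data rates in Mbit/s."""
    return [8 * v / T / 1e6 for v in volumes]
```

Calling aggregate on the same packets with T = 0.01, 1.0 and 5.0 gives series comparable to the 10 ms, 1 s and 5 s curves.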

[Figure: data rate (Mbps) versus time (s, 0 to 900) for a trace aggregated at T = 10 ms, 1 s and 5 s.]

Goal
• Investigating the distribution of the amount of traffic per unit time using a robust statistical approach.
[Figure: PDF of the data rate (Mbps) for T = 10 ms, 1 s and 5 s.]


Datasets
• We study a large number of traffic traces (230) from many different networks, captured between 2009 and 2018:

  Dataset        #Traces
  Twente [1]          40
  MAWI [2]           107
  Auckland [3]        25
  Waikato [4]         30
  CAIDA [5]           27

[1] https://www.simpleweb.org/wiki/index.php/Traces, 2009.
[2] http://mawi.wide.ad.jp/mawi/, 2016-2018.
[3] https://wand.net.nz/wits/auck/9/, 2009.
[4] https://wand.net.nz/wits/waikato/8/, 2010-2011.
[5] http://www.caida.org/data/overview/, 2016.

Power-law test
• Our analysis is based on the framework proposed by A. Clauset, C. R. Shalizi and M. E. J. Newman ("Power-law distributions in empirical data", 2009).
• The framework combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov statistic and likelihood ratios.

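Power-law test
• Power-law distribution: $p(x) \propto x^{-\alpha}$, where $\alpha$ is the scaling exponent.
• Introducing the lower bound $x_{\min}$ above which the power law holds gives the normalised density
  $p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}, \quad x \ge x_{\min}$
[Figure: PDF of the data rate (×10^6 bps) with $x_{\min}$ marked.]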
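For a fixed $x_{\min}$, Clauset et al. give the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln(x_i / x_{\min}) \right]^{-1}$. The sum runs over the n observations with $x_i \ge x_{\min}$. The sketch below evaluates the density above and this estimator, using hypothetical helper names and the standard library only. Choosing $x_{\min}$ itself is a separate step.

```python
import math


def power_law_pdf(x, alpha, xmin):
    """p(x) = (alpha - 1) / xmin * (x / xmin) ** (-alpha) for x >= xmin, else 0."""
    if x < xmin:
        return 0.0
    return (alpha - 1) / xmin * (x / xmin) ** (-alpha)


def fit_alpha(data, xmin):
    """Continuous maximum-likelihood estimate of alpha for a fixed xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need observations strictly above xmin")
    return 1 + len(tail) / log_sum
```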


Likelihood ratio
R, p = fit.distribution_compare('power_law', alternative)
Alternatives: • Weibull • Log-normal • Exponential
Likelihood ratio: $\frac{L_1}{L_2} = \frac{\prod_{i=1}^{n} p_1(x_i)}{\prod_{i=1}^{n} p_2(x_i)}$, where $L_1$ is the power-law likelihood and $L_2$ is the likelihood of the alternative.
Log-likelihood ratio: $\mathcal{R} = \ln \frac{L_1}{L_2} = \sum_{i=1}^{n} \left[ \ln p_1(x_i) - \ln p_2(x_i) \right]$
• If ℛ > 0, the power law is favoured.
• If ℛ < 0, the alternative is favoured.
• If p < 0.1, the sign of ℛ can be trusted.
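The slide's code appears to call the Python powerlaw package. The self-contained sketch below shows what the ratio and its p-value involve for one alternative, the exponential. It computes the normalised ratio $\mathcal{R}/(\sigma\sqrt{n})$ and $p = \operatorname{erfc}(|\mathcal{R}| / (\sigma\sqrt{2n}))$ as described by Clauset et al., where σ is the standard deviation of the pointwise log-likelihood differences. Both models are fitted by maximum likelihood to the tail $x \ge x_{\min}$. The Weibull and log-normal alternatives need numerical fitting and are not shown. The function name is hypothetical.

```python
import math


def llr_power_law_vs_exponential(data, xmin):
    """Vuong-style log-likelihood ratio test, power law vs exponential,
    both fitted to the tail x >= xmin.

    Returns (R, normalised_R, p). R > 0 favours the power law, R < 0 the
    exponential; the sign is only trusted when p is small (e.g. p < 0.1).
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    # Maximum-likelihood parameters of each model on the same tail.
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1 / (sum(tail) / n - xmin)
    # Pointwise differences l_i = ln p_powerlaw(x_i) - ln p_exponential(x_i).
    diffs = []
    for x in tail:
        log_pl = math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin)
        log_exp = math.log(lam) - lam * (x - xmin)
        diffs.append(log_pl - log_exp)
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    normalised = R / (math.sqrt(n) * sigma)
    p = math.erfc(abs(R) / (math.sqrt(2 * n) * sigma))
    return R, normalised, p
```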

Normalised log-likelihood ratio (LLR), T = 100 ms
[Figure: normalised LLR (ℛ) versus rank of trace in three panels, with curves for the Weibull, log-normal and exponential alternatives; circled points have p > 0.1.]
• The log-normal is the best fit for the vast majority of traces.

The log-normal distribution is the best fit for the vast majority of traces.
[Figure: normalised LLR versus rank of trace for MAWI, Waikato, Twente and Auckland traces, with curves for the Weibull, log-normal and exponential alternatives; anomalous traces are marked.]
The log-normal distribution is not the best fit for:
• 1 out of 27 CAIDA traces
• 9 out of 107 MAWI traces
• 2 out of 30 Waikato traces
• 5 out of 40 Twente traces
• 1 out of 25 Auckland traces

Anomalous traces
• Anomalous traces are a poor fit for all distributions tried.
• This is often due to traffic outages or links that hit maximum capacity.
[Figure: PDF versus data rate (Mbps) for an anomalous trace and for a log-normal trace.]

At different sampling times T
Normalised log-likelihood ratio (LLR) test results for all studied traces, with the log-normal distribution as the alternative, at different timescales.
• ℛ < 0, i.e., the log-normal is favoured.
[Figure: normalised LLR (ℛ) versus rank of trace for CAIDA traces at T = 5 s, 1 s, 100 ms and 5 ms.]

The correlation coefficient test
• Strong goodness of fit (GOF) is assumed when the correlation coefficient γ is greater than 0.95.
[Figure: correlation coefficient versus rank of trace for CAIDA traces at T = 5 s, 1 s, 100 ms and 5 ms; panels: log-normal and Gaussian.]
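One common form of correlation coefficient test is the probability plot correlation coefficient. It is the Pearson correlation between the sorted observations and the matching quantiles of the candidate distribution. The sketch below assumes that form and uses Hazen plotting positions $(i - 0.5)/n$; the slides specify neither. For the log-normal case, the data are log-transformed and compared against standard normal quantiles. The result would then be compared with the 0.95 threshold. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def ppcc(data, lognormal=False):
    """Correlation between sorted (optionally log-transformed) data and
    standard-normal quantiles at plotting positions (i - 0.5) / n."""
    xs = sorted(math.log(x) for x in data) if lognormal else sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return _pearson(xs, z)
```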

Use case 1: Bandwidth provisioning
• Bandwidth provisioning gives the link the bandwidth it needs to guarantee the required performance.
• Overprovisioning: in conventional methods, bandwidth is allocated by upgrading the link bandwidth to 30% of the average traffic value.

Use case 1: Bandwidth provisioning
• The following inequality (the 'link transparency formula') has been used for bandwidth provisioning:
  $P(A_T \ge CT) \le \varepsilon$
  i.e., the probability that the traffic $A_T$ captured over an aggregation timescale $T$ exceeds the amount $CT$ that a link of capacity $C$ can carry in that time must be no greater than a performance criterion $\varepsilon$.
• $\varepsilon$ has to be chosen carefully by the network provider in order to meet the specified SLA.
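If both sides of the event are divided by T, the inequality becomes a bound on the per-interval rate $A_T/T$. The smallest admissible capacity is then the $(1-\varepsilon)$ quantile of the fitted rate distribution. The sketch below assumes the log-normal is fitted by maximum likelihood on the log rates, and all helper names are hypothetical. The Gaussian version is included for comparison. The exceedance function gives an empirical check of how often a trace would actually reach C.

```python
import math
from statistics import NormalDist, fmean, pstdev


def capacity_lognormal(rates, eps):
    """Smallest C with P(rate >= C) <= eps under a log-normal fit.

    rates are per-interval rates A_T / T (all > 0), so C * T bounds A_T."""
    logs = [math.log(r) for r in rates]
    mu, sigma = fmean(logs), pstdev(logs)  # MLE of the log-normal parameters
    return math.exp(mu + sigma * NormalDist().inv_cdf(1 - eps))


def capacity_gaussian(rates, eps):
    """Same criterion under a Gaussian fit of the rates."""
    m, s = fmean(rates), pstdev(rates)
    return m + s * NormalDist().inv_cdf(1 - eps)


def exceedance(rates, C):
    """Empirical fraction of intervals whose rate reaches or exceeds C."""
    return sum(r >= C for r in rates) / len(rates)
```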

Use case 1: Bandwidth provisioning
Example: ε = 0.01
[Figure: expected link capacity versus performance criterion ε for MAWI traces, from $P(A_T \ge CT) \le \varepsilon$ under Gaussian, Weibull and log-normal models.]

Bandwidth provisioning: Results
[Figure: three panels (log-normal, Weibull, Gaussian), each showing results for T = 0.1 s, 0.5 s and 1 s per dataset (M: MAWI, T: Twente, C: CAIDA, W: Waikato, A: Auckland); target: ε = 0.5.]

Use case 2: 95th percentile pricing (burstable billing)
• Customers are not billed for brief spikes in network traffic.
[Figure: data rate (Mbps) versus time (s), shown at fine granularity and in 5-minute intervals, with the 95th percentile level.]
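Burstable billing is commonly computed as follows. The average rate is sampled every 5 minutes, the samples are sorted, the top 5% are discarded, and the customer is billed at the highest remaining sample. The sketch below follows that convention, though exact rules vary between providers. The helper names are hypothetical.

```python
def window_averages(rates, T, window=300.0):
    """Average consecutive per-T rates over non-overlapping windows
    (default 5 minutes); an incomplete trailing window is dropped."""
    per = max(1, round(window / T))
    return [sum(rates[i:i + per]) / per for i in range(0, len(rates) - per + 1, per)]


def burstable_bill_rate(samples):
    """95th percentile by nearest rank: sort, drop the top 5%, bill the highest remaining."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer arithmetic
    return ordered[k - 1]
```

For a 30-day month of 5-minute samples (8640 values), this discards the 432 highest samples.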

95th percentile pricing: Results
[Figure: predicted versus actual 95th percentile (Mbps) for MAWI traces in four panels; the red reference line shows where perfect predictions would lie.]
• The log-normal model provides much more accurate predictions of the 95th percentile.
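A distribution model predicts the 95th percentile from its fitted parameters alone. The sketch below assumes the model is fitted by maximum likelihood to the same 5-minute samples that are billed. The log-normal prediction is $\exp(\hat\mu + z_{0.95}\hat\sigma)$ on the log scale, and the Gaussian one is $\hat m + z_{0.95}\hat s$, with $z_{0.95} \approx 1.645$. The helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev

Z95 = NormalDist().inv_cdf(0.95)  # standard-normal 95% quantile, about 1.645


def predict_p95_lognormal(samples):
    """95th percentile of a log-normal fitted to positive samples."""
    logs = [math.log(s) for s in samples]
    return math.exp(fmean(logs) + Z95 * pstdev(logs))


def predict_p95_gaussian(samples):
    """95th percentile of a Gaussian fitted to the samples."""
    return fmean(samples) + Z95 * pstdev(samples)


def relative_error(predicted, actual):
    return abs(predicted - actual) / actual
```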

More details …

Thanks! Questions?

Summary
• The distribution of traffic on Internet links is an important problem that has received relatively little attention.
• We use a well-known, state-of-the-art statistical framework to investigate the problem using a large corpus of traces.
• We investigated the distribution of the amount of traffic observed on a link in a given (small) aggregation period, which we varied from 5 ms to 5 s.
• The vast majority of traces fitted the log-normal assumption best, and this remained true at all timescales tried.
• We investigated the impact of the distribution on two sample traffic engineering problems:
  1. predicting the proportion of time a link will exceed a given capacity;
  2. predicting the 95th percentile transit bill that an ISP might be given.
• For both problems, the log-normal distribution gave a more accurate result than a heavy-tailed distribution or a Gaussian distribution.

Backup

Power-law test
Power-law distribution: $p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}$
1. Estimate $(\alpha, x_{\min}, n_{\text{tail}})$ using MLE and the KS test.
2. Quantify the uncertainty in the fitted parameters (bootstrapping).
3. Goodness of fit: compute a p-value for H0: the data are drawn from the fitted power law. If p > 0.1, fail to reject H0; if p < 0.1, reject H0.
4. Compute the log-likelihood ratio ℛ against each alternative, together with a p-value for ℛ.
5. Decide: if the p-value for ℛ is below 0.1, ℛ > 0 favours the power law and ℛ < 0 favours the alternative; otherwise neither is favoured. If H0 was rejected in step 3, the power law is not favoured even when ℛ > 0.
[Ref] A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-law distributions in empirical data," arXiv:0706.1062v2, 2009.
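Step 1 can be sketched as the $x_{\min}$ scan of Clauset et al. Each observed value is tried as $x_{\min}$, $\alpha$ is fitted by maximum likelihood on the tail, and the $x_{\min}$ is kept whose fitted CDF is closest to the empirical CDF in Kolmogorov–Smirnov distance. This is a quadratic-time illustration with hypothetical names and an assumed minimum tail size, not an optimised implementation. The bootstrap and goodness-of-fit p-value of steps 2 and 3 are not shown.

```python
import math


def ks_distance(tail, xmin, alpha):
    """Kolmogorov-Smirnov distance between the tail's empirical CDF and the
    fitted power-law CDF F(x) = 1 - (x / xmin) ** (1 - alpha)."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1 - (x / xmin) ** (1 - alpha)
        # Check the empirical CDF just before and just after each step.
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def fit_xmin(data, min_tail=10):
    """Scan candidate xmin values; return (D, xmin, alpha, n_tail) with the smallest D.

    min_tail is an assumed lower limit on the tail size, not a value from the slides.
    """
    best = None
    for xmin in sorted(set(data)):
        tail = [x for x in data if x >= xmin]
        if len(tail) < min_tail:
            break
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum == 0:
            continue
        alpha = 1 + len(tail) / log_sum  # continuous MLE for this xmin
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[0]:
            best = (d, xmin, alpha, len(tail))
    return best
```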
