Structural Analysis of Network Traffic Flows
Anukool Lakhina, Dina Papagiannaki, Mark Crovella, Christophe Diot, Eric Kolaczyk, and Nina Taft
2
Traditional Network Traffic Analysis
- Focus on
– Short ‘stationary’ timescales
– Traffic on a single link in isolation
- Principal results
– Scaling properties
– Packet delays and losses
What ISPs Care About
- Focus on
– Long, nonstationary timescales
– Traffic on all links simultaneously
- Principal goals
– Capacity planning
– Traffic engineering
– Anomaly detection
3
Need for Whole-Network Traffic Analysis
- Traffic Engineering: How does traffic move throughout the network?
- Anomaly Detection: Which links show unusual traffic?
- Capacity Planning: How much to upgrade, and where in the network?
4
This is Complicated!
- Measuring and modeling traffic on all links simultaneously is challenging.
– Even single-link modeling is difficult
– 100s of links in large IP networks
– High-dimensional timeseries
- Significant correlation in link traffic
- Is there a more fundamental representation?
5
Origin-Destination Flows
[Figure: total traffic on the link (traffic vs. time)]
- Link traffic arises from the superposition of Origin-Destination (OD) flows
- Modeling OD flows instead of link traffic removes a significant source of correlation
- A fundamental primitive for whole-network analysis
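A toy sketch of the superposition, assuming a hypothetical network with 2 links and 3 OD flows (the routing matrix `A` and the traffic values in `X` are made up for illustration). It follows the convention $Y^T = A X^T$ used on the traffic matrix estimation slide, with `X` as timesteps × OD flows and `Y` as timesteps × links.

```python
import numpy as np

# Hypothetical routing matrix: A[l, j] = 1 if OD flow j crosses link l.
A = np.array([
    [1, 1, 0],  # link 0 carries OD flows 0 and 1
    [0, 1, 1],  # link 1 carries OD flows 1 and 2
])

# Hypothetical OD flow matrix X: 4 timesteps (rows) x 3 OD flows (columns).
X = np.array([
    [10.0, 5.0, 2.0],
    [12.0, 6.0, 1.0],
    [9.0, 4.0, 3.0],
    [11.0, 7.0, 2.0],
])

# Each link's traffic is the sum of the OD flows routed over it:
# Y^T = A X^T, i.e. Y = X A^T (timesteps x links).
Y = X @ A.T
print(Y[:, 0])  # link 0: [15. 18. 13. 18.]
```

OD flow 1 crosses both links, so both link timeseries contain it; shared OD flows like this are one source of the correlation between links that modeling the OD flows directly avoids.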
6
But, This Is Still Complicated
- Even more OD flows than links
- Still a high-dimensional, multivariate timeseries
- How do we extract meaning from this high-dimensional structure in a systematic manner?
7
High Dimensionality: A General Strategy
- Look for good low-dimensional representations
- Often a high-dimensional structure can be explained by a small number of independent variables
- A commonly used technique: Principal Component Analysis (PCA) (aka KL-Transform, SVD, …)
8
Our work
- Measure complete sets of OD flow timeseries from two backbone networks
- Use PCA to understand their structure
– Decompose OD flows into simpler features
– Characterize individual features
– Reconstruct OD flows as a sum of features
- Call this structural analysis
9
Datasets
- Abilene: 11 PoPs, 121 OD flows.
- Sprint-Europe: 13 PoPs, 169 OD flows.
- Collect sampled traffic from every ingress link using NetFlow
- Use BGP tables to resolve egress points
- Week-long datasets, 5- or 10-minute timesteps
10
Example OD Flows
[Figure: weekly (Mon to Sun) traffic in OD flows 167, 96, and 124 and in Abilene OD flow 29]
Some have visible structure, some less so…
[Figure: traffic in Abilene OD flows 27 and 59 and in OD flows 18, 111, 84, 157, 42, and 131]
11
Specific Questions of Structural Analysis
- Are there low-dimensional representations for a set of OD flows?
- Do OD flows share common features?
- What do the features look like?
- Can we get a high-level understanding of a set of OD flows in terms of these features?
12
Principal Component Analysis
- Coordinate transformation method
[Figure: original data in axes x1, x2; transformed data in principal axes u1, u2]
13
Properties of Principal Components
- Each PC points in the direction of maximum (remaining) energy in the set of OD flows
- PCs are ordered by the amount of energy they capture
- Eigenflow: the set of OD flows mapped onto a PC; a common trend
- Eigenflows are ordered from most common to least common trend
14
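PCA on OD flows
[Figure: block diagram of the factorization below, with X and U sized time × # OD pairs and V sized # OD pairs × # OD pairs]
- X: OD flow matrix; each column is one OD flow
- U: Eigenflow matrix; each column is one eigenflow
- V: Principal matrix; each column is one PC
$X = U \Sigma V^T$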
15
PCA on OD flows (2)
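- Each eigenflow is a weighted sum of all OD flows: $u_i = X v_i / \sigma_i$, where $v_i$ is the $i$-th PC (column $i$ of $V$)
- Eigenflows are orthonormal: $u_i^T u_j = \delta_{ij}$
- Singular values $\sigma_i$ indicate the energy attributable to each principal component ($\sigma_i^2 = \|X v_i\|^2$)
- Each OD flow is a weighted sum of all eigenflows: $X_j = \sum_i \sigma_i V_{ji}\, u_i$, where $X_j$ is column $j$ of $X$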
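A minimal NumPy sketch of the decomposition, assuming the OD flow matrix is stored as timesteps × OD flows and that each column is mean-centered before the SVD (the centering is our assumption; the slide writes $X = U\Sigma V^T$ directly). The assertions check the properties listed on this slide on random, hypothetical data.

```python
import numpy as np

def pca_od_flows(X):
    """Thin SVD of an OD flow matrix X (timesteps x OD flows).

    Columns are mean-centered first (an assumption). Returns the centered
    matrix Xc and U, s, V with Xc = U @ diag(s) @ V.T: columns of U are
    eigenflows, columns of V are principal components, and s holds the
    singular values in descending order.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, U, s, Vt.T

rng = np.random.default_rng(0)
X = rng.random((200, 12))  # hypothetical: 200 timesteps, 12 OD flows
Xc, U, s, V = pca_od_flows(X)

# Each eigenflow is a weighted sum of all OD flows: u_i = Xc v_i / sigma_i.
assert np.allclose(U[:, 0], Xc @ V[:, 0] / s[0])
# Eigenflows are orthonormal.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
# Each OD flow is a weighted sum of all eigenflows:
# column j of Xc equals sum_i sigma_i V[j, i] u_i.
j = 3
assert np.allclose(Xc[:, j], U @ (s * V[j, :]))
```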
16
An Example Eigenflow and PC
[Figure: Eigenflow 6 over one week (Mon to Sun); PC 6 weights across OD flows, with OD flows 167 and 94 marked; traffic in OD flows 167 and 96]
17
Outline For Rest of Talk
- Find intrinsic dimensionality of OD flows
- Decompose OD flows
- Characterize eigenflows
- Reconstruct OD flows
- Potential applications
Structural Analysis
18
Low Intrinsic Dimensionality of OD Flows
[Figure: magnitude of the singular values, in order, for Sprint-1 and Abilene]
- Plot of (square root of) energy captured by each dimension.
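A sketch of how the singular-value plot translates into an intrinsic dimensionality, assuming mean-centered data and a hypothetical 90% energy cutoff (the talk does not give a cutoff). The synthetic ensemble of 40 flows driven by 3 shared trends is made up to mimic low intrinsic dimensionality.

```python
import numpy as np

def energy_fractions(X):
    """Share of total energy captured by each PC of the mean-centered X.

    The energy along PC i is sigma_i**2, so the singular values plotted on
    the slide are the square roots of these energies.
    """
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s**2 / np.sum(s**2)

def intrinsic_dimension(X, target=0.9):
    """Smallest k whose top-k PCs capture at least `target` of the energy.

    The 0.9 cutoff is an illustrative choice, not a value from the talk.
    """
    cumulative = np.cumsum(energy_fractions(X))
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(1)
T, p, k = 500, 40, 3
# Hypothetical ensemble: 40 OD flows driven by 3 shared trends plus small noise.
X = rng.standard_normal((T, k)) @ rng.standard_normal((k, p))
X += 0.01 * rng.standard_normal((T, p))
print(intrinsic_dimension(X))  # at most 3, far below p = 40
```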
19
Approximating With Top 5 Eigenflows
[Figure: traffic in OD flow 88 over one week (Mon to Sun): original vs. approximation with 5 PCs]
20
Approximating With Top 5 Eigenflows
[Figure: traffic in OD flow 79 over one week (Mon to Sun): original vs. approximation with 5 PCs]
21
Approximating With Top 5 Eigenflows
[Figure: traffic in OD flow 96 over one week (Mon to Sun): original vs. approximation with 5 PCs]
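A minimal sketch of the rank-k reconstruction behind these approximations, on hypothetical diurnal data and assuming per-flow means are removed before the SVD and added back afterwards:

```python
import numpy as np

def approximate(X, k=5):
    """Reconstruct every OD flow from the top-k eigenflows only.

    Works on mean-centered columns and adds the means back afterwards
    (an assumption; the slides do not say how means are handled).
    """
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu + (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
t_days = np.arange(7 * 288) / 288.0  # one week of 5-minute timesteps
# Hypothetical OD flows: one shared diurnal shape, per-flow scale, plus noise.
diurnal = 1.0 + 0.5 * np.sin(2 * np.pi * t_days)
X = np.outer(diurnal, rng.uniform(1e6, 1e7, 30))
X += 1e5 * rng.standard_normal(X.shape)

X5 = approximate(X, k=5)
rel_err = np.linalg.norm(X - X5) / np.linalg.norm(X)
print(f"relative error with 5 eigenflows: {rel_err:.4f}")
```

Truncating the SVD this way gives the best rank-k approximation of the centered data in the Frobenius norm (the Eckart-Young theorem), and the residual norm equals the square root of the sum of the discarded $\sigma_i^2$.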
22
Outline
- Find intrinsic dimensionality of OD flows
- Decompose OD flows
- Characterize eigenflows
- Reconstruct OD flows
- Potential applications
Structural Analysis
23
Structure of OD Flows
[Figure: CDF, Pr[X<x], of the number of eigenflows in an OD flow, for Sprint-1 and Abilene]
- Most OD flows have fewer than 20 significant eigenflows
- We can think of each OD flow as having only a small set of “features”
24
Kinds of Eigenflows
[Figure: example eigenflows 29, 20, and 2 over one week (Mon to Sun)]
- Deterministic (d-eigenflows): predictable (periodic) trends
- Spike (s-eigenflows): sudden, isolated spikes and drops
- Noise (n-eigenflows): roughly stationary and Gaussian
25
D-eigenflows Have Periodicity
[Figure: Eigenflow 1 over one week (Mon to Sun); FFT power spectrum (energy vs. period in hours, ticks at 6, 12, 24, 36, 48) for Sprint-1 and Abilene]
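One way to reproduce the power-spectrum check is to take the FFT of an eigenflow and read off the period of its strongest peak. The sketch below assumes 5-minute timesteps and uses a synthetic eigenflow with 24-hour and 12-hour components; it is an illustration, not the classification rule used in the talk.

```python
import numpy as np

def dominant_period_hours(u, step_minutes=5.0):
    """Period, in hours, of the strongest non-constant frequency in eigenflow u."""
    u = np.asarray(u, dtype=float)
    power = np.abs(np.fft.rfft(u - u.mean())) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(u.size, d=step_minutes / 60.0)  # cycles per hour
    peak = 1 + np.argmax(power[1:])                          # skip the DC bin
    return 1.0 / freqs[peak]

# Hypothetical d-eigenflow: 24-hour cycle plus a weaker 12-hour harmonic.
t_hours = np.arange(7 * 288) * 5 / 60.0
u = np.sin(2 * np.pi * t_hours / 24) + 0.3 * np.sin(2 * np.pi * t_hours / 12)
print(round(dominant_period_hours(u), 6))  # 24.0
```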
26
S-eigenflows Have Spikes
[Figure: Sprint-1 eigenflow 8 and Abilene eigenflow 10 over one week (Mon to Sun), each with a 5-sigma threshold]
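The 5-sigma threshold suggests a simple spike test. A sketch under that reading, assuming the threshold is measured from the eigenflow's own mean and standard deviation (the slide does not spell this out); the eigenflow and the injected spike are hypothetical:

```python
import numpy as np

def spike_times(u, k_sigma=5.0):
    """Indices where eigenflow u is more than k_sigma standard deviations from its mean."""
    u = np.asarray(u, dtype=float)
    z = (u - u.mean()) / u.std()
    return np.flatnonzero(np.abs(z) > k_sigma)

rng = np.random.default_rng(3)
u = 0.01 * rng.standard_normal(2016)  # hypothetical eigenflow: one week at 5 minutes
u[1000] += 0.3                         # one isolated spike
print(spike_times(u))                  # [1000]
```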
27
N-eigenflows Are Gaussian
[Figure: Eigenflow 39 over one week (Mon to Sun); qq-plot of its quantiles against standard normal quantiles, for Sprint-1 and Abilene]
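A qq-plot can be summarized by the correlation between the sorted sample and the corresponding standard normal quantiles. The sketch below does that with the standard library's `statistics.NormalDist`; the plotting positions $(i - 0.5)/n$ and the synthetic data are our choices, not the talk's.

```python
import numpy as np
from statistics import NormalDist

def qq_correlation(u):
    """Correlation between the sorted sample and standard normal quantiles.

    This summarizes a qq-plot in one number: values close to 1 mean the
    points lie near a straight line, as expected for Gaussian data.
    """
    x = np.sort(np.asarray(u, dtype=float))
    probs = (np.arange(1, x.size + 1) - 0.5) / x.size  # plotting positions
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return float(np.corrcoef(q, x)[0, 1])

rng = np.random.default_rng(4)
noise_like = rng.standard_normal(2016)  # hypothetical n-eigenflow
spike_like = noise_like.copy()
spike_like[::200] += 20.0               # a few large spikes break normality
print(qq_correlation(noise_like), qq_correlation(spike_like))
```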
28
Hundreds of Eigenflows But Only Three Basic Types
29
An OD Flow, Reconstructed
[Figure: an OD flow over one week (Mon to Sun) and its D-components, S-components, and N-components]
30
Another OD Flow, Reconstructed
[Figure: another OD flow over one week (Mon to Sun) and its D-components, S-components, and N-components]
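Because $X = U\Sigma V^T$ is a sum of rank-one terms, each OD flow splits exactly into the contributions of its D, S, and N eigenflows. A minimal sketch, assuming mean-centered data and hard-coded, hypothetical type labels (in practice they would come from checks such as periodicity, spikes, and Gaussianity):

```python
import numpy as np

def split_by_type(X, labels):
    """Split mean-centered OD flows into D, S and N components.

    labels[i] is 'd', 's' or 'n' for eigenflow i. Since
    Xc = sum_i sigma_i u_i v_i^T, the three parts sum back to Xc exactly.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    lab = np.asarray(labels)
    parts = {}
    for kind in "dsn":
        idx = np.flatnonzero(lab == kind)
        parts[kind] = (U[:, idx] * s[idx]) @ Vt[idx, :]
    return Xc, parts

rng = np.random.default_rng(5)
X = rng.random((300, 8))                           # hypothetical: 300 timesteps, 8 OD flows
labels = ["d", "d", "s", "s", "n", "n", "n", "n"]  # hypothetical labels, one per eigenflow
Xc, parts = split_by_type(X, labels)
print(np.allclose(parts["d"] + parts["s"] + parts["n"], Xc))  # True
```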
31
Which Eigenflows Are Most Significant?
[Figure: type (d, s, or n) of each eigenflow, in order, for Sprint and Abilene]
- 1-6: d-eigenflows appear to be most significant in both networks.
- 5-10: s-eigenflows are next in importance.
- 12 and beyond: n-eigenflows account for the rest.
32
Contribution of Eigenflow Types
Fraction of total OD flow energy captured by each type of eigenflow
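Since eigenflow $i$ carries energy $\sigma_i^2$, the fraction captured by each type can be computed from the singular values alone, and a per-OD-flow breakdown follows from the rows of $V$. A sketch under these assumptions, with hypothetical singular values and labels:

```python
import numpy as np

def type_energy_fractions(s, labels):
    """Fraction of total energy captured by each eigenflow type ('d', 's', 'n')."""
    energy = np.asarray(s, dtype=float) ** 2  # energy of eigenflow i is sigma_i**2
    lab = np.asarray(labels)
    return {kind: float(energy[lab == kind].sum() / energy.sum()) for kind in "dsn"}

def per_flow_type_fractions(s, V, labels):
    """The same breakdown for each OD flow j (row j of V).

    x_j = sum_i sigma_i V[j, i] u_i with orthonormal u_i, so
    ||x_j||**2 = sum_i sigma_i**2 V[j, i]**2 and each term is one
    eigenflow's share of OD flow j's energy.
    """
    e = np.asarray(s, dtype=float) ** 2 * np.asarray(V) ** 2  # e[j, i]
    lab = np.asarray(labels)
    total = e.sum(axis=1)
    return {kind: e[:, lab == kind].sum(axis=1) / total for kind in "dsn"}

s = np.array([4.0, 2.0, 1.0, 1.0])  # hypothetical singular values
labels = ["d", "s", "n", "n"]        # hypothetical eigenflow types
print(type_energy_fractions(s, labels))
```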
33
Contribution to Each OD Flow (Sprint)
[Figure: fraction of total energy from deterministic, spike, and noise eigenflows for each OD flow, ordered large to small]
- Largest OD flows: strong deterministic component.
- Smallest OD flows: primarily dominated by spikes.
- Regardless of size, n-eigenflows account for a fairly constant portion.
34
Contribution to Each OD Flow (Abilene)
[Figure: fraction of total energy from deterministic, spike, and noise eigenflows for each OD flow, ordered large to small]
- Largest OD flows: strong deterministic component.
- Smallest OD flows: dominated by noise, but also have diurnal trends.
- Regardless of size, spikes account for a fairly constant portion.
35
Summary: Specific Questions
- Are there low-dimensional representations for a set of OD flows?
– 5-10 eigenflows are sufficient for a good approximation of a set of 100+ OD flows
- Do OD flows share common features?
– The common features across OD flows are eigenflows
- What do the features look like?
– Each eigenflow can be categorized as D, S, or N
- Can we get a high-level understanding of a set of OD flows in terms of these features?
– Both networks: large flows are primarily diurnal
– Sprint: small flows are primarily spikes; noise is constant
– Abilene: small flows have N and D; spikes are constant
36
Outline
- Find intrinsic dimensionality of OD flows
- Decompose OD flows
- Characterize eigenflows
- Reconstruct OD flows
- Potential applications
Structural Analysis
37
Traffic Matrix Estimation
Problem Statement: Infer OD flows (X) given link measurements (Y) and routing matrix (A): $Y^T = A X^T$.
State of the Art: dim(X) > dim(Y), so treat it as an ill-posed linear inverse problem. Infer X on stationary (short) timescales.
Possible Approach: On longer timescales, the intrinsic dimensionality of OD flows is small, so effective dim(X) < dim(Y). TM estimation of the largest eigenflows then becomes a “well-posed” problem.
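A sketch of why low intrinsic dimensionality can make the problem well-posed, under a strong hypothetical assumption: the top-k principal components $V_k$ are already known (for example, from an earlier measurement period), so only the k eigenflow weights per timestep are unknown. Writing $X = Z V_k^T$ turns $Y^T = A X^T$ into $Y = Z (A V_k)^T$, which ordinary least squares can solve when the number of links is at least k. The routing matrix and flows are random and made up.

```python
import numpy as np

def estimate_od_flows(Y, A, Vk):
    """Estimate OD flows X (timesteps x p) from link traffic Y (timesteps x L).

    Hypothetical assumption: X = Z @ Vk.T for known PCs Vk (p x k) and
    unknown eigenflow weights Z (timesteps x k). Then Y = Z @ (A @ Vk).T,
    which has only k unknowns per timestep.
    """
    B = A @ Vk                                    # L x k
    Zt, *_ = np.linalg.lstsq(B, Y.T, rcond=None)  # least-squares solve of B @ Z.T = Y.T
    return Zt.T @ Vk.T                            # back to timesteps x p

rng = np.random.default_rng(6)
T, p, L, k = 100, 16, 8, 3                         # fewer links (8) than OD flows (16)
Vk, _ = np.linalg.qr(rng.standard_normal((p, k)))  # hypothetical known PCs (orthonormal)
X = rng.standard_normal((T, k)) @ Vk.T             # OD flows with intrinsic dimension k
A = (rng.random((L, p)) < 0.4).astype(float)       # hypothetical 0/1 routing matrix
Y = X @ A.T                                        # Y^T = A X^T

X_hat = estimate_od_flows(Y, A, Vk)
print(np.allclose(X_hat, X))  # True when A @ Vk has full column rank
```

With only 8 link measurements the 16 OD flows cannot be solved for directly, but the 3 weights per timestep can; real OD flows are only approximately low-rank, so in practice this would yield an estimate rather than exact recovery.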
38
Anomaly Detection
State of the art: Use wavelets to detrend each flow in isolation.
[Barford:IMW02]
Possible approach: Detrend all OD flows simultaneously by subtracting d-eigenflows.
[Figure: original timeseries for OD flow #57 with its d-pseudo-eigenflows; detrended timeseries with a 4σ threshold]
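A minimal sketch of the possible approach, assuming mean-centered data, that the indices of the d-eigenflows are already known (hard-coded here as `d_idx=[0]`), and a per-flow 4σ threshold on the residual as in the figure. The synthetic week of diurnal traffic and the injected anomaly are hypothetical.

```python
import numpy as np

def detrend_and_flag(X, d_idx, k_sigma=4.0):
    """Remove the d-eigenflow components from all OD flows at once, then
    flag residual points more than k_sigma standard deviations from each
    flow's residual mean."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    d_idx = list(d_idx)
    trend = mu + (U[:, d_idx] * s[d_idx]) @ Vt[d_idx, :]
    R = X - trend                                # detrended timeseries
    z = (R - R.mean(axis=0)) / R.std(axis=0)
    return R, np.abs(z) > k_sigma

rng = np.random.default_rng(7)
t_hours = np.arange(7 * 288) * 5 / 60.0          # one week of 5-minute timesteps
scales = rng.uniform(1e5, 1e6, 20)               # 20 hypothetical OD flows
X = 2e6 + np.outer(np.sin(2 * np.pi * t_hours / 24), scales)
X += 1e4 * rng.standard_normal(X.shape)
X[1500, 5] += 2e5                                # injected anomaly in OD flow 5

R, flags = detrend_and_flag(X, d_idx=[0])        # assume eigenflow 0 is the diurnal d-eigenflow
print(np.argwhere(flags))
```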
39
Traffic Forecasting
State of the art: Treat each flow timeseries independently. Use wavelets to extract trends. Build timeseries forecasting models on trends.
[PTZC:INFOCOM03]
Possible approach: Build forecasting models on d-eigenflows as trends. Allows simultaneous examination and forecasting for entire ensemble of OD flows.
40
Traffic Engineering
Problem Statement: How does one identify important traffic flows, so that they can be treated differently?
State of the art: Measure all flows on a single link. Find “heavy-hitters” or “elephant” flows based on preset thresholds [PTC:INFOCOM04, PTBTSC:IMW02].
Possible approach: Look across all flows and extract common features. Taxonomize each flow into D, S, or N.
41
Final thoughts
- OD flows are a useful primitive for engineering networks
- Sets of OD flows have low-dimensional representations
- A structural analysis approach can provide useful insight into the nature of OD flows
42
Thanks!
- Help with Abilene data
– Rick Summerhill, Mark Fullmer (Internet2)
– Matthew Davy (Indiana University)
- Help with Sprint-Europe data
– Bjorn Carlsson, Jeff Loughridge (SprintLink)
– Richard Gass (Sprint ATL)
43
Backup slides
44
Principal Component Analysis
For any given dataset, PCA finds a new coordinate system that maps the maximum variability in the data to a minimum number of coordinates. The new axes are called principal axes or components.
45
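Properties of Principal Components
- Each PC points in the direction of maximum (remaining) energy in the set of OD flows
- PCs are ordered by the amount of energy they capture
- Eigenflow: the set of OD flows mapped onto a PC; a common trend
- Eigenflows are ordered from most common to least common trend
$$v_1 = \arg\max_{\|v\|=1} \|X v\|$$
and, for $k > 1$,
$$v_k = \arg\max_{\|v\|=1} \Big\| \Big(X - \sum_{i=1}^{k-1} X v_i v_i^T\Big) v \Big\|$$
where $X v_i v_i^T$ is the mapping of $X$ onto one PC, so the term in parentheses is the difference between the original data and the data mapped onto the first $k-1$ PCs.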
46
Energy captured by each PC
[Figure: energy captured by each principal component, for Sprint-1 and Abilene]