Structural Analysis of Network Traffic Flows Eric Kolaczyk Anukool - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Structural Analysis of Network Traffic Flows

Eric Kolaczyk Anukool Lakhina, Dina Papagiannaki, Mark Crovella, Christophe Diot, and Nina Taft

slide-2
SLIDE 2

2

What ISPs Care About

  • Focus on

– Long, nonstationary timescales
– Traffic on all links simultaneously

  • Principal goals

– Capacity planning
– Traffic engineering
– Anomaly detection

Traditional Network Traffic Analysis

  • Focus on

– Short ‘stationary’ timescales
– Traffic on a single link in isolation

  • Principal results

– Scaling properties
– Packet delays and losses

slide-3
SLIDE 3

3

Need for Whole-Network Traffic Analysis

  • Traffic Engineering: How does traffic move throughout the network?
  • Anomaly Detection: Which links show unusual traffic?
  • Capacity planning: How much and where in network to upgrade?

slide-4
SLIDE 4

4

This is Complicated!

  • Measuring and modeling traffic on all links simultaneously is challenging.

– Even single link modeling is difficult
– 100s of links in large IP networks
– High-dimensional timeseries

  • Significant correlation in link traffic
  • Is there a more fundamental representation?
slide-5
SLIDE 5

5

Origin-Destination Flows

[Figure: total traffic on a link over time, shown as a stack of OD flows; axes: time, traffic]

  • Link traffic arises from the superposition of Origin-Destination (OD) flows
  • Modeling OD flows instead of link traffic removes a significant source of correlation
  • A fundamental primitive for whole-network analysis
slide-6
SLIDE 6

6

But, This Is Still Complicated

  • Even more OD flows than links
  • Still a high-dimensional, multivariate timeseries
  • How do we extract meaning from this high-dimensional structure in a systematic manner?

slide-7
SLIDE 7

7

High Dimensionality: A General Strategy

  • Look for good low-dimensional representations
  • Often a high-dimensional structure can be explained by a small number of independent variables
  • A commonly used technique: Principal Component Analysis (PCA) (aka KL-Transform, SVD, …)

slide-8
SLIDE 8

8

Our work

  • Measure complete sets of OD flow timeseries from two backbone networks
  • Use PCA to understand their structure

– Decompose OD flows into simpler features
– Characterize individual features
– Reconstruct OD flows as sum of features

  • Call this structural analysis
slide-9
SLIDE 9

9

Datasets

  • Abilene: 11 PoPs, 121 OD flows.
  • Sprint-Europe: 13 PoPs, 169 OD flows.
  • Collect sampled traffic from every ingress link using NetFlow
  • Use BGP tables to resolve egress points
  • Week-long datasets, 5- or 10-minute timesteps
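
The OD flow counts on this slide equal the square of the PoP count, which suggests every (origin, destination) pair is counted, including pairs where origin and destination coincide. A small sketch of that arithmetic follows, together with the number of timesteps in a week at each bin size. The helper names are illustrative, and the slide does not say which network uses which bin size.

```python
# Sketch of the sizes implied by the dataset slide; helper names are illustrative.
MINUTES_PER_WEEK = 7 * 24 * 60


def num_od_flows(num_pops: int) -> int:
    # Assumes every (origin, destination) pair counts, including origin == destination.
    return num_pops * num_pops


def timesteps_per_week(bin_minutes: int) -> int:
    return MINUTES_PER_WEEK // bin_minutes


print(num_od_flows(11), num_od_flows(13))              # 121 169
print(timesteps_per_week(5), timesteps_per_week(10))   # 2016 1008
```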
slide-10
SLIDE 10

10

Example OD Flows

[Figure: week-long traffic timeseries (Mon to Sun) for example OD flows: OD Flow 167, Abilene OD Flow 29, OD Flow 96, OD Flow 124]

Some have visible structure, some less so…

[Figure: week-long traffic timeseries for further example OD flows: Abilene OD Flows 27 and 59; OD Flows 18, 111, 84, 157, 42, and 131]
slide-11
SLIDE 11

11

Specific Questions of Structural Analysis

  • Are there low-dimensional representations for a set of OD flows?
  • Do OD flows share common features?
  • What do the features look like?
  • Can we get a high-level understanding of a set of OD flows in terms of these features?

slide-12
SLIDE 12

12

Principal Component Analysis

Coordinate transformation method

[Figure: original data in coordinates (x1, x2) and transformed data in principal coordinates (u1, u2)]
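
The coordinate change can be illustrated on synthetic two-dimensional data. The sketch below is not taken from the talk: it assumes NumPy, uses made-up correlated data, and centers the data before the SVD (conventions on centering vary).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: x2 mostly follows x1 (illustrative only).
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=500)
data = np.column_stack([x1, x2])
data = data - data.mean(axis=0)

# Rows of vt are the principal axes; projecting onto them gives (u1, u2).
_, s, vt = np.linalg.svd(data, full_matrices=False)
transformed = data @ vt.T

# Fraction of total energy captured along each new axis.
energy = s**2 / np.sum(s**2)
print(energy)
```

In the new coordinates almost all of the variability lies along the first axis, which is the sense in which the data are effectively one-dimensional.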

slide-13
SLIDE 13

13

Properties of Principal Components

  • Each PC in the direction of maximum (remaining) energy in the set of OD flows
  • Ordered by amount of energy they capture
  • Eigenflow: set of OD flows mapped onto a PC; a common trend
  • Ordered by most common to least common trend

slide-14
SLIDE 14

14

PCA on OD flows

X = UΣV^T

  • X: OD flow matrix (time × # OD pairs); each column is an OD flow
  • U: Eigenflow matrix (time × # OD pairs); each column is an eigenflow
  • V: Principal matrix (# OD pairs × # OD pairs); each column is a PC
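
A minimal sketch of the decomposition with NumPy follows. The matrix here is a synthetic stand-in for measured OD flows, and the sizes (1008 timesteps, 121 OD pairs) are assumptions chosen to resemble a week of 10-minute bins on an 11-PoP network. The SVD is applied directly to X, as in the formula on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p = 1008, 121  # assumed sizes: timesteps x OD pairs

# Synthetic stand-in: a shared daily cycle scaled per OD flow, plus noise.
hours = np.arange(t) / 6.0
daily = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24.0)
X = np.outer(daily, rng.uniform(1.0, 10.0, size=p)) + 0.1 * rng.normal(size=(t, p))

# Thin SVD: columns of U are eigenflows, columns of V are principal components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

print(X.shape, U.shape, s.shape, V.shape)
```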

slide-15
SLIDE 15

15

PCA on OD flows (2)

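Each eigenflow is a weighted sum of all OD flows:

u_i = X v_i / σ_i

Eigenflows are orthonormal.

Singular values indicate the energy attributable to a principal component.

Each OD flow is a weighted sum of all eigenflows:

x_j = Σ_i σ_i V_ji u_i

Here u_i and v_i are the i-th columns of U and V, σ_i is the i-th singular value, and x_j is the j-th column of X, all as in X = UΣV^T.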
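
These relations follow from X = UΣV^T and can be checked numerically. The sketch below uses a small random matrix as a stand-in for OD flow data; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 12))  # synthetic: time x OD pairs

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

i, j = 0, 5
# Eigenflow i as a weighted sum of all OD flows (weights from column i of V).
eigenflow_i = X @ V[:, i] / s[i]
# OD flow j as a weighted sum of all eigenflows (weights from row j of V).
od_flow_j = U @ (s * V[j, :])
# Orthonormality of eigenflows: U^T U should be the identity.
gram = U.T @ U
```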

slide-16
SLIDE 16

16

An Example Eigenflow and PC

[Figure: Eigenflow 6 versus time (Mon to Sun); PC-6 versus OD flow index, with OD Flow 167 and OD Flow 94 labeled; traffic timeseries of OD Flow 167 and OD Flow 96]

slide-17
SLIDE 17

17

Outline For Rest of Talk

  • Find intrinsic dimensionality of OD flows
  • Decompose OD flows
  • Characterize eigenflows
  • Reconstruct OD flows
  • Potential applications

Structural Analysis

slide-18
SLIDE 18

18

Low Intrinsic Dimensionality of OD Flows

[Figure: magnitude of singular values versus singular value index, for Sprint-1 and Abilene]

Plot of (square root of) energy captured by each dimension.
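
One way to quantify this is to count how many singular values are needed to capture a chosen fraction of the total energy (the sum of squared singular values). The sketch below is an illustration on synthetic low-rank data; the function name and the 99% target are assumptions, not values from the talk.

```python
import numpy as np


def components_for_energy(s, fraction):
    """Smallest k such that the top-k singular values capture `fraction` of the energy."""
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, fraction)) + 1
    return min(k, len(s))


rng = np.random.default_rng(3)
t, p, true_rank = 500, 60, 4

# Synthetic data: rank-4 structure plus a little noise.
X = rng.normal(size=(t, true_rank)) @ rng.normal(size=(true_rank, p))
X += 0.01 * rng.normal(size=(t, p))

s = np.linalg.svd(X, compute_uv=False)
normalized = s / s[0]  # singular values scaled so the largest is 1
k = components_for_energy(s, 0.99)
print(k, normalized[:6])
```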

slide-19
SLIDE 19

19

Approximating With Top 5 Eigenflows

[Figure: traffic in OD Flow 88 over time (Mon to Sun); original versus approximation from the top 5 PCs]

slide-20
SLIDE 20

20

Approximating With Top 5 Eigenflows

[Figure: traffic in OD Flow 79 over time (Mon to Sun); original versus approximation from the top 5 PCs]

slide-21
SLIDE 21

21

Approximating With Top 5 Eigenflows

[Figure: traffic in OD Flow 96 over time (Mon to Sun); original versus approximation from the top 5 PCs]
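
A rank-5 approximation of this kind can be sketched by keeping only the five largest singular values. The data below are synthetic (a few periodic trends plus noise), so the error it reports says nothing about the measured networks.

```python
import numpy as np


def low_rank_approx(X, k):
    """Reconstruct X from its top-k eigenflows and principal components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(6)
t, p = 1008, 50
hours = np.arange(t) / 6.0

# Synthetic OD flows: mixtures of a constant, a 24-hour and a 12-hour cycle, plus noise.
trends = np.column_stack([
    np.ones(t),
    np.sin(2 * np.pi * hours / 24.0),
    np.sin(2 * np.pi * hours / 12.0),
])
X = trends @ rng.uniform(1.0, 10.0, size=(3, p)) + 0.05 * rng.normal(size=(t, p))

X5 = low_rank_approx(X, 5)
rel_error = np.linalg.norm(X - X5) / np.linalg.norm(X)
print(rel_error)
```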

slide-22
SLIDE 22

22

Outline

  • Find intrinsic dimensionality of OD flows
  • Decompose OD flows
  • Characterize eigenflows
  • Reconstruct OD flows
  • Potential applications

Structural Analysis

slide-23
SLIDE 23

23

Structure of OD Flows

[Figure: CDF, Pr[X<x], of the number of eigenflows in an OD flow, for Sprint-1 and Abilene]

  • Most OD flows have fewer than 20 significant eigenflows
  • Can think of each OD flow as having only a small set of “features”
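
The slide does not state how "significant" is decided. As a hedged sketch, the code below counts, for each OD flow, the entries in its row of V whose magnitude exceeds a cutoff; the default cutoff of 1/sqrt(p) is an assumption made for illustration.

```python
import numpy as np


def significant_counts(V, threshold=None):
    """Number of eigenflows with a large weight in each OD flow (one count per row of V)."""
    p = V.shape[0]
    if threshold is None:
        threshold = 1.0 / np.sqrt(p)  # assumed cutoff, not given on the slide
    return np.sum(np.abs(V) > threshold, axis=1)


rng = np.random.default_rng(7)
X = rng.normal(size=(300, 16))  # synthetic: time x OD pairs
_, _, Vt = np.linalg.svd(X, full_matrices=False)
counts = significant_counts(Vt.T)

# Empirical CDF of the counts, analogous to the plot on the slide.
xs = np.sort(counts)
cdf = np.arange(1, len(xs) + 1) / len(xs)
```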

slide-24
SLIDE 24

24

Kinds of Eigenflows

[Figure: three example eigenflows over time (Mon to Sun): Eigenflow 29, Eigenflow 20, Eigenflow 2]

  • Deterministic (d-eigenflows): predictable (periodic) trends
  • Spike (s-eigenflows): sudden, isolated spikes and drops
  • Noise (n-eigenflows): roughly stationary and Gaussian

slide-25
SLIDE 25

25

D-eigenflows Have Periodicity

[Figure: Eigenflow 1 over time (Mon to Sun); power spectrum (FFT energy versus period in hours, with ticks at 6, 12, 24, 36, 48) for Sprint-1 and Abilene]
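
A periodicity check of this kind can be sketched with a discrete Fourier transform: find the frequency with the most power and convert it to a period in hours. The function name and the synthetic 24-hour test signal are illustrative.

```python
import numpy as np


def dominant_period_hours(series, bin_minutes):
    """Period (in hours) of the strongest non-constant frequency in the series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=bin_minutes / 60.0)  # cycles per hour
    peak = int(np.argmax(power[1:])) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]


# One week of 10-minute samples with a pure daily cycle.
hours = np.arange(1008) / 6.0
daily = np.sin(2 * np.pi * hours / 24.0)
period = dominant_period_hours(daily, bin_minutes=10)
print(period)
```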

slide-26
SLIDE 26

26

S-eigenflows Have Spikes

[Figure: Sprint-1 Eigenflow 8 and Abilene Eigenflow 10 over time (Mon to Sun), with a 5-sigma threshold marked]
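
A threshold test like the one marked in the figure could look as follows. This sketch assumes the test is "any sample further than 5 standard deviations from the mean"; the exact rule used in the talk is not stated on the slide.

```python
import numpy as np


def has_spike(series, n_sigma=5.0):
    """True if any sample deviates from the mean by more than n_sigma standard deviations."""
    x = np.asarray(series, dtype=float)
    return bool(np.any(np.abs(x - x.mean()) > n_sigma * x.std()))


hours = np.arange(1008) / 6.0
smooth = np.sin(2 * np.pi * hours / 24.0)  # periodic, no spikes

spiky = smooth.copy()
spiky[500] += 20.0  # one isolated spike

print(has_spike(smooth), has_spike(spiky))
```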

slide-27
SLIDE 27

27

N-eigenflows Are Gaussian

[Figure: Eigenflow 39 over time (Mon to Sun); qq-plot of quantiles of input sample versus standard normal quantiles, for Sprint-1 and Abilene]
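
A qq-plot compares sorted samples with standard normal quantiles; points near a straight line suggest a Gaussian marginal. As a rough numerical stand-in for reading the plot by eye, the sketch below computes the correlation between the two sets of quantiles. The function name and plotting positions are illustrative choices, and no cutoff is given on the slide.

```python
from statistics import NormalDist

import numpy as np


def qq_correlation(series):
    """Correlation between sorted samples and standard normal quantiles (1.0 = straight line)."""
    x = np.sort(np.asarray(series, dtype=float))
    n = len(x)
    probs = (np.arange(1, n + 1) - 0.5) / n
    normal_q = np.array([NormalDist().inv_cdf(float(pr)) for pr in probs])
    return float(np.corrcoef(normal_q, x)[0, 1])


rng = np.random.default_rng(5)
gaussian = rng.normal(size=2000)
two_valued = np.concatenate([-np.ones(500), np.ones(500)])  # clearly non-Gaussian

print(qq_correlation(gaussian), qq_correlation(two_valued))
```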

slide-28
SLIDE 28

28

Hundreds of Eigenflows But Only Three Basic Types

slide-29
SLIDE 29

29

An OD Flow, Reconstructed

[Figure: four stacked panels over time (Mon to Sun): the original OD flow, its D-components (d-eigenflows), its S-components (s-eigenflows), and its N-components (n-eigenflows)]

slide-30
SLIDE 30

30

Another OD Flow, Reconstructed

[Figure: four stacked panels over time (Mon to Sun): the original OD flow, its D-components (d-eigenflows), its S-components (s-eigenflows), and its N-components (n-eigenflows)]

slide-31
SLIDE 31

31

Which Eigenflows Are Most Significant?

[Figure: type (d-, s-, or n-eigenflow) of each eigenflow in order, for Sprint and for Abilene]

  • 1-6: d-eigenflows appear to be most significant in both networks.
  • 5-10: s-eigenflows are next most important.
  • 12 and beyond: n-eigenflows account for the rest.

slide-32
SLIDE 32

32

Contribution of Eigenflow Types

Fraction of total OD flow energy captured by each type of eigenflow

slide-33
SLIDE 33

33

Contribution to Each OD Flow (Sprint)

[Figure: fraction of total energy contributed by deterministic, spike, and noise eigenflows to each OD flow, ordered large to small]

  • Largest OD flows: Strong deterministic component.
  • Smallest OD flows: Primarily dominated by spikes.
  • Regardless of size, n-eigenflows account for a fairly constant portion.

slide-34
SLIDE 34

34

Contribution to Each OD Flow (Abilene)

[Figure: fraction of total energy contributed by deterministic, spike, and noise eigenflows to each OD flow, ordered large to small]

  • Largest OD flows: Strong deterministic component.
  • Smallest OD flows: Dominated by noise, but have diurnal trends also.
  • Regardless of size, spikes account for a fairly constant portion.

slide-35
SLIDE 35

35

Summary: Specific Questions

  • Are there low-dimensional representations for a set of OD flows?

– 5-10 eigenflows are sufficient for a good approximation of a set of 100+ OD flows

  • Do OD flows share common features?

– The common features across OD flows are eigenflows

  • What do the features look like?

– Each eigenflow can be categorized as D, S, or N

  • Can we get a high-level understanding of a set of OD flows in terms of these features?

– Both networks: Large flows are primarily diurnal
– Sprint: Small flows are primarily spikes; noise constant
– Abilene: Small flows have N and D; spikes constant

slide-36
SLIDE 36

36

Outline

  • Find intrinsic dimensionality of OD flows
  • Decompose OD flows
  • Characterize eigenflows
  • Reconstruct OD flows
  • Potential applications

Structural Analysis

slide-37
SLIDE 37

37

Traffic Matrix Estimation

Problem Statement: Infer OD flows (X) given link measurements (Y) and routing matrix (A): Y^T = A X^T

State of the Art: dim(X) > dim(Y), so treat as an ill-posed linear inverse problem. Infer X on stationary (short) timescales.

Possible Approach: On longer timescales, the intrinsic dimensionality of OD flows is small, so effective dim(X) < dim(Y). TM estimation of the largest eigenflows now becomes a “well-posed” problem.
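
The ill-posedness can be seen on a toy example. The sketch below uses a hypothetical routing matrix with 3 links and 4 OD flows (not taken from either network): link loads follow Y^T = A X^T, but because A has fewer rows than columns, a pseudo-inverse solution reproduces the link loads without recovering the true OD flows.

```python
import numpy as np

# Hypothetical routing matrix A (links x OD flows): A[l, f] = 1 if flow f crosses link l.
A = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

rng = np.random.default_rng(4)
X = rng.uniform(1.0, 5.0, size=(10, 4))  # synthetic OD flows: time x OD flows
Y = X @ A.T                              # link loads: time x links (Y^T = A X^T)

# Minimum-norm estimate of X from Y; one of infinitely many consistent solutions.
X_hat = Y @ np.linalg.pinv(A).T

print(np.linalg.matrix_rank(A), A.shape[1])
print(np.allclose(X_hat @ A.T, Y), np.allclose(X_hat, X))
```

The approach suggested on the slide would replace the unknown X with a small number of eigenflow coefficients, so that the number of unknowns drops below the number of link measurements.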

slide-38
SLIDE 38

38

Anomaly Detection

State of the art: Use wavelets to detrend each flow in isolation. [Barford:IMW02]

Possible approach: Detrend all OD flows simultaneously by subtracting d-eigenflows.

[Figure: original timeseries for OD Flow 57 with its d-pseudo-eigenflows (Mon to Sun); detrended timeseries with 4σ threshold]
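
A hedged sketch of this idea follows. It makes a simplifying assumption that the d-eigenflows are the leading components of the SVD (in the talk they are identified by type, not only by rank), subtracts their contribution from every OD flow at once, and flags residuals beyond 4σ. The data and the injected spike are synthetic.

```python
import numpy as np


def detrend_and_flag(X, num_trend, n_sigma=4.0):
    """Subtract the top `num_trend` components and flag residuals beyond n_sigma per OD flow."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Assumption: the leading components stand in for the d-eigenflows.
    trend = (U[:, :num_trend] * s[:num_trend]) @ Vt[:num_trend, :]
    residual = X - trend
    deviation = np.abs(residual - residual.mean(axis=0))
    return residual, deviation > n_sigma * residual.std(axis=0)


rng = np.random.default_rng(8)
t, p = 1008, 20
hours = np.arange(t) / 6.0
daily = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24.0)

# Synthetic OD flows sharing one daily trend, with a spike injected into flow 3.
X = np.outer(daily, rng.uniform(1.0, 10.0, size=p)) + 0.05 * rng.normal(size=(t, p))
X[500, 3] += 5.0

residual, flags = detrend_and_flag(X, num_trend=1)
print(bool(flags[500, 3]), int(flags.sum()))
```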

slide-39
SLIDE 39

39

Traffic Forecasting

State of the art: Treat each flow timeseries independently. Use wavelets to extract trends. Build timeseries forecasting models on trends.

[PTZC:INFOCOM03]

Possible approach: Build forecasting models on d-eigenflows as trends. Allows simultaneous examination and forecasting for entire ensemble of OD flows.

slide-40
SLIDE 40

40

Traffic Engineering

Problem Statement: How does one identify important traffic flows, so that they can be treated differently?

State of the art:

– Measure all flows on a single link
– Find “heavy-hitters” or “elephant” flows based on preset thresholds [PTC:INFOCOM04, PTBTSC:IMW02]

Possible approach:

– Look across all flows and extract common features
– Taxonomize each flow into D, S, or N

slide-41
SLIDE 41

41

Final thoughts

  • OD flows are a useful primitive for engineering networks
  • Sets of OD flows have low-dimensional representations
  • A Structural Analysis approach can provide useful insight into the nature of OD flows

slide-42
SLIDE 42

42

Thanks!

  • Help with Abilene Data

– Rick Summerhill, Mark Fullmer (Internet2)
– Matthew Davy (Indiana University)

  • Help with Sprint-Europe Data

– Bjorn Carlsson, Jeff Loughridge (SprintLink)
– Richard Gass (Sprint ATL)
slide-43
SLIDE 43

43

Backup slides

slide-44
SLIDE 44

44

Principal Component Analysis

For any given dataset, PCA finds a new coordinate system that maps maximum variability in the data to a minimum number of coordinates.

New axes are called Principal Axes or Components.

slide-45
SLIDE 45

45

Properties of Principal Components

  • Each PC in the direction of maximum (remaining) energy in the set of OD flows
  • Ordered by amount of energy they capture
  • Eigenflow: set of OD flows mapped onto a PC; a common trend
  • Ordered by most common to least common trend

v_1 = arg max_{‖v‖=1} ‖Xv‖

and,

v_k = arg max_{‖v‖=1} ‖(X − Σ_{i=1}^{k−1} X v_i v_i^T) v‖

Here Xv is the mapping of X onto one PC, and X − Σ_{i=1}^{k−1} X v_i v_i^T is the difference between the original and the data mapped onto the first k−1 PCs.

slide-46
SLIDE 46

46

Energy captured by each PC

[Figure: energy captured versus principal component, for Sprint-1 and Abilene]