Inverting Sampled Traffic Nicolas Hohn, Darryl Veitch Australian - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Inverting Sampled Traffic

Nicolas Hohn, Darryl Veitch

Australian Research Council Special Research Center for Ultra-Broadband Information Networks THE UNIVERSITY OF MELBOURNE

slide-2
SLIDE 2

Inverting Sampled Traffic

  • Motivation
  • Sampling Techniques
      – Packet Sampling
      – Flow Sampling
  • Comparison of sampling techniques
      – Distribution of the number of packets per flow
      – Spectral density of packet arrival process
  • Application to traffic modelling

slide-4
SLIDE 4

Introduction

Motivation

  • Traffic statistics collected by routers don't scale well with link speed: exact traffic logging is impossible on backbone links.
  • Need to sample the traffic and export partial statistics.
  • Aim: infer statistics of the original traffic from partial measurements.

Short history

  • 1993: Claffy et al. advocate sampling techniques at the packet level to reduce the load on the measuring infrastructure.
  • 2002-2003: Duffield et al. give estimates of first order quantities from packet level sampled traffic: average rate, mean number of packets per flow.

slide-9
SLIDE 9

Packet Sampling

[Figure: three timelines showing the original traffic, the i.i.d. sampling with probability q, and the resulting sampled traffic.]

Simple example: recover original packet rate

  • Sample packets with probability $q$.
  • Measure the rate of the sampled traffic, $\lambda^{(q)}$.
  • Infer the rate of the original traffic, $\lambda^{(q)}/q$.
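The three steps above can be sketched in a few lines of Python. This is only an illustration on simulated data: the packet rate, the duration and the helper name `estimate_original_rate` are assumptions made for the example, not values or code from the talk.

```python
import random

def estimate_original_rate(sampled_count, duration, q):
    """Divide the measured rate of the sampled traffic by the sampling probability q."""
    sampled_rate = sampled_count / duration
    return sampled_rate / q

random.seed(0)
true_rate = 1000.0   # packets per second (hypothetical)
duration = 100.0     # seconds (hypothetical)
q = 0.1              # sampling probability

# Step 1: keep each packet independently with probability q.
n_original = int(true_rate * duration)
n_sampled = sum(1 for _ in range(n_original) if random.random() < q)

# Steps 2 and 3: measure the sampled rate and rescale it by 1/q.
estimate = estimate_original_rate(n_sampled, duration, q)
print(f"estimated original rate: {estimate:.1f} packets/s")
```

The estimate fluctuates around the true rate; the relative error shrinks as more packets are sampled.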
slide-10
SLIDE 10

Terminology

IP flow: set of packets with the same 5-tuple (IP protocol, source address, destination address, source port, destination port).

[Figure: the same traffic shown on two timelines, at flow level and at packet level.]
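As a rough illustration of the 5-tuple definition, packets can be grouped into flows by using the 5-tuple as a dictionary key. The type name `FiveTuple`, the helper `group_into_flows` and the sample packets are all hypothetical, and the sketch ignores any flow timeout.

```python
from collections import defaultdict
from typing import NamedTuple

class FiveTuple(NamedTuple):
    protocol: int
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int

def group_into_flows(packets):
    """Map each 5-tuple to the list of arrival times of its packets."""
    flows = defaultdict(list)
    for timestamp, key in packets:
        flows[key].append(timestamp)
    return dict(flows)

# Hypothetical packet-level trace: (arrival time, 5-tuple).
web = FiveTuple(6, "10.0.0.1", "10.0.0.2", 40000, 80)
dns = FiveTuple(17, "10.0.0.3", "10.0.0.4", 50000, 53)
packets = [(0.00, web), (0.01, dns), (0.02, web), (0.05, web)]

flows = group_into_flows(packets)
flow_sizes = {key: len(times) for key, times in flows.items()}
```

Here `flow_sizes` gives the flow-level view (number of packets per flow), while `packets` is the packet-level view of the same traffic.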

slide-11
SLIDE 11

Terminology

IP flow: set of packets with same 5-tuple IP Source Destination Source Destination protocol Address Address Port Port

Time

Flow Level Packet Level

slide-12
SLIDE 12

Original Traffic

[Figure: timeline of the original traffic.] Recovering original flow sizes is not straightforward.

slide-13
SLIDE 13

Flow Sampling

[Figure: timeline of the flow-sampled traffic.] No 'inversion' problems.

slide-15
SLIDE 15

Packet Sampling

[Figure: timeline of the packet-sampled traffic.] Recovering original flow sizes is not straightforward.

slide-17
SLIDE 17

Distribution of number of packets per flow

[Figure: timelines of the original traffic, the packet-sampled traffic, and the flow-sampled traffic.]

  • Packet sampling: potential inversion problems
  • Flow sampling: no 'inversion' problems

slide-18
SLIDE 18

Distribution of number of packets per flow

Packet sampling

$p_j$: probability that a flow had $j$ packets before sampling.

$p^{(q)}_k$: probability that a flow has $k$ packets after sampling.

$$p^{(q)}_k = \sum_{j=k}^{\infty} \Pr\{k \text{ packets after thinning} \mid j \text{ packets before thinning}\}\, p_j = \sum_{j=k}^{\infty} \binom{j}{k} q^k (1-q)^{j-k}\, p_j \tag{1}$$

Aim: express $p_j$ as a function of $p^{(q)}_k$ by inverting (1).
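The forward direction of (1) is easy to compute, which makes a useful contrast with the inversion. The sketch below applies the binomial thinning sum to a small, made-up flow-size distribution with finite support; the function name `thin_distribution` and the numbers are illustrative only.

```python
from math import comb

def thin_distribution(p, q):
    """Apply (1): p[j] is Pr{j packets before sampling}; returns Pr{k packets after}."""
    n = len(p)
    return [
        sum(comb(j, k) * q**k * (1 - q)**(j - k) * p[j] for j in range(k, n))
        for k in range(n)
    ]

# Hypothetical original distribution on flow sizes 0..3.
p = [0.0, 0.5, 0.3, 0.2]
p_q = thin_distribution(p, q=0.5)
print(p_q)
```

Note that `p_q[0]` is the probability that a flow loses all of its packets, so some mass moves to size zero even though no original flow was empty.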

slide-19
SLIDE 19

Inverting (1) with generating functions

Definition:

$$G_P(z) = \sum_{j=0}^{\infty} p_j z^j, \quad z \in D(0, 1)$$

$D(z, r)$: open disc centered at $z$ with radius $r$.

Singularity at $z = 1$ if the distribution is heavy tailed.

From (1):

$$G^{(q)}_P(z) = \sum_{k} p^{(q)}_k z^k = G_P(1 - q + qz), \quad z \in D(0, 1)$$

$$G_P(z) = G^{(q)}_P\!\left(\frac{z - (1 - q)}{q}\right), \quad z \in D(1 - q, q)$$

Aim: find the power series expansion of $G_P$ at $z = 0$.

Methods:
  – Analytic Continuation
  – Cauchy Integral
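The relation between the two generating functions follows from (1) by exchanging the order of summation and applying the binomial theorem. A short derivation in the notation of the slides:

```latex
\begin{align*}
G^{(q)}_P(z)
  &= \sum_{k=0}^{\infty} z^k \sum_{j=k}^{\infty} \binom{j}{k} q^k (1-q)^{j-k}\, p_j \\
  &= \sum_{j=0}^{\infty} p_j \sum_{k=0}^{j} \binom{j}{k} (qz)^k (1-q)^{j-k} \\
  &= \sum_{j=0}^{\infty} p_j\, (1 - q + qz)^j
   = G_P(1 - q + qz).
\end{align*}
```

Substituting $w = 1 - q + qz$, that is $z = (w - (1 - q))/q$, gives the second relation; $z \in D(0, 1)$ then corresponds to $w \in D(1 - q, q)$.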

slide-20
SLIDE 20

Scheme 1: Analytic Continuation

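q = 0.6

[Figure: complex plane with both axes running from −1 to 1, showing the points z0 and z1.]

$$p_j = \sum_{n=j}^{\infty} \binom{n}{j} \frac{(-1)^{n-j}}{q^n} (1-q)^{n-j}\, p^{(q)}_n \tag{2}$$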
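Formula (2) can be checked numerically on a toy case. The sketch below thins a made-up distribution with finite support using (1), then applies (2) to recover it. The function names and numbers are assumptions for the example, and the sums are truncated at the support size, which is exact here only because the support is finite and the thinned probabilities are known exactly rather than estimated from data.

```python
from math import comb

def thin_distribution(p, q):
    """Forward relation (1): distribution of flow sizes after packet sampling."""
    n = len(p)
    return [
        sum(comb(j, k) * q**k * (1 - q)**(j - k) * p[j] for j in range(k, n))
        for k in range(n)
    ]

def invert_scheme1(p_q, q):
    """Inverse relation (2), truncated to the length of p_q."""
    n_max = len(p_q)
    return [
        sum(
            comb(n, j) * (-1)**(n - j) * (1 - q)**(n - j) / q**n * p_q[n]
            for n in range(j, n_max)
        )
        for j in range(n_max)
    ]

q = 0.6
p = [0.0, 0.5, 0.3, 0.2]          # hypothetical original distribution
p_q = thin_distribution(p, q)     # what packet sampling would observe (exactly)
recovered = invert_scheme1(p_q, q)
print(recovered)
```

Each term of (2) carries a factor $((1-q)/q)^{n-j} / q^{j}$, so for $q < 0.5$ the terms grow geometrically with $n$ and any error in $p^{(q)}_n$ is amplified accordingly.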

slide-21
SLIDE 21

Scheme 1: Analytic Continuation

q = 0.1

[Figure: complex plane with both axes running from −1 to 1, showing the points z0, z1, z2, z3, z4 and z5.]

pj = ...

slide-22
SLIDE 22

Scheme 2: Cauchy Integral

$$p_j = \frac{1}{2\pi i} \oint_S \frac{G_P(z)}{z^{j+1}}\, dz \tag{3}$$

S: any closed contour containing the origin, for instance the boundary of D(0, 1).

  • Inversion methods work well when $G_P$ can be directly evaluated on S.
  • Values of $G_P$ on D(0, 1) are unknown: they are obtained with Padé approximants.
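To illustrate (3) in the favourable case where $G_P$ can be evaluated directly on the contour, the sketch below discretises the integral on a circle of radius 0.5 and recovers the coefficients of a known generating function. Everything specific here is an assumption for the example: the geometric flow-size distribution, the radius, the number of points and the function name. It does not address the harder situation described above, where $G_P$ is unknown on the disc.

```python
import cmath

def coefficients_by_cauchy(G, n_coeffs, radius=0.5, n_points=64):
    """Approximate p_j = (1/(2*pi*i)) * integral of G(z)/z**(j+1) over |z| = radius."""
    zs = [radius * cmath.exp(2j * cmath.pi * m / n_points) for m in range(n_points)]
    values = [G(z) for z in zs]
    coeffs = []
    for j in range(n_coeffs):
        # On the circle dz = i*z*dtheta, so the integrand reduces to G(z) / z**j,
        # averaged over equally spaced angles (trapezoidal rule).
        total = sum(v / z**j for v, z in zip(values, zs))
        coeffs.append((total / n_points).real)
    return coeffs

# Hypothetical example: geometric distribution p_j = (1 - a) * a**j.
a = 0.5

def G_geometric(z):
    return (1 - a) / (1 - a * z)

p = coefficients_by_cauchy(G_geometric, n_coeffs=10)
print(p)
```

With equally spaced points the rule is very accurate for functions analytic in a neighbourhood of the circle, which is why a modest number of points suffices in this toy case.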

slide-23
SLIDE 23

Distribution of number of packets per flow

q = 0.6

[Figure: log-log plot of Pr(P = j) against j (number of packets per flow). Curves: theoretical original density; flow thinning; packet thinning, scheme 1; packet thinning, scheme 2.]

slide-24
SLIDE 24

Distribution of number of packets per flow

[Figure: timelines of the packet-sampled and flow-sampled traffic.]

Packet sampling
  • Easy to implement.
  • Needs a consistent flow definition for sampled traffic (new timeout T0).
  • Problems estimating $p^{(q)}$ from sampled data.
  • Severe numerical issues in recovering the packet distribution ("impossible" for q < 0.5!).

Flow sampling
  • Needs on-line processing to create flows.
  • No need to change the flow definition.
  • No inversion needed to recover the packet distribution.
  • q plays no theoretical role: only the remaining number of flows matters for the estimation.

slide-25
SLIDE 25

Spectral density of packet arrival process

[Figure: timelines of the original traffic, the packet-sampled traffic, and the flow-sampled traffic.]

  • Packet sampling: potential inversion problems
  • Flow sampling: potential inversion problems

slide-26
SLIDE 26

Spectral density of packet arrival process

$\Gamma_X(\omega)$: spectral density of original traffic

$\Gamma^{(q)}_X(\omega)$: spectral density of sampled traffic

Packet sampling

Results from the theory of thinned point processes give a direct inversion:

$$\Gamma_X(\omega) = \frac{1}{q^2}\left[\Gamma^{(q)}_X(\omega) - (1 - q)\,\lambda^{(q)}\right]$$

Flow sampling

Assumptions needed: flow arrivals follow a Poisson process, and flows are uncorrelated.

$$\Gamma_X(\omega) = \frac{1}{q}\,\Gamma^{(q)}_X(\omega)$$
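Both inversion formulas are pointwise in frequency, so they translate directly into code. The sketch below is a minimal illustration with hypothetical function names; the sanity check assumes the normalisation implied by the packet-sampling formula, in which a Poisson process of rate λ has a flat spectrum equal to λ, together with the fact that i.i.d. thinning of a Poisson process gives a Poisson process of rate qλ.

```python
def invert_packet_sampled_spectrum(gamma_q, q, rate_q):
    """Packet sampling: Gamma_X = (Gamma_X^(q) - (1 - q) * lambda^(q)) / q**2."""
    return [(g - (1 - q) * rate_q) / q**2 for g in gamma_q]

def invert_flow_sampled_spectrum(gamma_q, q):
    """Flow sampling (Poisson flow arrivals, uncorrelated flows): Gamma_X = Gamma_X^(q) / q."""
    return [g / q for g in gamma_q]

# Sanity check on a Poisson packet process (assumed normalisation: flat spectrum = rate).
lam = 100.0                      # original packet rate (hypothetical)
q = 0.1
rate_q = q * lam                 # rate of the packet-sampled process
gamma_q = [rate_q] * 4           # flat spectrum of the thinned Poisson process
recovered = invert_packet_sampled_spectrum(gamma_q, q, rate_q)
print(recovered)

flow_recovered = invert_flow_sampled_spectrum([2.0, 4.0], q=0.5)
```

The packet-sampling correction subtracts a white-noise floor proportional to the sampled rate before rescaling.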

slide-27
SLIDE 27

Study Second Order Structure

Analysis tools: Discrete Wavelet Transform

Definition:

Comparison of a signal $X(t)$ with a family of functions $\psi_{j,k}$ by means of inner products $d_X(j, k) = \langle X, \psi_{j,k} \rangle$, where $\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)$, and $\psi$ is the mother wavelet, localised both in time and frequency.

Properties:

  • $\{d_X(j, k),\ k \in \mathbb{Z}\}$ is stationary and short range dependent for $j$ fixed; variance$(j) = E|d_X(j, k)|^2$.
  • For scaling processes: $E|d_X(j, k)|^2 = 2^{j\alpha}\, E|d_X(0, k)|^2$.
  • For LRD processes: $E|d_X(j, k)|^2 \sim 2^{j\alpha}\, E|d_X(0, k)|^2$ for large $j$.
  • Wavelet spectrum estimate: $\log_2\!\left(\frac{1}{n_j}\sum_k |d_X(j, k)|^2\right)$ vs $j$.
  • Link with power spectral density: $E|d_X(j, k)|^2 = \int \Gamma_X(\nu)\, 2^j\, |\Psi(2^j \nu)|^2\, d\nu$.
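The wavelet spectrum estimate can be sketched with the simplest possible choice of wavelet. The code below assumes a Haar mother wavelet and treats the discrete samples as the finest-scale approximation; neither choice comes from the talk, and the function name is hypothetical. It returns the per-scale averages $\frac{1}{n_j}\sum_k |d_X(j,k)|^2$, whose base-2 logarithm is then plotted against $j$.

```python
import math
import random

def haar_wavelet_energies(x, n_levels):
    """Return [(1/n_j) * sum_k |d_X(j, k)|**2 for j = 1..n_levels] with a Haar wavelet."""
    approx = list(x)
    energies = []
    for _ in range(n_levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail and approximation coefficients at the next coarser scale.
        details = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in details) / len(details))
    return energies

# Hypothetical input: unit-variance white noise, for which the spectrum should be flat.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(2**14)]
energies = haar_wavelet_energies(noise, n_levels=5)
log_spectrum = [math.log2(e) for e in energies]
print(log_spectrum)
```

For white noise the log-spectrum stays close to a horizontal line, which corresponds to α = 0 in the scaling relations above; a positive slope at large j would indicate long range dependence.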

slide-28
SLIDE 28

Spectral density: q = 0.1

[Figure: wavelet spectrum, log2 Var(d_j) against j = log2(a). Curves: Original; Packet Thinned; Inferred from Packet Thinned; Flow Thinned; Inferred from Flow Thinned.]

slide-29
SLIDE 29

Spectral density: q = 0.001

[Figure: wavelet spectrum, log2 Var(d_j) against j = log2(a). Curves: Original; Packet Thinned; Inferred from Packet Thinned; Flow Thinned; Inferred from Flow Thinned.]

slide-30
SLIDE 30

Conclusions

Packet Sampling
  • Easy to implement.
  • Needs a consistent flow definition for sampled traffic (new timeout T0).
  • Problems estimating $p^{(q)}$ from sampled data.
  • Severe numerical issues in recovering the packet distribution ("impossible" for q < 0.5!).
  • Inaccurate estimation of the spectrum from sampled traffic for small q.

Flow Sampling
  • Needs on-line processing to create flows.
  • No need to change the flow definition.
  • No inversion needed to recover the packet distribution.
  • q plays no theoretical role: only the remaining number of flows matters for the estimation.
  • Accurate spectrum estimation.

slide-32
SLIDE 32

Application to traffic modelling

Aim

Fit model to sampled traffic, Infer model parameters for unsampled traffic.

Theory

Closure properties of the Bartlett-Lewis Point Process under both packet and flow sampling.

Practice

Only flow thinning is applicable.

slide-33
SLIDE 33

Sampling the Bartlett-Lewis Point Process

[Figure: wavelet spectrum, log2 Var(d_j) against j = log2(a). Curves: Original; BLPP matched to Original; Flow Thinned; BLPP matched to Flow Thinned; BLPP reconstructed from Thinned.]

slide-34
SLIDE 34

Conclusions

  • Packet sampling: easy to implement, but hard to infer original statistics beyond first order.
  • Flow sampling: harder to implement, but yields useful information about the original traffic, for both flow-level and packet-level statistics.