An Analysis of Sampling Effects on Graph Structures Derived from - PowerPoint PPT Presentation

An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data Mark Meiss Advanced Network Management Laboratory Indiana University

Quick Overview  Why this study?  Existing work focuses on the effects of sampling on individual flows or distributions of flows.  Open question: How are graph structures built from flow data affected?

Quick Overview  Building graphs from flow data  Basic graph properties  Methodology  Experiments  Results  Take-home message: Aggregation matters and is not your enemy.

Background  “graph structures derived from network flow data”… ?

Basic network

Degree

Clustering Coefficient

Betweenness

Motifs

Weighted network

Strength

Directed network
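As a rough illustration of the properties named on the slides above, here is a minimal sketch of how a directed, weighted graph might be built from flow records, with degree and strength read off it. The record layout `(source, destination, packets)`, the sample values, and the helper names are assumptions made for this example; they are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical flow records: (source, destination, packets). Illustrative values only.
flows = [
    ("A", "B", 10),
    ("A", "C", 5),
    ("B", "C", 20),
    ("A", "B", 30),
]

def build_graph(records):
    """Return a directed weighted graph as {src: {dst: total_packets}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, packets in records:
        graph[src][dst] += packets  # edge weight = packets seen on that edge
    return graph

def out_degree(graph, node):
    """Number of distinct destinations the node sends to."""
    return len(graph.get(node, {}))

def out_strength(graph, node):
    """Total weight (packets) on the node's outgoing edges."""
    return sum(graph.get(node, {}).values())

graph = build_graph(flows)
for node in ("A", "B", "C"):
    print(node, out_degree(graph, node), out_strength(graph, node))
```

Degree counts neighbors only, while strength sums the edge weights, so the two can diverge sharply for a host that exchanges a lot of traffic with few peers.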

Applications  Modeling and prediction  Anomaly detection  Application classification  Capacity planning  Community identification  (etc.)

Motivation  So what does packet sampling have to do with this?  Isn’t knowing p(sample) = 0.01 good enough?

Motivation

Motivation  The distributions of degree and strength for large-scale network data generally obey a power law:

Motivation  The exact value matters!

Methodology  Internet2 / Abilene used as testbed  Generate UDP traffic and analyze its traces in Abilene netflow-v5 data

Flow Generation Language (FGL)  FGL is a scripting language for quick and easy traffic generation:

    println("Bias study #4 (2008-12-10)");
    println();
    println("This FGL code will generate 100 128-byte packets to each UDP port");
    println("in the range 10100-10199 on the hosts 64.57.17.200 - 64.57.17.209.");
    println();

    x = proc(pkt)
    begin
      println("Emitting 100 of ", pkt);
      notate(pkt);
      emit(pkt, 100, 0.02);
      delay(0.10);
    end;

    port = range(10100, 10199);
    host = range(start:ip("64.57.17.200"), end:ip("64.57.17.209"));
    xip = [ ip_header(src:ip("156.56.103.1"), dst:@host) ];
    xudp = [ udp_header(src_port:0, dst_port:@port) ];
    xpacket = [ udp(@xip, @xudp, size:128, data:"This is a test.") ];
    output("bias-study-4.event");
    x(@xpacket);

Experiment #1 Note: p(sample) = 0.03. Generate flows of lengths between 1 and 200 packets; find chance of detection.
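To give a feel for what Experiment #1 measures, the sketch below computes the chance that a flow of a given length leaves at least one sampled packet. It assumes every packet is sampled independently with probability p(sample) = 0.03; the slides do not say how the routers actually select packets, so this is a simplified model, and the function name is made up for the example.

```python
def detection_probability(n_packets, p_sample=0.03):
    """Chance that at least one of n_packets is sampled.

    Assumes independent per-packet sampling with probability p_sample,
    which is a modeling assumption rather than a statement about the testbed.
    """
    return 1.0 - (1.0 - p_sample) ** n_packets

# Flow lengths spanning the 1-200 packet range used in the experiment.
for n in (1, 10, 50, 100, 200):
    print(f"{n:>3} packets: {detection_probability(n):.3f}")
```

Under this model a one-packet flow is seen with probability p(sample) itself, and the chance of detection rises toward 1 as the flow grows.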

Experiment #2 Try to recover a power law, gamma = 2. Send to each of 10 hosts:  256 10-packet flows  128 20-packet flows  64 40-packet flows  (etc.)

Experiment #3 Second attempt to recover gamma = 2: Send to each of 10 hosts:  2048 10-packet flows  1024 20-packet flows  512 40-packet flows  (etc.)

Experiment #4 Third attempt to recover gamma = 2: Send to each of 10 hosts:  1024 100-packet flows  512 200-packet flows  256 400-packet flows  (etc.)
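The three schedules above share one pattern: each tier halves the number of flows and doubles the flow length. A minimal sketch of that pattern follows. The number of tiers is not given on the slides (they end in "etc."), so `steps` is a hypothetical parameter, as is the function name.

```python
def flow_schedule(first_count, first_size, steps):
    """List (flow_count, packets_per_flow) tiers.

    Each tier halves the count and doubles the size of the previous one.
    `steps` is an assumed parameter; tiers whose count would reach zero are dropped.
    """
    return [
        (first_count >> i, first_size << i)
        for i in range(steps)
        if first_count >> i > 0
    ]

# First three tiers of Experiment #2, as listed on the slide.
for count, size in flow_schedule(256, 10, 3):
    print(f"{count} flows of {size} packets")
```

One reading of why this targets gamma = 2: the count falls as 1/size across tiers whose spacing grows in proportion to size, so the number of flows per unit of flow size falls as size to the power of -2. Every full tier also carries the same total number of packets.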

Result  A preponderance of very small flows will lead to an overestimate of the exponent.  All flows smaller than a critical threshold are statistically indistinguishable.

Result  With sufficiently large flow size, a range of exponents can be recovered reliably.

Is this a problem?  What if we don’t have sufficiently large flows?

Aggregation  Aggregation is necessary for accurate results!  Flows repeat themselves.  Coalescing flows with identical endpoints allows us to distinguish smaller flows.

Aggregation  Failure to aggregate on the experiments described causes an over-estimate of about 0.2.  This can make a large difference for modeling!

Conclusions  Given appropriate aggregation, packet sampling does not affect the large-scale properties of graphs derived from flow data.  The effectiveness of aggregation in mitigating small-flow effects depends on repeated activity.

Future Work Effects on other properties (clustering, centrality, spectral). Effects on network growth models (preferential attachment, etc.). Effects on traffic models (PageRank, other Markov models).

Thank you! Any questions or observations?
