SLIDE 1

Introduction Load Shedding Evaluation

Load Shedding in Network Monitoring Applications

Pere Barlet-Ros¹, Gianluca Iannaccone², Josep Sanjuàs¹, Diego Amores¹, Josep Solé-Pareta¹

¹ Technical University of Catalonia (UPC), Barcelona, Spain
² Intel Research, Berkeley, CA

Intel Research Berkeley, July 26, 2007

SLIDE 2

Outline

1. Introduction: Motivation; Case Study: Intel CoMo
2. Load Shedding: Prediction Method; System Overview
3. Evaluation and Operational Results: Performance Results; Accuracy Results

SLIDE 4

Motivation

Building robust network monitoring applications is hard:
- Network traffic is unpredictable: anomalous traffic, extreme data mixes, highly variable data rates
- Processing requirements have greatly increased in recent years (e.g., intrusion and anomaly detection)

The problem
- Efficiently handling extreme overload situations
- Over-provisioning is not possible

SLIDE 6

Case Study: Intel CoMo

CoMo (Continuous Monitoring)¹
- Open-source passive monitoring system
- Fast implementation and deployment of monitoring applications
- Traffic queries are defined as plug-in modules written in C and contain complex stateful computations
- Traffic queries are black boxes: arbitrary computations and data structures, so load shedding cannot use knowledge about the queries

¹ http://como.sourceforge.net

SLIDE 8

Load Shedding Approach

Main idea
1. Find a correlation between traffic features and CPU usage (features are query agnostic, with deterministic worst-case cost)
2. Leverage the correlation to predict CPU load
3. Use the prediction to guide the load shedding procedure

Novelty: no a priori knowledge of the queries is needed
- Preserves a high degree of flexibility
- Increases the possible applications and network scenarios

SLIDE 9

Traffic Features vs CPU Usage

[Figure: four time series over a 100-second trace: CPU cycles, packets, bytes, and 5-tuple flows per measurement interval]

Figure: CPU usage compared to the number of packets, bytes and flows

SLIDE 10

System Overview

Figure: Prediction and Load Shedding Subsystem


SLIDE 11

Load Shedding Performance

[Figure: stacked CPU usage in cycles/sec from 9 am to 5 pm, showing CoMo cycles, load shedding cycles, query cycles, predicted cycles, and the CPU frequency limit]

Figure: Stacked CPU usage (Predictive Load Shedding)

SLIDE 12

Load Shedding Performance

[Figure: CDF of CPU cycles per batch for the Predictive, Original, and Reactive systems]

Figure: CDF of the CPU usage per batch

SLIDE 13

Accuracy Results

Queries estimate their unsampled output by multiplying their results by the inverse of the sampling rate.

Errors in the query results (mean ± stdev):

Query                 Original          Reactive         Predictive
application (pkts)    55.38% ± 11.80    10.61% ± 7.78    1.03% ± 0.65
application (bytes)   55.39% ± 11.80    11.90% ± 8.22    1.17% ± 0.76
flows                 38.48% ± 902.13   12.46% ± 7.28    2.88% ± 3.34
high-watermark         8.68% ± 8.13      8.94% ± 9.46    2.19% ± 2.30
link-count (pkts)     55.03% ± 11.45     9.71% ± 8.41    0.54% ± 0.50
link-count (bytes)    55.06% ± 11.45    10.24% ± 8.39    0.66% ± 0.60
top destinations      21.63 ± 31.94     41.86 ± 44.64    1.41 ± 3.32
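The estimation step above is a one-line inversion; a minimal sketch (the function name and the numbers are ours, purely illustrative):

```python
def estimate_unsampled(sampled_count: int, srate: float) -> float:
    """Scale a metric computed on sampled traffic back to an
    estimate for the full, unsampled stream."""
    if not 0.0 < srate <= 1.0:
        raise ValueError("sampling rate must be in (0, 1]")
    return sampled_count / srate  # multiply by the inverse of the rate

# e.g., 250 packets counted under 25% packet sampling
print(estimate_unsampled(250, 0.25))  # → 1000.0
```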

SLIDE 14

Ongoing and Future Work

Ongoing Work
- Query utility functions
- Custom load shedding
- Fairness of service with non-cooperative users
- Scheduling CPU access vs. the packet stream

Future Work
- Distributed load shedding
- Other system resources (memory, disk bandwidth, storage space)

SLIDE 15

Availability

The source code of our load shedding prototype is publicly available at http://loadshedding.ccaba.upc.edu. The CoMo monitoring system is available at http://como.sourceforge.net.

Acknowledgments
This work was funded by a University Research Grant awarded by the Intel Research Council and by the Spanish Ministry of Education under contract TEC2005-08051-C03-01. The authors would also like to thank the Supercomputing Center of Catalonia (CESCA) for giving them access to the Catalan RREN.

SLIDE 17

Appendix Backup Slides

Work Hypothesis

Our thesis: the cost of maintaining the data structures needed to execute a query can be modeled by looking at a set of traffic features.

Empirical observation: basic operations on the query state (e.g., creating or updating entries, looking for a valid match) incur different overheads while processing incoming traffic, and the cost of a query is mostly dominated by the overhead of some of these operations.

Our method: model each query's cost by considering the right set of traffic features.

SLIDE 18

Traffic Features vs CPU Usage

[Figure: scatter plot of CPU cycles (×10^6) versus packets per batch (1800–3000), with points grouped by the number of new 5-tuple flows: < 500, 500–700, 700–1000, ≥ 1000]

Figure: CPU usage versus the number of packets and flows

SLIDE 19

Multiple Linear Regression (MLR)

Linear regression model:

    Yi = β0 + β1 X1i + β2 X2i + · · · + βp Xpi + εi,   i = 1, 2, . . . , n

- Yi: the n observations of the response variable (measured CPU cycles)
- Xji: the n observations of the p predictors (traffic features)
- βj: the p regression coefficients (unknown parameters to estimate)
- εi: the n residuals (OLS minimizes the sum of squared errors, SSE)

Feature selection
- Variant of the Fast Correlation-Based Filter² (FCBF)
- Removes irrelevant and redundant predictors
- Significantly reduces the cost of the MLR

² L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proc. of ICML, 2003.
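As an illustration of the prediction step, a minimal sketch of fitting the MLR with ordinary least squares using NumPy. The synthetic data and all names here are invented for the example, not taken from the CoMo implementation:

```python
import numpy as np

def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit Y = b0 + b1*X1 + ... + bp*Xp by ordinary least squares.
    X: (n, p) matrix of traffic features; y: (n,) measured cycles."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [b0, b1, ..., bp]

def predict_cycles(coef: np.ndarray, features: np.ndarray) -> float:
    """Predicted CPU cycles for one batch's feature vector."""
    return float(coef[0] + features @ coef[1:])

# toy history: cycles grow linearly with packets and new flows per batch
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(50, 2))   # columns: [packets, new_flows]
y = 5000 + 30 * X[:, 0] + 900 * X[:, 1]  # synthetic, noiseless cost model
coef = fit_mlr(X, y)
print(predict_cycles(coef, np.array([800.0, 200.0])))  # ≈ 209000.0
```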

SLIDE 20

System Overview

Prediction and Load Shedding subsystem:

1. Each 100 ms of traffic is grouped into a batch of packets.
2. The traffic features are efficiently extracted from the batch (using multi-resolution bitmaps).
3. The most relevant features are selected (using FCBF) to be used by the MLR.
4. The MLR predicts the CPU cycles required by the query to run.
5. Load shedding is performed to discard a portion of the batch.
6. CPU usage is measured (using the TSC) and fed back to the prediction system.
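The bitmap-based flow counting in step 2 can be approximated as follows. This sketch uses a single, fixed-size bitmap with the linear-counting estimator rather than the multi-resolution bitmaps of the actual system, and every name in it is illustrative:

```python
import math
import zlib

class FlowBitmap:
    """Single-resolution bitmap for estimating the number of distinct
    5-tuple flows in a batch. (The real system uses multi-resolution
    bitmaps to stay accurate over a much wider range of flow counts.)"""

    def __init__(self, nbits: int = 8192):
        self.nbits = nbits
        self.bits = bytearray(nbits // 8)

    def add(self, five_tuple: tuple) -> None:
        """Hash the flow key and set the corresponding bit."""
        h = zlib.crc32(repr(five_tuple).encode()) % self.nbits
        self.bits[h // 8] |= 1 << (h % 8)

    def estimate(self) -> float:
        """Linear counting: n ≈ -m * ln(V), V = fraction of zero bits."""
        zeros = sum(bin(b ^ 0xFF).count("1") for b in self.bits)
        if zeros == 0:
            return float("inf")  # bitmap saturated; count not recoverable
        return -self.nbits * math.log(zeros / self.nbits)

bm = FlowBitmap()
for i in range(1000):  # 1000 distinct synthetic flows
    bm.add(("10.0.0.1", f"10.0.{i // 256}.{i % 256}", 6, 1234, 80))
print(round(bm.estimate()))  # close to 1000
```

The estimator corrects for hash collisions inside the bitmap, which is why the per-flow memory cost stays far below one counter per flow.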

SLIDE 21

Load Shedding

When to shed load
- When the prediction exceeds the available cycles:
      avail_cycles = (0.1 × CPU_frequency) − overhead
- Corrected according to the prediction error and buffer space
- Overhead is measured using the time-stamp counter (TSC)

How and where to shed load
- Packet and flow sampling (hash based)
- The same sampling rate is applied to all queries

How much load to shed
- The maximum sampling rate that keeps CPU usage < avail_cycles:
      srate = avail_cycles / pred_cycles
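The srate formula and hash-based flow sampling can be sketched as below. Hashing the 5-tuple means every packet of a given flow shares the same keep/drop decision. All names and constants are ours, a simplified illustration rather than the CoMo code:

```python
import zlib

def sampling_rate(avail_cycles: float, pred_cycles: float) -> float:
    """Maximum sampling rate that keeps predicted usage under the budget."""
    if pred_cycles <= avail_cycles:
        return 1.0  # no shedding needed
    return avail_cycles / pred_cycles

def keep_flow(five_tuple: tuple, srate: float) -> bool:
    """Hash-based flow sampling: a flow is kept iff its normalized hash
    falls below srate, so all packets of a flow share one fate."""
    h = zlib.crc32(repr(five_tuple).encode()) / 0xFFFFFFFF
    return h < srate

srate = sampling_rate(avail_cycles=2.0e8, pred_cycles=5.0e8)
print(srate)  # → 0.4: keep roughly 40% of the flows
```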

SLIDE 24

Load Shedding Algorithm

Load shedding algorithm (simplified version):

    pred_cycles = 0
    foreach qi in Q do
        fi = feature_extraction(bi)
        si = feature_selection(fi, hi)
        pred_cycles += mlr(fi, si, hi)
    if avail_cycles < pred_cycles × (1 + error) then
        foreach qi in Q do
            bi = sampling(bi, qi, srate)
            fi = feature_extraction(bi)
    foreach qi in Q do
        query_cyclesi = run_query(bi, qi, srate)
        hi = update_mlr_history(hi, fi, query_cyclesi)
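A runnable rendering of the predict-then-shed half of this loop, with a stub cost predictor and truncation standing in for the real MLR and hash-based sampling (everything here is a simplified stand-in, not CoMo code):

```python
def shed_load(queries, batch, avail_cycles, error=0.1):
    """One round of the simplified algorithm: predict the total cost
    across queries, then sample the batch if the budget would be
    exceeded. Returns the (possibly sampled) batch and the srate used."""

    def predict(q, b):
        # stub predictor: fixed per-packet cost per query
        return q["cycles_per_pkt"] * len(b)

    pred_cycles = sum(predict(q, batch) for q in queries)
    srate = 1.0
    if avail_cycles < pred_cycles * (1 + error):
        srate = avail_cycles / pred_cycles
        keep = max(1, int(len(batch) * srate))
        batch = batch[:keep]  # stand-in for hash-based packet/flow sampling
    return batch, srate

queries = [{"name": "flows", "cycles_per_pkt": 1000}]
batch = list(range(2000))  # 2000 packets in this 100 ms batch
sampled, srate = shed_load(queries, batch, avail_cycles=1.0e6)
print(len(sampled), round(srate, 2))  # → 1000 0.5
```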

SLIDE 25

Testbed Scenario

Equipment and network scenario
- 2 × Intel Pentium 4 running at 3 GHz
- 2 × Endace DAG 4.3GE cards
- 1 Gbps link connecting the Catalan RREN to the Spanish NREN

Executions

Execution    Date        Time      Link load (Mbps) mean/max/min
predictive   24/Oct/06   9am–5pm   750.4 / 973.6 / 129.0
original     25/Oct/06   9am–5pm   719.9 / 967.5 / 218.0
reactive     05/Dec/06   9am–5pm   403.3 / 771.6 / 131.0

Queries (from the standard distribution of CoMo)

Name               Description
application        Port-based application classification
counter            Traffic load in packets and bytes
flows              Per-flow counters
high-watermark     High watermark of link utilization
pattern search     Finds sequences of bytes in the payload
top destinations   List of the top-10 destination IPs
trace              Full-payload collection

SLIDE 26

Packet Loss

[Figure: three panels of packets over time, 9 am to 5 pm, each showing total packets and DAG drops (plus unsampled packets where applicable): (a) Original CoMo, (b) Reactive Load Shedding, (c) Predictive Load Shedding]

Figure: Link load and packet drops

SLIDE 28

Related Work

Network Monitoring Systems
- Only consider a pre-defined set of metrics
- Filtering, aggregation, sampling, etc.

Data Stream Management Systems
- Define a declarative query language (small set of operators)
- Operators' resource usage is assumed to be known
- Selectively discard tuples, compute summaries, etc.

Limitations
- Restrict the type of metrics and possible uses
- Assume explicit knowledge of operators' cost and selectivity

SLIDE 29

Conclusions and Future Work

- Effective load shedding methods are now a basic requirement, given rapidly increasing data rates, numbers of users, and complexity of analysis methods
- Our load shedding operates without knowledge of the traffic queries and quickly adapts to overload situations by gracefully degrading accuracy via packet and flow sampling
- Operational results in a research ISP network show that the system is robust to severe overload and that the impact on the accuracy of the results is minimized

Limitations and future work
- Load shedding methods for queries that are not robust against sampling
- Load shedding strategies to maximize the overall system utility
- Other system resources (memory, disk bandwidth, storage space)