SLIDE 1

Load Shedding in Network Monitoring Applications

Pere Barlet-Ros

Advisor: Dr. Gianluca Iannaccone
Co-advisor: Prof. Josep Solé-Pareta

Universitat Politècnica de Catalunya (UPC)
Departament d'Arquitectura de Computadors
December 15, 2008

SLIDE 2

Outline

1. Introduction
2. Prediction and Load Shedding Scheme
3. Fairness of Service and Nash Equilibrium
4. Custom Load Shedding
5. Conclusions and Future Work

SLIDE 3

Outline

1. Introduction
   - Motivation
   - Related work
   - Contributions
2. Prediction and Load Shedding Scheme
3. Fairness of Service and Nash Equilibrium
4. Custom Load Shedding
5. Conclusions and Future Work

SLIDE 5

Motivation

Network monitoring is crucial for operating data networks
  - Traffic engineering, network troubleshooting, anomaly detection, ...

Monitoring systems are prone to dramatic overload situations
  - Link speeds, anomalous traffic, bursty traffic nature, ...
  - Complexity of traffic analysis methods

Overload situations lead to uncontrolled packet loss
  - Severe and unpredictable impact on the accuracy of applications
  - ... when results are most valuable!

Load Shedding Scheme
  - Efficiently handle extreme overload situations
  - Over-provisioning is not feasible

SLIDE 6

Related Work

Load shedding in data stream management systems (DSMS)
  - Examples: Aurora, STREAM, TelegraphCQ, Borealis, ...
  - Based on declarative query languages (e.g., CQL)
  - Small set of operators with known (and constant) cost
  - Maximize an aggregate performance metric (utility or throughput)

Limitations
  - Restrict the type of metrics and possible uses
  - Assume explicit knowledge of operators' cost and selectivity
  - Not suitable for non-cooperative environments

Resource management in network monitoring systems
  - Restricted to a pre-defined set of metrics
  - Limit the amount of allocated resources in advance

SLIDE 7

Contributions

1. Prediction method
   - Operates without knowledge of application cost and implementation
   - Does not rely on a specific model for the incoming traffic

2. Load shedding scheme
   - Anticipates overload situations and avoids packet loss
   - Relies on packet and flow sampling (equal sampling rates)

Extensions:

3. Packet-based scheduler
   - Applies different sampling rates to different queries
   - Ensures fairness of service with non-cooperative applications

4. Support for custom-defined load shedding methods
   - Safely delegates load shedding to non-cooperative applications
   - Still ensures robustness and fairness of service

SLIDE 8

Outline

1. Introduction
2. Prediction and Load Shedding Scheme
   - Case Study: Intel CoMo
   - Prediction Methodology
   - Load Shedding Scheme
   - Evaluation and Operational Results
3. Fairness of Service and Nash Equilibrium
4. Custom Load Shedding
5. Conclusions and Future Work

SLIDE 10

Case Study: Intel CoMo

CoMo (Continuous Monitoring) [1]
  - Open-source passive monitoring system
  - Framework to develop and execute network monitoring applications
  - Open (shared) network monitoring platform

Traffic queries are defined as plug-in modules written in C
  - Contain complex computations

Traffic queries are black boxes
  - Arbitrary computations and data structures
  - Load shedding cannot use knowledge of the queries

[1] http://como.sourceforge.net

SLIDE 11

Load Shedding Approach

Working Scenario
  - Monitoring system supporting multiple arbitrary queries
  - Single resource: CPU cycles
  - Approach: real-time modeling of the queries' CPU usage

1. Find the correlation between traffic features and CPU usage
   - Features are query agnostic with deterministic worst-case cost
2. Exploit the correlation to predict the CPU load
3. Use the prediction to decide the sampling rate

SLIDE 12

System Overview

Figure: Prediction and Load Shedding Subsystem


SLIDE 13

Traffic Features vs CPU Usage

Figure: CPU usage over time compared to the number of packets, bytes and 5-tuple flows

SLIDE 14

Traffic Features vs CPU Usage

Figure: CPU usage versus the number of packets per batch, for different ranges of new 5-tuple hashes

SLIDE 16

Prediction Methodology [2]

Multiple Linear Regression (MLR)

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_p X_{pi} + ε_i,   i = 1, 2, ..., n

  - Y_i: n observations of the response variable (measured cycles)
  - X_{ji}: n observations of the p predictors (traffic features)
  - β_j: p regression coefficients (unknown parameters to estimate)
  - ε_i: n residuals (OLS minimizes the sum of squared errors, SSE)

Feature Selection
  - Variant of the Fast Correlation-Based Filter (FCBF)
  - Removes irrelevant and redundant predictors
  - Significantly reduces the cost and improves the accuracy of the MLR

[2] P. Barlet-Ros et al., "Load Shedding in Network Monitoring Applications", Proc. of USENIX Annual Technical Conference, 2007.
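To make the regression step concrete, here is a minimal Python sketch of fitting the MLR with ordinary least squares and predicting the cycles of the next batch. The numpy-based fitting and the variable names are illustrative assumptions, not the CoMo implementation.

    import numpy as np

    def fit_mlr(X, y):
        # X: n x p matrix of traffic features, y: n measured CPU cycle counts.
        # Prepend a column of ones so beta[0] plays the role of beta_0.
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS minimizes the SSE
        return beta

    def predict_cycles(beta, features):
        # features: p-vector extracted from the current batch.
        return beta[0] + np.dot(beta[1:], features)

    # Hypothetical usage with a sliding window of n past batches:
    # beta = fit_mlr(history_features, history_cycles)
    # pred = predict_cycles(beta, current_features)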

SLIDE 17

Load Shedding Scheme [3]

Prediction and Load Shedding subsystem:

1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (multi-resolution bitmaps; a simplified counting sketch follows this list)
3. The most relevant features are selected (using FCBF) to be used by the MLR
4. The MLR predicts the CPU cycles required by each query to run
5. Load shedding is performed to discard a portion of the batch
6. CPU usage is measured (using the TSC) and fed back to the prediction system

[3] P. Barlet-Ros et al., "On-line Predictive Load Shedding for Network Monitoring", Proc. of IFIP-TC6 Networking, 2007.
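The system extracts the per-aggregate counters with multi-resolution bitmaps; as a rough illustration of the underlying idea, the sketch below uses a plain single-resolution bitmap with the linear-counting estimate n ≈ −m·ln(V), where V is the fraction of zero bits. This is an assumed simplification, not the multi-resolution variant the slides refer to.

    import math
    import zlib

    class BitmapCounter:
        # Approximate distinct-item counter for one traffic aggregate.
        def __init__(self, nbits=1 << 16):
            self.m = nbits
            self.bits = bytearray(nbits // 8)

        def add(self, key: bytes):
            h = zlib.crc32(key) % self.m        # hash the aggregate key
            self.bits[h // 8] |= 1 << (h % 8)   # set the corresponding bit

        def estimate(self):
            ones = sum(bin(b).count('1') for b in self.bits)
            v = 1.0 - ones / self.m             # fraction of zero bits
            if v <= 0:
                return float(self.m)            # bitmap saturated
            return -self.m * math.log(v)        # linear-counting estimate

    # Hypothetical usage: count distinct 5-tuples in a batch
    # c = BitmapCounter()
    # for pkt in batch:
    #     c.add(pkt.five_tuple_bytes)
    # n_flows = c.estimate()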

SLIDE 18

Results: Load Shedding Performance

Figure: Stacked CPU usage, 09:00-17:00 (Predictive Load Shedding); series: CoMo cycles, load shedding cycles, query cycles, predicted cycles, CPU limit

SLIDE 19

Results: Load Shedding Performance

Figure: CDF of the CPU usage per batch for the Predictive, Original and Reactive schemes, with the CPU limit per batch marked

SLIDE 20

Results: Packet Loss

Figure: Link load and packet drops, 09:00-17:00: (a) Original CoMo (total traffic, DAG drops), (b) Reactive Load Shedding (total, DAG drops, unsampled), (c) Predictive Load Shedding (total, DAG drops, unsampled)

SLIDE 21

Results: Error of the Queries

Query                 Original          Reactive         Predictive
application (pkts)    55.38% ±11.80     10.61% ±7.78     1.03% ±0.65
application (bytes)   55.39% ±11.80     11.90% ±8.22     1.17% ±0.76
counter (pkts)        55.03% ±11.45     9.71%  ±8.41     0.54% ±0.50
counter (bytes)       55.06% ±11.45     10.24% ±8.39     0.66% ±0.60
flows                 38.48% ±902.13    12.46% ±7.28     2.88% ±3.34
high-watermark        8.68%  ±8.13      8.94%  ±9.46     2.19% ±2.30
top-k destinations    21.63  ±31.94     41.86  ±44.64    1.41  ±3.32

Table: Errors in the query results (mean ± stdev)

SLIDE 22

Results: Load Shedding Overhead

Figure: Stacked CPU usage, 09:00-17:00 (Predictive Load Shedding), highlighting the cycles spent on load shedding itself

SLIDE 23

Outline

1. Introduction
2. Prediction and Load Shedding Scheme
3. Fairness of Service and Nash Equilibrium
   - Load Shedding Strategy
   - Evaluation
4. Custom Load Shedding
5. Conclusions and Future Work

SLIDE 25

Fairness with Non-Cooperative Users [4]

Previous scheme does not differentiate among queries
  - An equal sampling rate is applied to all queries
  - Better results can be obtained by applying different sampling rates, considering the tolerance to sampling of each query

Challenges
  - Queries are black boxes (i.e., external information is needed)
  - Users are non-cooperative (selfish) in nature

Game theory-based strategy
  - Variant of the max-min fair share allocation policy
  - Users provide a minimum sampling rate
  - Single Nash Equilibrium when users provide correct information

[4] P. Barlet-Ros et al., "Robust Network Monitoring in the presence of Non-Cooperative Traffic Queries", Computer Networks, 2008.

SLIDE 26

Load Shedding Strategy (I)

Load shedding strategy:

1. If (1) and (2) cannot be satisfied, disable the queries with the greatest m_q × d̂_q
2. Compute the c_q's that satisfy (1) and (2):
   - mmfs_cpu: maximize the minimum allocation of cycles (c_q)
   - mmfs_pkt: maximize the minimum sampling rate (p_q)

Constraints:

    ∀q ∈ Q:  m_q × d̂_q ≤ c_q ≤ d̂_q    (1)
    Σ_{q∈Q} c_q ≤ C                    (2)

Notation:
  - d̂_q: cycles demanded by query q ∈ Q (prediction)
  - m_q: minimum sampling rate of query q ∈ Q
  - c_q: cycles allocated to query q ∈ Q
  - C: system capacity in CPU cycles

An allocation sketch follows below.
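As an illustration of the mmfs_cpu variant, the following Python sketch performs max-min fair allocation with per-query floors via water-filling; it is an assumed rendering of the strategy above (including the disable step), not the actual implementation.

    def mmfs_cpu(demands, min_rates, capacity):
        # demands[q]: predicted cycles d_q; min_rates[q]: m_q in [0, 1].
        floors = {q: min_rates[q] * d for q, d in demands.items()}
        active = set(demands)
        # Step 1: disable queries with the greatest m_q * d_q until feasible.
        while active and sum(floors[q] for q in active) > capacity:
            active.remove(max(active, key=lambda q: floors[q]))

        def alloc_at(level):
            # Each query gets at least its floor, at most its demand,
            # and otherwise the common water level.
            return {q: min(demands[q], max(floors[q], level)) for q in active}

        # Step 2: bisect the water level so allocations fit the capacity.
        lo, hi = 0.0, capacity
        for _ in range(60):
            mid = (lo + hi) / 2
            if sum(alloc_at(mid).values()) > capacity:
                hi = mid
            else:
                lo = mid
        return alloc_at(lo)  # disabled queries get no entry (zero cycles)

    # Hypothetical usage:
    # alloc = mmfs_cpu({'flows': 4e8, 'top-k': 2e8},
    #                  {'flows': 0.05, 'top-k': 0.57}, capacity=3e8)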

SLIDE 28

Load Shedding Strategy (II)

Advantages
1. Single Nash Equilibrium (NE) at C/|Q| (the fair share of C)
2. Encourages users to provide small m_q constraints
3. Non-cooperative users have no incentive to lie
4. The optimal solution can be computed in polynomial time

Limitations of previous approaches
  - Maximize an aggregate performance metric (utility, throughput)
  - Difficult to compute (assumptions on cost, selectivity, utility functions)
  - NE when all players ask for the maximum possible allocation
  - Not suitable for non-cooperative environments

SLIDE 29

Results: Performance Evaluation

Figure: Average (left) and minimum (right) accuracy versus the overload level (K) with fixed m_q constraints; strategies: no_lshed, reactive, eq_srates, mmfs_cpu, mmfs_pkt

SLIDE 30

Outline

1. Introduction
2. Prediction and Load Shedding Scheme
3. Fairness of Service and Nash Equilibrium
4. Custom Load Shedding
   - Proposed Method
   - Enforcement Policy
5. Conclusions and Future Work

SLIDE 33

Custom Load Shedding [5]

Not all monitoring applications are robust against sampling
  - The previous scheme penalizes these queries (m_q = 1)

Queries can compute more accurate results with other methods
  - Examples: sample-and-hold, computing summaries, ...

Providing multiple built-in mechanisms is unviable
  - The monitoring system supports arbitrary queries

Custom Load Shedding
  - Delegates the task of shedding load to end users
  - Queries know their implementation (to the system they are black boxes)
  - Queries can compete under fair conditions for shared resources
  - Requires an enforcement mechanism

[5] P. Barlet-Ros et al., "Custom Load Shedding for Non-Cooperative Network Monitoring Applications", submitted to IEEE INFOCOM 2009.

SLIDE 34

Example: P2P detector query

Figure: Accuracy error of a signature-based P2P detector query versus the load shedding rate, comparing flow sampling, packet sampling and custom load shedding

SLIDE 35

Enforcement Policy

Figure: Actual vs predicted CPU usage as a function of the load shedding rate r_q (p2p-detector), for a selfish query and a custom (compliant) query

MLR update:

    history_value = d_q / (1 − r_q)

SLIDE 36

Outline

1. Introduction
2. Prediction and Load Shedding Scheme
3. Fairness of Service and Nash Equilibrium
4. Custom Load Shedding
5. Conclusions and Future Work
   - Conclusions
   - Future Work

SLIDE 37

Conclusions

Effective load shedding methods are now a basic requirement
  - Increasing data rates, numbers of users and application complexity
  - Robustness against traffic anomalies and attacks

Predictive load shedding scheme
  - Operates without knowledge of the traffic queries
  - Does not rely on a specific model for the input traffic
  - Anticipates overload situations, avoiding uncontrolled packet loss
  - Graceful performance degradation (sampling & custom methods)
  - Suitable for non-cooperative environments

Results in two operational networks show that:
  - The system is robust against severe overload
  - The impact on the accuracy of the results is minimized

SLIDE 38

Future Work

Study other system resources
  - Examples: memory, disk bandwidth, storage space, ...
  - Multi-dimensional load shedding schemes

Extend the prediction model
  - Study queries with non-linear relationships to traffic features
  - Include payload-related and entropy-based features

Address the resource management problem in a distributed platform
  - Load balancing and distribution techniques
  - Other metrics: bandwidth between nodes, query delays, ...

SLIDE 39

Main Publications

P. Barlet-Ros, D. Amores-López, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. On-line Predictive Load Shedding for Network Monitoring. In Proc. of IFIP-TC6 Networking, Atlanta, USA, May 2007.

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, D. Amores-López, and J. Solé-Pareta. Load Shedding in Network Monitoring Applications. In Proc. of USENIX Annual Technical Conf., Santa Clara, USA, June 2007.

P. Barlet-Ros, J. Sanjuàs-Cuxart, J. Solé-Pareta, and G. Iannaccone. Robust Resource Allocation for Online Network Monitoring. In Proc. of Intl. Telecommunications Network Workshop on QoS in Multiservice IP Networks (ITNEWS-QoSIP), Venice, Italy, Feb. 2008.

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. Robust Network Monitoring in the presence of Non-Cooperative Traffic Queries. Computer Networks, Oct. 2008 (in press).

Under submission:

P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. Custom Load Shedding for Non-Cooperative Monitoring Applications. Submitted to IEEE INFOCOM, Aug. 2008.

SLIDE 40

Availability

The source code of the load shedding system is publicly available at http://loadshedding.ccaba.upc.edu
The CoMo monitoring system is available at http://como.sourceforge.net

Acknowledgments

This work was funded by a University Research Grant awarded by the Intel Research Council, and by the Spanish Ministry of Education under contracts TSI2005-07520-C03-02 (CEPOS) and TEC2005-08051-C03-01 (CATARO).
We thank CESCA and UPCnet for allowing us to evaluate the load shedding prototype in the Catalan NREN and the UPC network, respectively.

SLIDE 41

Load Shedding in Network Monitoring Applications

Pere Barlet-Ros

Advisor: Dr. Gianluca Iannaccone
Co-advisor: Prof. Josep Solé-Pareta

Universitat Politècnica de Catalunya (UPC)
Departament d'Arquitectura de Computadors
December 15, 2008

SLIDE 42

Appendix: Backup Slides

Outline

6. Backup Slides: Testbed Scenario, Introduction, CoMo System, Prediction Method, Prediction Validation and Evaluation, Load Shedding Scheme, Robustness Against Attacks, Nash Equilibrium, Custom Load Shedding, Publications

SLIDE 43

Testbed Scenario

Figure: CESCA and UPCnet testbed scenarios: optical splitters on 1 Gbps links feed PC-1 (online CoMo, 2 x DAG 4.3GE) and PC-2 (trace collection, 2 x DAG 4.3GE), with clock synchronization between them; network elements shown: Internet, RedIRIS, UPC network, Scientific Ring (~70 institutions)

SLIDE 44

Datasets

Name      Date/Time               Trace (GB)  Pkts (M)  Bytes (GB)  Load (Mbps) avg/max/min
ABILENE   14/Aug/02 09:00-11:00   34.1        532.4     370.6       411.9/623.8/286.2
CENIC     17/Mar/05 15:50-16:20   3.8         59.5      56.0        249.7/936.9/079.1
CESCA-I   02/Nov/05 16:30-17:00   8.3         103.7     81.1        360.5/483.3/197.3
CESCA-II  11/Apr/06 08:00-08:30   30.9        49.4      29.9        133.0/212.2/096.2
UPC-I     07/Nov/07 18:00-18:30   54.7        95.2      53.0        253.5/399.0/177.8

Table: Traces in our dataset

Name       Date/Time               Trace (GB)  Pkts (M)  Bytes (GB)  Load (Mbps) avg/max/min
CESCA-III  24/Oct/06 09:00-17:00   155.5       2908.2    2764.8      750.4/973.6/129.0
CESCA-IV   25/Oct/06 09:00-17:00   152.5       2867.2    2652.2      719.9/967.5/218.0
CESCA-V    05/Dec/06 09:00-17:00   138.6       2037.8    1484.8      403.3/771.6/131.0
UPC-II     24/Apr/08 09:00-09:30   47.6        61.3      46.5        222.2/282.1/176.9

Table: Online executions

SLIDE 45

Monitoring Applications (Traffic Queries)

Query           Description                                         Method  Cost
application     Port-based application classification               packet  low
autofocus       High volume traffic clusters per subnet             packet  med
counter         Traffic load in packets and bytes                   packet  low
flows           Per-flow classification and number of active flows  flow    low
high-watermark  High watermark of link utilization over time        packet  low
p2p-detector    Signature-based P2P detector                        custom  high
pattern-search  Identifies sequences of bytes in the payload        packet  high
super-sources   Detection of sources with largest fan-out           flow    med
top-k           Ranking of the top-k destination IP addresses       packet  low
trace           Full-payload packet collection                      custom  med

Figure: CPU cost (cycles/s) of each query

SLIDE 46

Challenges

1. Monitoring applications operate in real time with live traffic
   - The load shedding scheme must be lightweight
   - It must quickly adapt to overload to prevent packet loss

2. Monitoring applications are non-cooperative
   - They will try to obtain the maximum resource share
   - The system must ensure fairness of service and avoid starvation

3. Applications are arbitrary, with unknown resource demands
   - Input data is continuous, highly variable and unpredictable
   - Applications are developed by third parties using imperative languages
   - No assumptions can be made about the input traffic or the application cost

SLIDE 47

Problem Space

          Static (offline)            Dynamic (online)
Local     Static query assignment;    Load shedding;
          Resource provisioning       Query scheduling
Global    Placement of monitors;      Dissemination of queries;
          Placement of queries        Load distribution

Table: Resource management problem space

SLIDE 48

General-Purpose Network Monitoring

Developing and deploying monitoring applications is complex
  - Continuous streams, high-speed sources, highly variable rates, ...
  - Different network devices, transport technologies, system configurations (e.g., NICs, DAG cards, NetFlow, NPs, SNMP, ...)

Most application code is dedicated to dealing with lateral aspects
  - Increased application complexity
  - Long development times
  - High probability of introducing undesired errors

Open network monitoring infrastructures
  - Abstract away the inner workings of the infrastructure
  - Not tailor-made for a single specific application
  - Examples: Scriptroute, FLAME, Intel CoMo, ...

The research community demands open monitoring infrastructures
  - Share measurement data from multiple network viewpoints
  - Fast prototyping of applications to evaluate novel analysis methods
  - Study properties of network traffic and protocols

SLIDE 49

CoMo Architecture

Callback   Description                        Process
check()    Stateful filters                   capture
update()   Packet processing                  capture
export()   Processes tuples sent by capture   export
action()   Decides what to do with a record   export
store()    Stores records to disk             export
load()     Loads records from disk            query
print()    Formats records                    query

SLIDE 51

Work Hypothesis

Our thesis: the cost of maintaining the data structures needed to execute a query can be modeled by looking at a set of traffic features.

Empirical observation: the overhead differs across the basic operations performed on that state while processing incoming traffic
  - E.g., creating or updating entries, looking for a valid match, etc.
  - The cost is dominated by the overhead of some of these operations

Our method: model a query's cost by considering the right set of traffic features

SLIDE 52

Traffic Features

Traffic aggregates:
 1. src-ip
 2. dst-ip
 3. protocol
 4. <src-ip, dst-ip>
 5. <src-port, proto>
 6. <dst-port, proto>
 7. <src-ip, src-port, proto>
 8. <dst-ip, dst-port, proto>
 9. <src-port, dst-port, proto>
10. <src-ip, dst-ip, src-port, dst-port, proto>

Counters per traffic aggregate:
1. Number of unique items in a batch
2. Number of new items in a measurement interval
3. Number of repeated items in a batch
4. Number of repeated items in a measurement interval

An illustrative counting sketch follows below.
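One plausible reading of these four counters, as an exact-set Python sketch (the real system uses bitmaps instead of sets, and the packet representation is an assumption):

    from collections import Counter

    def aggregate_features(batch, seen_in_interval, key):
        # batch: iterable of packets; key(p) extracts one aggregate,
        # e.g. the 5-tuple; seen_in_interval: keys observed earlier in
        # the current measurement interval (updated in place).
        counts = Counter(key(p) for p in batch)
        unique_in_batch = len(counts)
        repeated_in_batch = sum(c - 1 for c in counts.values())
        new_in_interval = sum(1 for k in counts if k not in seen_in_interval)
        repeated_in_interval = unique_in_batch - new_in_interval
        seen_in_interval.update(counts)
        return (unique_in_batch, new_in_interval,
                repeated_in_batch, repeated_in_interval)

    # Hypothetical usage for the 5-tuple aggregate:
    # seen = set()
    # feats = aggregate_features(batch, seen, key=lambda p: p.five_tuple)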

SLIDE 53

Multiple Linear Regression

Multiple Linear Regression (MLR)

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_p X_{pi} + ε_i,   i = 1, 2, ..., n

  - Y_i: n observations of the response variable (measured cycles)
  - X_{ji}: n observations of the p predictors (traffic features)
  - β_j: p regression coefficients (unknown parameters to estimate)
  - ε_i: n residuals (OLS minimizes SSE)
  - Cost: O(min(n·p², n²·p))

OLS assumptions
  - The X variables are independent (no multicollinearity)
  - ε_i is normally distributed and the expected value of the ε vector is 0
  - Residuals are uncorrelated and exhibit constant variance
  - The covariance between predictors and residuals is 0

SLIDE 54

Fast Correlation-Based Filter [6]

FCBF Algorithm

1. Selecting relevant predictors
   - Correlation of each predictor (X) with the response variable (Y):

         r = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / ( √(Σ_{i=1}^{n} (X_i − X̄)²) · √(Σ_{i=1}^{n} (Y_i − Ȳ)²) )

   - Predictors with r below the FCBF threshold are discarded

2. Removing redundant predictors
   - Sort the predictors according to r
   - Iteratively compute r′ among each pair of predictors
   - If r′ > r, the predictor lower in the list is removed

Cost: O(n · p · log p)

[6] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution", Proc. of ICML, 2003.
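A minimal Python sketch of this two-phase filter over numpy arrays; it uses plain Pearson correlation as the relevance measure, as in the formula above (the published FCBF uses symmetrical uncertainty), so treat it as an assumed simplification.

    import numpy as np

    def fcbf_select(X, y, threshold=0.6):
        # X: n x p feature matrix, y: n responses; returns kept column indices.
        n, p = X.shape
        # Phase 1: drop predictors whose |r| with y is below the threshold.
        rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
        kept = [j for j in range(p) if rel[j] >= threshold]
        # Phase 2: walk by decreasing relevance; drop a predictor that is
        # more correlated with an already-selected predictor than with y.
        kept.sort(key=lambda j: rel[j], reverse=True)
        selected = []
        for j in kept:
            redundant = any(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > rel[j]
                            for k in selected)
            if not redundant:
                selected.append(j)
        return selected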

SLIDE 55

Resource Usage Measurements

Time-stamp counter (TSC)
  - 64-bit counter incremented by the processor at each clock cycle
  - rdtsc: reads the TSC counter

Sources of measurement noise
1. Instruction reordering: serializing instruction (e.g., cpuid)
2. Context switches: discard the sample from the MLR history (rusage structure, sched_setscheduler to SCHED_FIFO)
3. Disk accesses: query memory accesses compete with CoMo disk accesses

SLIDE 56

Prediction parameters (n and FCBF threshold)

Figure: Prediction error versus cost as a function of the amount of history used to compute the Multiple Linear Regression (left) and as a function of the Fast Correlation-Based Filter threshold (right)

SLIDE 57

Prediction parameters (broken down by query)

Figure: Prediction error broken down by query as a function of the amount of history used to compute the Multiple Linear Regression (left) and as a function of the Fast Correlation-Based Filter threshold (right)

SLIDE 58

Prediction Accuracy (CESCA-II trace)

Figure: Prediction error over time (5 executions, 7 queries); average error 0.012364, maximum error 0.13867

Query               Mean     Stdev    Selected features
application         0.0110   0.0095   packets, bytes
counter             0.0048   0.0066   packets
flows               0.0319   0.0302   new <dst-ip, dst-port, proto>
high-watermark      0.0064   0.0077   packets
pattern-search      0.0198   0.0169   bytes
top-k destinations  0.0169   0.0267   packets
trace               0.0090   0.0137   bytes, packets

SLIDE 59

Prediction Accuracy (CESCA-I trace)

Figure: Prediction error over time (5 executions, 7 queries); average error 0.0065407, maximum error 0.19061

Query               Mean     Stdev    Selected features
application         0.0068   0.0060   repeated 5-tuple, packets
counter             0.0046   0.0053   packets
flows               0.0252   0.0203   new <dst-ip, dst-port, proto>
high-watermark      0.0059   0.0063   packets
pattern-search      0.0098   0.0093   packets
top-k destinations  0.0136   0.0183   new 5-tuple, packets
trace               0.0092   0.0132   packets

SLIDE 60

Prediction Overhead

Prediction phase     Overhead
Feature extraction   9.070%
FCBF                 1.702%
MLR                  0.201%
TOTAL                10.973%

Table: Prediction overhead (5 executions)

SLIDE 61

Load Shedding Scheme [7]

When to shed load
  - When the prediction exceeds the available cycles:

        avail_cycles = (0.1 × CPU frequency) − overhead

    (the cycle budget of one 100 ms batch)
  - Overhead is measured using the time-stamp counter (TSC)
  - Corrected according to the prediction error and the buffer space

How and where to shed load
  - Packet and flow sampling (hash based); a sampling sketch follows below
  - The same sampling rate is applied to all queries

How much load to shed
  - The maximum sampling rate that keeps the CPU usage below avail_cycles:

        srate = avail_cycles / pred_cycles

[7] P. Barlet-Ros et al., "On-line Predictive Load Shedding for Network Monitoring", Proc. of IFIP-TC6 Networking, 2007.
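As an assumed illustration of the hash-based sampling step (flow sampling keeps or drops all packets of a flow together), with placeholder packet fields:

    import zlib

    def keep_packet(pkt, srate, flow_sampling=True):
        # Hash the 5-tuple (flow sampling) or a per-packet field (packet
        # sampling) into [0, 1) and keep the packet if it falls below srate.
        key = pkt['five_tuple'] if flow_sampling else pkt['seqno']
        h = zlib.crc32(repr(key).encode()) / 2**32
        return h < srate

    # Hypothetical usage on one batch:
    # batch = [p for p in batch if keep_packet(p, srate)]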


SLIDE 64

Load Shedding Algorithm

Load shedding algorithm (simplified version):

    pred_cycles = 0;
    foreach qi in Q do
        fi = feature_extraction(bi);
        si = feature_selection(fi, hi);
        pred_cycles += mlr(fi, si, hi);
    if avail_cycles < pred_cycles × (1 + error) then
        foreach qi in Q do
            bi = sampling(bi, qi, srate);
            fi = feature_extraction(bi);
    foreach qi in Q do
        query_cyclesi = run_query(bi, qi, srate);
        hi = update_mlr_history(hi, fi, query_cyclesi);

SLIDE 65

Reactive Load Shedding

Sampling rate at time t:

    srate_t = min(1, max(α, srate_{t−1} × (avail_cycles_t − delay) / consumed_cycles_{t−1}))

  - consumed_cycles_{t−1}: cycles used in the previous batch
  - srate_{t−1}: sampling rate applied to the previous batch
  - delay: cycles by which avail_cycles_{t−1} was exceeded
  - α: minimum sampling rate to be applied
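The formula maps directly onto a one-function Python sketch of this reactive baseline (variable names follow the slide; the default α is an assumption):

    def reactive_srate(prev_srate, avail_cycles, consumed_cycles,
                       delay, alpha=0.01):
        # Scale the previous sampling rate by how far the previous batch
        # was from the cycle budget, clamped to [alpha, 1].
        target = prev_srate * (avail_cycles - delay) / consumed_cycles
        return min(1.0, max(alpha, target))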

SLIDE 66

Alternative Prediction Approaches (DDoS attacks) [8]

Figure: Prediction accuracy under DDoS attacks (flows query): actual vs predicted CPU cycles and relative error over time for (a) EWMA, (b) SLR and (c) MLR

[8] P. Barlet-Ros et al., "Robust Resource Allocation for Online Network Monitoring", Proc. of IT-NEWS (QoSIP), 2008.

SLIDE 67

Impact of DDoS Attacks [9]

Figure: CPU usage and accuracy under DDoS attacks (flows query): (a) CPU usage with and without load shedding, against the CPU threshold and 5% bounds; (b) error in the query results without load shedding, with packet sampling and with flow sampling

[9] P. Barlet-Ros et al., "Robust Resource Allocation for Online Network Monitoring", Proc. of IT-NEWS (QoSIP), 2008.

SLIDE 68

Notation and Definitions

Symbol   Definition
Q        Set of q continuous traffic queries
C        System capacity in CPU cycles
d̂_q      Cycles demanded by the query q ∈ Q (prediction)
m_q      Minimum sampling rate constraint of the query q ∈ Q
c        Vector of allocated cycles
c_q      Cycles allocated to the query q ∈ Q
p        Vector of sampling rates
p_q      Sampling rate applied to the query q ∈ Q
a_q      Action of the query q ∈ Q
a_{−q}   Actions of all queries in Q except a_q
u_q      Payoff function of the query q ∈ Q

Max-min fair share allocation policy: a vector of allocated cycles c is max-min fair if it is feasible and, for each q ∈ Q and feasible c̄ for which c_q < c̄_q, there is some q′ where c_q ≥ c_{q′} and c_{q′} > c̄_{q′}.

SLIDE 69

Nash Equilibrium

Payoff function:

    u_q(a_q, a_{−q}) = a_q + mmfs_q(C − Σ_{i: u_i>0} a_i),  if Σ_{i: a_i ≤ a_q} a_i ≤ C
                     = 0,                                   if Σ_{i: a_i ≤ a_q} a_i > C

Definition: Nash Equilibrium (NE)
A NE is an action profile a* with the property that no player i can do better by choosing an action different from a*_i, given that every other player j adheres to a*_j.

Theorem
Our resource allocation game has a single NE, when all players demand C/|Q| cycles.
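As a numerical sanity check of the equilibrium, the sketch below evaluates a simplified payoff in which mmfs_q splits the spare capacity equally (an assumed stand-in for the max-min fair share) and verifies that no query gains by unilaterally deviating from C/|Q|:

    def payoff(a_i, others, C):
        # Zero payoff if the demands up to a_i no longer fit in C;
        # otherwise the own demand plus an equal share of the spare.
        demands = [a_i] + list(others)
        if sum(d for d in demands if d <= a_i) > C:
            return 0.0
        spare = max(0.0, C - sum(demands))
        return a_i + spare / len(demands)

    C, n = 1e9, 4
    fair = C / n
    others = [fair] * (n - 1)
    assert all(payoff(a, others, C) <= payoff(fair, others, C)
               for a in (0.5 * fair, 0.9 * fair, fair, 1.1 * fair, 2 * fair))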

SLIDE 70

Proof

Proof: a* is a NE
a* is a NE if u_i(a*) ≥ u_i(a_i, a*_{−i}) for every player i and action a_i, when all other players keep their actions fixed to C/|Q|:
  - a_i > C/|Q|:  u_i(a_i, a*_{−i}) = 0
  - a_i < C/|Q|:  u_i(a_i, a*_{−i}) = a_i + mmfs_i(C/|Q| − a_i) ≤ C/|Q|   (since mmfs_i(x) ≤ x)

Proof: a* is the only NE
For any profile other than a*, there is at least one query with an incentive to change:
  - Σ_i a_i > C: the queries with the largest demands have an incentive to obtain a non-zero payoff
  - Σ_i a_i < C: queries have an incentive to capture the spare cycles
  - Σ_i a_i = C and ∃i: a_i ≠ C/|Q|: there is an incentive to disable the query with the largest demand

SLIDE 71

Accuracy Function

Figure: Accuracy of a generic query as a function of the sampling rate, with the minimum sampling rate m_q and the accuracy bound 1 − ε_q marked

SLIDE 72

Accuracy with Minimum Constraints

Query           m_q    no_lshed     reactive     eq_srates    mmfs_cpu     mmfs_pkt
application     0.03   0.57 ±0.50   0.81 ±0.40   0.99 ±0.04   1.00 ±0.00   1.00 ±0.03
autofocus       0.69   0.00 ±0.00   0.00 ±0.00   0.05 ±0.12   0.97 ±0.06   0.98 ±0.04
counter         0.03   0.00 ±0.00   0.02 ±0.12   1.00 ±0.00   1.00 ±0.00   0.99 ±0.01
flows           0.05   0.00 ±0.00   0.66 ±0.46   0.99 ±0.01   0.95 ±0.07   0.95 ±0.06
high-watermark  0.15   0.62 ±0.48   0.98 ±0.01   0.98 ±0.01   1.00 ±0.01   0.97 ±0.02
pattern-search  0.10   0.66 ±0.08   0.63 ±0.18   0.69 ±0.07   0.20 ±0.08   0.41 ±0.08
super-sources   0.93   0.00 ±0.00   0.00 ±0.00   0.00 ±0.00   0.95 ±0.04   0.95 ±0.04
top-k           0.57   0.42 ±0.50   0.67 ±0.47   0.96 ±0.09   0.99 ±0.03   0.96 ±0.07
trace           0.10   0.66 ±0.08   0.63 ±0.18   0.68 ±0.01   0.64 ±0.17   0.41 ±0.08

Table: Sampling rate constraints (m_q) and average accuracy (mean ± stdev) with K = 0.5

SLIDE 73

Example of Minimum Sampling Rates

Figure: Accuracy (1 − ε_q) as a function of the sampling rate p_q (high-watermark, top-k and p2p-detector queries using packet sampling)

SLIDE 74

Results: Performance Evaluation

Figure: Average (left) and minimum (right) accuracy versus the overload level (K) with fixed m_q constraints; strategies: no_lshed, reactive, eq_srates, mmfs_cpu, mmfs_pkt

Overload level (K):

    C′ = C × (1 − K),   K = 1 − C / Σ_{q∈Q} d̂_q

SLIDE 75

Enforcement Policy

Figure: Actual vs predicted CPU usage as a function of the load shedding rate, for selfish, custom and custom-corrected queries

k_q parameter (the fraction of a query's cycles that can actually be shed):

    k_q = 1 − d_{q, r_q=1} / d_{q, r_q=0}

Figure: Actual vs expected CPU usage as a function of the load shedding rate (k_q = 0.82)

Corrected load shedding rate:

    r_q = min(1, r′_q / k_q)
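Combining the two formulas, a small Python sketch of the enforcement step; all names are illustrative, and d_full and d_min stand for the predicted cycles at r_q = 0 and r_q = 1 respectively:

    def corrected_shedding_rate(r_target, d_full, d_min):
        # k_q is the sheddable fraction of the query's cycles; dividing the
        # requested rate by k_q compensates for the unsheddable remainder.
        k_q = 1.0 - d_min / d_full
        return min(1.0, r_target / k_q)

    def mlr_history_value(measured_cycles, r_q):
        # Normalize measured cycles back to full-traffic cost before
        # feeding the MLR history (assumes r_q < 1).
        return measured_cycles / (1.0 - r_q)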
SLIDE 76

Performance Evaluation

Figure: Average (left) and minimum (right) accuracy at increasing K levels; strategies: no_lshed, eq_srates, mmfs_pkt, custom

SLIDE 77

Performance Evaluation

Figure: Performance of a system that supports custom load shedding, over time: cycles available to CoMo vs predicted and actual usage (after sampling), prediction error, delay compared to real time, system accuracy, and load shedding overhead

SLIDE 78

Performance Evaluation

Figure: Performance with a selfish query (every 3 minutes): cycles available to CoMo vs predicted and actual usage, prediction error, delay compared to real time, and system accuracy

SLIDE 79

Online Performance

Figure: Stacked CPU usage (left) and buffer occupation with DAG drops (right), 09:00-09:30

SLIDE 80

Publication Statistics

USENIX Annual Technical Conference
  - Estimated impact: 2.64 (rank: 7 of 1221) [10]
  - Acceptance ratio: 26.5% (31/117)

Computer Networks Journal
  - Impact factor: 0.829 (25/66) [11]

IFIP-TC6 Networking
  - Acceptance ratio: 21.8% (96/440)

ITNEWS-QoSIP
  - Acceptance ratio: 46.3% (44/95)

IEEE INFOCOM (under submission)
  - Estimated impact: 1.39 (rank: 133 of 1221) [10]
  - Acceptance ratio: 20.3% (236/1160)

[10] According to CiteSeer: http://citeseer.ist.psu.edu/impact.html
[11] Journal Citation Reports (JCR)

SLIDE 81

Load Shedding in Network Monitoring Applications

Pere Barlet-Ros

Advisor: Dr. Gianluca Iannaccone
Co-advisor: Prof. Josep Solé-Pareta

Universitat Politècnica de Catalunya (UPC)
Departament d'Arquitectura de Computadors
December 15, 2008