Network monitoring in high-speed links: Algorithms and challenges (PowerPoint PPT presentation)

SLIDE 1

Sections: Network monitoring · Addressing the technological barriers · Use cases · Addressing the social barriers

Network monitoring in high-speed links: Algorithms and challenges

Pere Barlet-Ros

Advanced Broadband Communications Center (CCABA)
Universitat Politècnica de Catalunya (UPC)
http://monitoring.ccaba.upc.edu
pbarlet@ac.upc.edu

NGI course, 30 Nov. 2010

1 / 36

SLIDE 2

Outline

1. Network monitoring
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers

SLIDE 3

Outline

1. Network monitoring (Introduction · Active monitoring · Passive monitoring · Technological and social barriers)
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers

SLIDE 4

Introduction to network monitoring

Process of measuring network systems and traffic

- Routers, switches, servers, . . .
- Network traffic volume, type, topology, . . .

Monitoring is crucial for network operation and management

- Traffic engineering, capacity planning, bandwidth management
- Fault diagnosis, troubleshooting, performance evaluation
- Accounting, billing, security, . . .

Network measurements are also important for networking research

- Design and evaluation of protocols, applications, . . .
- Traffic modeling and characterization

Network monitoring is very challenging and datasets are scarce, from both a technological and a "social" standpoint

SLIDE 5

Classification

Classification of network monitoring tools and methods

- Hardware vs. software
- Online vs. offline
- LAN vs. WAN
- Protocol level
- Active vs. passive

SLIDE 6

Active monitoring

Active tools are based on traffic injection

- Probe traffic is generated by a measurement device
- The response to the probe traffic is measured

Pros: Flexibility

- Devices can be deployed at the edge (e.g., end-hosts)
- No instrumentation at the core is needed
- Measurement does not directly rely on existing traffic

Cons: Intrusiveness

- Probe traffic can degrade network performance
- Probe traffic can impact the measurement itself

Main usages

- Performance evaluation (e.g., ping)
- Bandwidth estimation (e.g., pathload)
- Topology discovery (e.g., traceroute)

SLIDE 7

Passive monitoring

Traffic collection from inside the network

- Routers and switches (e.g., Cisco NetFlow)
- Passive devices (e.g., libpcap, DAG cards, optical taps)

Pros: Transparency

- Network performance is not affected
- No additional traffic is injected
- Useful even with a single measurement point

Cons: Complexity

- Requires administrative access to network devices
- Requires the explicit presence of the traffic under study
- Online collection and analysis are hard (e.g., require sampling)
- Privacy concerns

Multiple and diverse usages

- Traffic analysis and classification, . . .
- Anomaly and intrusion detection, . . .

SLIDE 8

Technological and social barriers

Datasets and platforms for research purposes are limited

- Outdated datasets
- Academic traffic only
- Anonymized and without payloads

Technological barriers

- The Internet was designed without monitoring in mind
- Collection at Gb/s and storage of TB/day
- Link speeds increase at a faster pace than processing speeds
- Building monitoring applications is error-prone and time-consuming

Social barriers

- Lack of coordination between projects
- ISPs have no incentive to share information
- Privacy and competition concerns
- Monitoring hardware is expensive and difficult to manage

SLIDE 9

Outline

1. Network monitoring
2. Addressing the technological barriers (Bloom filters · Bitmap algorithms · Direct bitmaps and variants · Bitmaps over sliding windows)
3. Use cases
4. Addressing the social barriers

SLIDE 10

Technological challenges

Only a few ns per packet

- Packet interarrival times: 8 ns (40 Gb/s), 32 ns (10 Gb/s)
- Memory access times: < 10 ns (SRAM), tens of ns (DRAM)

Obtaining even simple metrics becomes extremely challenging

- Approaches based on hash tables, the core of most monitoring algorithms, do not scale
- E.g., active flows, flow size distribution, heavy-hitter detection, delay, entropy, sophisticated sampling, . . .

Probabilistic approach: trade accuracy for speed

- Extremely efficient compared to computing the exact answer
- Fits in SRAM, 1 memory access per packet
- Probabilistic guarantees (bounded error)

SLIDE 11

Bloom filters1

Space-efficient data structure to test set membership

- Based on hashing (e.g., pseudo-random hash functions)

Examples of usage in network monitoring

- Replace hash tables to check whether a flow has already been seen
- The definition of a flow is flexible

Advantages

- Needs only a small memory (SRAM) compared to hash tables

Limitations

- False positives are possible
- Removals are not possible (counting variants can support them)

1 B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), 1970.

SLIDE 12

Bloom filters

Parameters

- k: number of hash functions
- m: size of the bitmap (bits)
- p: false positive rate
- n: maximum number of elements in the filter

Figure: Example of a Bloom filter2

2 A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1(4), 2005.
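To make the parameters concrete, here is a minimal Bloom filter sketch in Python (an illustration only, not an implementation discussed in the talk; the k salted SHA-1 digests stand in for pseudo-random hash functions):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array and k salted hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k bit positions from k salted digests of the item.
        for salt in range(self.k):
            digest = hashlib.sha1(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # No false negatives; false positives occur with probability p.
        return all(self.bits[pos] for pos in self._positions(item))
```

With n elements inserted, the false positive rate is approximately (1 − e^(−kn/m))^k, which is how m and k are dimensioned for a target p.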

SLIDE 13

Direct bitmaps (linear counting)3

Space-efficient algorithm to count the number of unique items

- E.g., useful to count the number of flows over a fixed time interval

Basic idea

- Each flow (and all of its packets) hashes to one bit position
- Counting the number of 1's is inaccurate due to collisions, so count the number of unset positions instead
- E.g., 20 KB suffice to count 1M flows with 1% error

Estimate formulae (b bits, n flows, z unset bits)

- Probability a given flow hashes to a given bit: p = 1/b
- Probability no flow hashes to a given bit: pz = (1 − 1/b)^n ≈ e^(−n/b)
- Expected number of unset bits: E[z] = b·pz ≈ b·e^(−n/b)
- Estimated number of flows: n̂ = b ln(b/z)

3 K.-Y. Whang et al. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2), 1990.
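The estimate formulae translate directly into code. A short sketch (illustrative only; a real implementation would use a packed bit array in SRAM and a cheaper hash):

```python
import hashlib
import math

def linear_counting_estimate(flows, b):
    """Direct bitmap: hash each flow to one of b bits and estimate the
    number of distinct flows as n_hat = b * ln(b / z), where z is the
    number of bits left unset."""
    bits = bytearray(b)
    for flow in flows:
        h = int.from_bytes(hashlib.sha1(str(flow).encode()).digest()[:8], "big")
        bits[h % b] = 1  # every packet of a flow sets the same bit
    z = b - sum(bits)    # count the unset positions
    return b * math.log(b / z)
```

Note that repeated packets of the same flow are absorbed by the bitmap, so the estimate tracks distinct flows, not packets.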

SLIDE 14

Bitmap variants4

Direct bitmaps scale linearly with the number of flows

- Variants: virtual, multiresolution, adaptive, and triggered bitmaps, . . .

4 C. Estan, G. Varghese, M. Fisk. Bitmap algorithms for counting active flows on high speed links. IEEE/ACM Trans. Netw., 14(5), 2006.

SLIDE 15

Bitmaps over sliding windows

Timestamp Vector (TSV)5

- Vector of timestamps (instead of bits)
- O(n) query cost

Countdown Vector (CDV)6

- Vector of small timeout counters (instead of full timestamps)
- Independent query and update processes
- O(1) query cost

5 H. Kim, D. O'Hallaron. Counting network flows in real time. In Proc. of IEEE Globecom, Dec. 2003.
6 J. Sanjuàs-Cuxart et al. Counting flows over sliding windows in high speed networks. In Proc. of IFIP/TC6 Networking, May 2009.
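A minimal sketch of the TSV idea (illustrative only; the published structures are engineered for SRAM): replacing each bit of a direct bitmap with a last-seen timestamp lets the same vector answer a count query for any sliding window, at the cost of an O(b) scan, which is the cost the CDV's countdown counters then reduce to O(1).

```python
import hashlib
import math

class TimestampVector:
    """TSV sketch: a direct bitmap whose bits hold last-seen timestamps."""

    def __init__(self, b):
        self.b = b
        self.last_seen = [None] * b

    def update(self, flow, now):
        h = int.from_bytes(hashlib.sha1(str(flow).encode()).digest()[:8], "big")
        self.last_seen[h % self.b] = now

    def count(self, now, window):
        # Positions not touched inside the window play the role of the
        # unset bits in linear counting.
        z = sum(1 for t in self.last_seen if t is None or now - t > window)
        return self.b * math.log(self.b / z)
```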

SLIDE 16

CDV and TSV performance

30-min trace, 271 Mbps, 1 query/s, 50K flows/10 s to 1.8M flows/min7

[Figure: relative error of the CDV versus the counter initialization value (average and 95th percentile of the error, for windows w = 10, 300 and 600 s); memory (bytes) and memory accesses/s of TSV, TSV (extension) and CDV for windows of 10 to 600 s]

7 A hash table would require several MBs with these settings.

SLIDE 17

Outline

1. Network monitoring
2. Addressing the technological barriers
3. Use cases (Load shedding · Lossy difference aggregator)
4. Addressing the social barriers

SLIDE 18

Overload problem

Previous solutions focus on a particular metric

- They are not valid for any (arbitrary) monitoring application

Monitoring systems are prone to dramatic overload situations

- Link speeds, anomalous traffic, the bursty nature of traffic, . . .
- Complexity of traffic analysis methods

Overload situations lead to uncontrolled packet loss

- Severe and unpredictable impact on the accuracy of applications . . .
- . . . precisely when results are most valuable!

Load shedding scheme: efficiently handle extreme overload situations; over-provisioning is not feasible

SLIDE 20

Case study: Intel CoMo

CoMo (Continuous Monitoring)8

- Open-source passive monitoring system
- Framework to develop and execute network monitoring applications
- Open (shared) network monitoring platform

Traffic queries are defined as plug-in modules written in C and contain complex computations

Traffic queries are black boxes

- Arbitrary computations and data structures
- Load shedding cannot use knowledge of the queries

8 http://como.sourceforge.net

SLIDE 22

Load shedding approach9

Working scenario: a monitoring system supporting multiple arbitrary queries; single resource: CPU cycles

Approach: real-time modeling of the queries' CPU usage

1. Find the correlation between traffic features and CPU usage (features are query agnostic, with deterministic worst-case cost)
2. Exploit the correlation to predict the CPU load
3. Use the prediction to decide the sampling rate

9 P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. IEEE/ACM Transactions on Networking, 2010.

SLIDE 23

System overview

Figure: Prediction and Load Shedding Subsystem

SLIDE 24

Traffic features vs CPU usage

[Figure: 100 s time series of CPU cycles, packets, bytes and 5-tuple flows]

Figure: CPU usage compared to the number of packets, bytes and flows

SLIDE 25

Traffic features vs CPU usage

[Scatter plot: CPU cycles versus packets/batch, grouped by the number of new 5-tuple hashes (< 500, 500 to 700, 700 to 1000, ≥ 1000)]

Figure: CPU usage versus the number of packets and flows

SLIDE 26

Prediction methodology10

Multiple Linear Regression (MLR)

    Yi = β0 + β1·X1i + β2·X2i + · · · + βp·Xpi + εi,   i = 1, 2, . . . , n

- Yi: n observations of the response variable (measured CPU cycles)
- Xji: n observations of the p predictors (traffic features)
- βj: p regression coefficients (unknown parameters to estimate)
- εi: n residuals (OLS minimizes the sum of squared errors)

Feature selection

- Variant of the Fast Correlation-Based Filter (FCBF)
- Removes irrelevant and redundant predictors
- Significantly reduces the cost and improves the accuracy of the MLR

10 P. Barlet-Ros et al. Load Shedding in Network Monitoring Applications. In Proc. of USENIX Annual Technical Conf., 2007.
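The MLR fit itself is ordinary least squares. A self-contained sketch via the normal equations (pure Python with Gaussian elimination; the data in the test below is synthetic, standing in for real (features, cycles) observations):

```python
def ols_fit(X, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y.
    X is a list of rows, each beginning with a 1.0 intercept column."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution yields the regression coefficients beta.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta
```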

SLIDE 28

Load shedding scheme11

Prediction and Load Shedding subsystem

1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (multi-resolution bitmaps)
3. The most relevant features are selected (using FCBF) to be used by the MLR
4. The MLR predicts the CPU cycles required by each query to run
5. Load shedding is performed to discard a portion of the batch
6. The CPU usage is measured (using the TSC) and fed back to the prediction system
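Steps 4 and 5 reduce to a simple rule once per-query predictions are available. A sketch (the function and names are hypothetical; it assumes query cost scales roughly linearly with the sampled share of the batch):

```python
def choose_sampling_rate(predicted_cycles, overhead_cycles, cycle_budget):
    """Scale the batch down so the predicted load fits the CPU budget."""
    available = cycle_budget - overhead_cycles   # cycles left for queries
    demand = sum(predicted_cycles.values())      # MLR predictions per query
    if demand <= available:
        return 1.0                               # no shedding needed
    return max(available / demand, 0.0)          # proportional sampling
```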

11 P. Barlet-Ros et al. Robust network monitoring in the presence of non-cooperative traffic queries. Computer Networks, 53(3), 2009.

SLIDE 29

Results: Load shedding performance

[Time series, 09 am to 05 pm: stacked CPU usage (cycles/sec) for CoMo cycles, load shedding cycles, query cycles, predicted cycles, and the CPU limit]

Figure: Stacked CPU usage (Predictive Load Shedding)

SLIDE 30

Results: Load shedding performance

[CDF of the per-batch CPU usage for the Predictive, Original and Reactive systems, with the per-batch CPU limit marked]

Figure: CDF of the CPU usage per batch

SLIDE 31

Results: Packet loss

[Time series, 09 am to 05 pm, of packet counts (total, DAG drops, unsampled) for (a) original CoMo, (b) reactive load shedding, and (c) predictive load shedding]

Figure: Link load and packet drops

SLIDE 32

Results: Error of the queries

Query                 Original           Reactive          Predictive
application (pkts)    55.38% ±11.80      10.61% ±7.78      1.03% ±0.65
application (bytes)   55.39% ±11.80      11.90% ±8.22      1.17% ±0.76
counter (pkts)        55.03% ±11.45      9.71% ±8.41       0.54% ±0.50
counter (bytes)       55.06% ±11.45      10.24% ±8.39      0.66% ±0.60
flows                 38.48% ±902.13     12.46% ±7.28      2.88% ±3.34
high-watermark        8.68% ±8.13        8.94% ±9.46       2.19% ±2.30
top-k destinations    21.63 ±31.94       41.86 ±44.64      1.41 ±3.32

[Box plot: error (%) of the Original, Reactive and Predictive systems]

SLIDE 33

One-way delay measurement

Traditional approaches are expensive

- Probing traffic (intrusive)
- Trajectory sampling
- Constant overhead per sample

Alternative: LDA (Lossy Difference Aggregator)12

- Send only sums of timestamps
- Deal with packet loss (sampling + partitioning the input stream)

12 R. Kompella et al. Every microsecond counts: tracking fine-grain latencies with a lossy difference aggregator. In Proc. of ACM SIGCOMM, 2009.

SLIDE 34

Lossy Difference Aggregator (LDA)

Running example: packets cross the network from a to b with a constant delay of 1; the send timestamps 1, 2, . . . , 5 arrive at b as 2, 3, . . . , 6.

Naive approach: export every timestamp and average the per-packet differences, ((2 − 1) + (3 − 2) + · · ·) / n = 1, which requires sending n timestamps.

Aggregate instead: send only the timestamp sums, ((2 + 3 + · · ·) − (1 + 2 + · · ·)) / n = (tb − ta) / n = 1, which requires sending just one sum.

Each side therefore keeps a <timestamp sum, packet count> pair: here <15, 5> at a and <20, 5> at b, so the estimated delay is (20 − 15) / 5 = 1.

Packet loss breaks the single pair: with <15, 5> at a but <17, 4> at b, the packet counts mismatch, so the sums are not comparable and the structure must protect against loss.

Solution: hash each packet into a bank of <sum, count> buckets at both ends. With no loss the bucket totals are again <15, 5> and <20, 5>, giving (20 − 15) / 5 = 1. When a packet is lost, only its bucket has mismatched counts and is discarded; the surviving buckets total <13, 4> and <17, 4>, giving (17 − 13) / 4 = 1.

To bound how many buckets a burst of loss invalidates, a packet sampling stage (s) is placed in front of each bank.
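The bucket mechanics of this example can be written down compactly (an illustrative model, not the authors' implementation; SHA-1 stands in for the hash both endpoints must share):

```python
import hashlib

class LDABank:
    """One LDA bank: b buckets, each a (timestamp sum, packet count) pair."""

    def __init__(self, b):
        self.b = b
        self.sums = [0.0] * b
        self.counts = [0] * b

    def record(self, pkt_id, ts):
        h = int.from_bytes(hashlib.sha1(str(pkt_id).encode()).digest()[:8], "big")
        i = h % self.b
        self.sums[i] += ts
        self.counts[i] += 1

def lda_average_delay(sender, receiver):
    """Average delay over buckets whose packet counts agree; a mismatch
    means the bucket saw loss and must be discarded."""
    num = den = 0.0
    for i in range(sender.b):
        if sender.counts[i] == receiver.counts[i] and sender.counts[i] > 0:
            num += receiver.sums[i] - sender.sums[i]
            den += sender.counts[i]
    return num / den
```

With no loss this degenerates to the single-pair computation (20 − 15)/5 = 1; with loss, only the affected buckets are thrown away.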

SLIDE 43

Dimensioning the packet sampling rate13

The packet sampling rate p presents a classical tradeoff: the higher it is,

- the more packets the LDA aggregates, but . . .
- . . . the higher the chance of counter invalidation

We wish to maximize the expected effective sample size

    E[S] = (1 − r) · p · n · e^(−n·r·p/b)

which is maximized at p = b / (n·r). The original analysis obtained p ≈ 0.5 · b / (n·r); our analysis doubles the sampling rate, improving the sample size by ≈ 20%.

LDA is the best algorithm in terms of network overhead up to ≈ 25% loss.

13 J. Sanjuàs-Cuxart et al. Validation and improvement of the Lossy Difference Aggregator to measure packet delays. In Proc. of TMA, 2010.
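The ≈ 20% improvement can be checked numerically from E[S] (the values of n, r and b below are arbitrary examples; the decaying exponent follows from the maximization, since only a decaying exponential has an interior maximum at p = b/(nr)):

```python
import math

def expected_sample_size(p, n, r, b):
    """E[S] = (1 - r) * p * n * exp(-n*r*p/b): of the p*n sampled packets,
    a fraction (1 - r) survives loss, and a bucket stays valid only if it
    sees no loss, which happens with probability ~ exp(-n*r*p/b)."""
    return (1 - r) * p * n * math.exp(-n * r * p / b)

n, r, b = 1_000_000, 0.1, 10_000     # packets, loss rate, buckets
p_opt = b / (n * r)                  # maximizer of E[S]
p_half = 0.5 * b / (n * r)           # rate from the original analysis
gain = expected_sample_size(p_opt, n, r, b) / expected_sample_size(p_half, n, r, b)
# gain = 2 * exp(-1/2), about a 21% larger effective sample
```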

SLIDE 44

Outline

1. Network monitoring
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers (CoMo-UPC initiative)

SLIDE 45

The problem

Several researchers are working on:

- Anomaly Detection (AD)
- Traffic Classification (TC)

Real packet traces are needed to test novel AD and TC methods. Several AD and TC algorithms require:

- Unanonymized IP addresses (or prefix-preserving anonymization)
- Payload inspection (e.g., IDS, DPI, . . . )

Traditional solution: anonymized traffic traces

- Examples: NLANR, CAIDA, CRAWDAD, . . .

Anonymization is not the right solution!

- Data owners: privacy concerns
- Researchers: not enough data
- Lack of recent publicly available traces

SLIDE 47

CoMo-UPC14

Move the code to the data, instead of publishing anonymized data traces

Significantly lowers the privacy concerns

- Traffic data do not leave the provider's premises
- Data providers keep the ownership of their data
- The source code can be inspected by the data owner

Researchers get (blind) access to unanonymized traffic

- IP addresses and payloads can be processed . . .
- . . . but not stored or exported

The CoMo system is based on this model

- AD and TC methods are implemented as CoMo modules

14 http://monitoring.ccaba.upc.edu/como-upc

SLIDE 48

UPC traffic

UPC access link

- 1 GE full-duplex (≈ 900 Mbps, 60K flows/s)
- Connects 40 departments and 25 faculties (10 campuses)
- 12 traffic traces are also available
