Space-Code Bloom Filter for Efficient Traffic Flow Measurement



slide-1
SLIDE 1

Space-Code Bloom Filter for Efficient Traffic Flow Measurement

Abhishek Kumar, Jun (Jim) Xu
Networking and Telecommunications Group, College of Computing, Georgia Institute of Technology
{akumar,jx}@cc.gatech.edu

Li (Erran) Li
Bell Laboratories
erranlli@bell-labs.com

Jia Wang
AT&T Labs - Research
jiawang@research.att.com

Internet Measurement Conference 2003

slide-2
SLIDE 2

Problem Statement

Goal: To keep track of the total number of packets belonging to each flow on high-speed links. Applications such as traffic characterization, anomaly detection, and per-flow QoS need to know the size of all flows.

Definition of Flow: All packets with the same flow-label. The flow-label can be defined as any combination of fields from the IP header, e.g. <Source IP, Source Port, Dest. IP, Dest. Port, Protocol>.

slide-3
SLIDE 3

Why is per-flow measurement hard?

  • The majority of packets belong to large flows, yet the majority of flows are small.
  • High cost of maintaining per-flow data structures. Amortization is difficult.
  • No clear definition of the “end” of a flow.
slide-4
SLIDE 4

Related Approaches

Sampling

Sample packets with a fixed probability p and trace/process the headers of sampled packets. This is the approach used by Cisco NetFlow.

  • Flow-sizes can be inferred from sampled data.
  • Space-intensive.
  • Inaccurate for small flows.
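
To make the small-flow inaccuracy concrete, here is a minimal sketch of sampling-based estimation. It assumes each packet is kept independently with probability p and that sampled counts are scaled by 1/p; the function name, the trace, and the value of p are hypothetical and only for illustration.

```python
import random
from collections import Counter

def sampled_flow_estimates(packets, p, seed=0):
    """Keep each packet with probability p, then scale sampled counts by 1/p."""
    rng = random.Random(seed)
    sampled = Counter(label for label in packets if rng.random() < p)
    return {label: count / p for label, count in sampled.items()}

# Hypothetical trace: one elephant flow and many single-packet mice.
packets = ["elephant"] * 10_000 + [f"mouse-{i}" for i in range(1_000)]
estimates = sampled_flow_estimates(packets, p=0.01)

mice_seen = [label for label in estimates if label.startswith("mouse-")]
print("elephant estimate:", estimates["elephant"])
print("mice that were sampled at all:", len(mice_seen), "of 1000")
```

In this setup the elephant's estimate lands near its true size, while a single-packet flow is either missed entirely or reported as 1/p packets.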

Keep track of elephants

  • Fast algorithm to filter packets from large flows [Estan and Varghese, 2002].
  • Maintain counters for large flows only.
  • Success in tracking the largest few flows (e.g. carrying ≥ 1% of the total traffic) with limited memory.
slide-5
SLIDE 5

Our Solution – Space-Code Bloom Filter (SCBF)

  • Tracks all flows – from mice to elephants.
  • Provides an approximate estimate of flow size.
  • The relative error in estimation is the same for all flow sizes.
  • The approximate estimates are close to the actual flow size with high probability.

slide-6
SLIDE 6

Operation of Space-Code Bloom Filter (SCBF) – Insertion Phase

  • Measurement proceeds in epochs (e.g. 10 seconds).
  • Maintain an aggregate synopsis data structure.
  • Update the data structure on every packet arrival.
  • Write-only data structure → fast updates, low hardware complexity.
  • Copies of the synopsis are paged to disk periodically.

[Figure: SCBF module with a CPU, SRAM Module 1, SRAM Module 2, and persistent storage. Steps: 0. New packet arrival (header); 1. Process header; 2. Write to SCBF; 3. Paging to disk once "full"; 4. Query; 5. Answer.]

slide-7
SLIDE 7

Operation of Space-Code Bloom Filter (SCBF) – Query Phase

  • Queries provide a flow-label and ask for its size.
  • Obtain a “count” from the data structure, then look up a precomputed table to return the approximate size of the flow.
  • This provides approximate estimates that have low relative error with high probability.
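
The slide does not say what the precomputed table contains. The sketch below shows one possibility, a mean-value inversion. It assumes that the “count” is the number of distinct hash groups (out of l) that a flow has touched, and that each packet picks a group uniformly at random, so a flow of n packets is expected to touch l(1 - (1 - 1/l)^n) groups. The function name and the choice l = 32 are illustrative assumptions.

```python
import math

def build_size_table(num_groups):
    """table[c] = flow size n whose expected count equals c (mean-value inversion)."""
    l = num_groups
    table = [0.0] * (l + 1)
    for c in range(1, l):
        # Solve l * (1 - (1 - 1/l)**n) = c for n.
        table[c] = math.log(1 - c / l) / math.log(1 - 1 / l)
    table[l] = math.inf  # every group touched: the count has saturated
    return table

table = build_size_table(32)  # 32 groups is an illustrative choice
for c in (1, 8, 16, 24, 31):
    print(f"count={c:2d} -> estimated size ~ {table[c]:.1f} packets")
```

A query then reduces to a single table lookup on the observed count. Under these assumptions the table grows steeply as the count approaches l, so this single-table sketch loses resolution for very large flows.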

slide-8
SLIDE 8

Design of the aggregate data-structure

  • Bloom filters answer set-membership questions with high accuracy.
  • A Space-Code Bloom Filter answers multiset-membership questions with high accuracy.
  • Use a number of “virtual” Bloom filters, thus spreading the multiplicity information over space.
  • Hash functions allow us to “isolate” flows from each other, thus spreading the multiplicity information over code.
  • A Space-Code Bloom Filter represents a large number of statistical estimators in parallel.
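
The bullets above can be illustrated with a minimal sketch. It assumes that the “virtual” Bloom filters are groups of hash functions sharing one bit array, that each packet picks one group at random and sets that group's bits without reading anything, and that a query counts the groups whose bits are all set for the flow. The class name, the sizes, and the use of BLAKE2 as the hash are illustrative assumptions, not the authors' parameters.

```python
import hashlib
import random

class SpaceCodeBloomFilter:
    def __init__(self, num_bits=1 << 16, num_groups=32, hashes_per_group=3, seed=0):
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity
        self.num_bits = num_bits
        self.num_groups = num_groups
        self.hashes_per_group = hashes_per_group
        self.rng = random.Random(seed)

    def _positions(self, flow_label, group):
        """Bit positions for one flow under one hash group."""
        positions = []
        for j in range(self.hashes_per_group):
            key = f"{group}:{j}:{flow_label}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            positions.append(int.from_bytes(digest, "big") % self.num_bits)
        return positions

    def insert(self, flow_label):
        """Per packet: pick one group at random and set its bits (no reads)."""
        group = self.rng.randrange(self.num_groups)
        for pos in self._positions(flow_label, group):
            self.bits[pos] = 1

    def count(self, flow_label):
        """Number of groups whose bits are all set for this flow."""
        return sum(
            all(self.bits[pos] for pos in self._positions(flow_label, g))
            for g in range(self.num_groups)
        )

scbf = SpaceCodeBloomFilter()
big_flow = "10.0.0.1:80->10.0.0.2:5000/tcp"    # hypothetical flow-labels
small_flow = "10.0.0.3:53->10.0.0.4:53/udp"
for _ in range(20):
    scbf.insert(big_flow)
scbf.insert(small_flow)
print(scbf.count(big_flow), scbf.count(small_flow), scbf.count("never-seen"))
```

A flow with more packets touches more groups, so its count is larger. The count is not the flow size itself; it is the raw observation that an estimator would convert into a size.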

slide-9
SLIDE 9

Performance of SCBF - complexities

  • Computational complexity – compute 5 hash functions and write 5 bits per packet.
  • Space complexity – 4 bits of storage required for each packet.
  • Can operate at OC768 (40 Gbps) with 5 ns SRAM.
  • More than 80% of responses are within ±25% of the actual value.
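
A back-of-the-envelope check of the line-rate figure, using the numbers on this slide. The average packet size of 1000 bits is an assumption that does not appear on the slide, and the five bit-writes are assumed to happen one after another; smaller packets or parallel memory banks would change the result.

```python
LINK_RATE_BPS = 40e9        # OC768, from the slide
SRAM_ACCESS_NS = 5          # from the slide
WRITES_PER_PACKET = 5       # from the slide
BITS_STORED_PER_PACKET = 4  # from the slide
AVG_PACKET_BITS = 1000      # assumption: average packet size, not from the slide

# Time one average packet occupies the link, and time to do its SRAM writes.
packet_time_ns = AVG_PACKET_BITS / LINK_RATE_BPS * 1e9
write_time_ns = WRITES_PER_PACKET * SRAM_ACCESS_NS

# Synopsis growth per second of traffic at full line rate.
packets_per_second = LINK_RATE_BPS / AVG_PACKET_BITS
storage_mbytes_per_second = packets_per_second * BITS_STORED_PER_PACKET / 8 / 1e6

print(f"packet time on the wire: {packet_time_ns:.1f} ns")
print(f"time for SRAM writes:    {write_time_ns} ns")
print(f"synopsis growth:         {storage_mbytes_per_second:.1f} MB/s")
```

Under these assumptions the per-packet writes just fit within one packet time, and the synopsis grows by about 20 MB for every second of full-rate traffic.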
slide-10
SLIDE 10

Accuracy of SCBF

[Figure (a): Original vs. estimated flow size. Axes: original flow length (packets) and estimated flow length (packets), each from 1 to 100000. Note that both axes are on log scale.]

[Figure (b): Distribution of the original and estimated flow size. Axes: flow length (packets), 1 to 100000, and normalized rank of flows, 0.001 to 100. Series: original, estimated.]

slide-11
SLIDE 11

Conclusions

  • Space-Code Bloom Filters can track the approximate size of every flow.
  • Per-flow accounting without per-flow state.
  • The relative error in approximation is the same for all flow sizes.
  • Very fast (up to OC768) implementations are possible due to the “write-only” nature of updates.
  • Design parameters of SCBF can be tuned to trade storage space and CPU cycles for accuracy.

slide-12
SLIDE 12

Acknowledgments

We thank Oliver Spatschek for providing us with the traffic traces.

slide-13
SLIDE 13

Questions ???

slide-14
SLIDE 14

Accuracy of SCBF using Maximum Likelihood Estimation (MLE)

[Figure (c): Theoretical accuracy of MLE using 32 groups. Axes: 1-δ, 0.6 to 1.05, and F, 1000 to 10000. Curves: ε=1, ε=0.5, ε=0.25, ε=0.2.]

slide-15
SLIDE 15

Accuracy of SCBF using Maximum Likelihood Estimation

[Figure (d): CDF of relative error for flows of various sizes. Axes: P[relative err < e] (CDF), 0.1 to 1, and e, -1 to 1. Curves: all flows, flows >= 2, flows >= 3, flows >= 5, flows >= 10, flows >= 100.]