Multi-dimensional Packet Classification Yadi Ma, Suman Banerjee - - PowerPoint PPT Presentation
A Smart Pre-Classifier to Reduce Power Consumption of TCAMs for Multi-dimensional Packet Classification
Yadi Ma, Suman Banerjee
University of Wisconsin-Madison
Packet classification
[Figure: network with hosts S1, S2 and D, router R, links L1 and L2, the Internet, Subnet A and Subnet B]
Classifier at Router R:
From  To  Traffic type  Action
S1    D   Port 80       Forward via L1
S2    D   *             Drop all traffic
A     B   *             Reserve 50 Mbps
Definition
- Packet classification: given a classifier, find the first (highest priority)
matching rule for each incoming packet
- A classifier contains a set of rules ordered by priority
- Our focus: n-tuple classification
- Example classifier:
Rule #  Source IP      Dest. IP    Source Port    Dest. Port   Protocol  Action
1       *              10.112.*.*  5001 - 65535   *            TCP       deny
2       32.75.226.153  *           *              1001 - 2000  UDP       deny
3       199.36.184.*   *           49152 - 65535  *            UDP       deny
4       *              *           *              *            *         permit
- Given a packet header: (32.75.226.153, 198.35.180.5, 80, 1040, UDP)
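The definition above can be illustrated with a minimal first-match sketch over the example classifier and packet header. The `Rule` class, `ip_matches` and `classify` are hypothetical names introduced only for this illustration; a linear scan is used for clarity and is not how HiCuts-style software schemes or TCAMs perform the search.

```python
from dataclasses import dataclass

ANY_PORT = (0, 65535)  # a "*" port field, written as an inclusive range


@dataclass(frozen=True)
class Rule:
    src: str       # dotted quad; "*" wildcards the whole address or single octets
    dst: str
    sport: tuple   # inclusive (low, high) source port range
    dport: tuple   # inclusive (low, high) destination port range
    proto: str     # "TCP", "UDP" or "*"
    action: str


def ip_matches(pattern, addr):
    """Compare octet by octet, treating "*" as a wildcard."""
    if pattern == "*":
        return True
    return all(p == "*" or p == a
               for p, a in zip(pattern.split("."), addr.split(".")))


def classify(rules, pkt):
    """Return (rule number, action) of the first matching rule, or None."""
    src, dst, sport, dport, proto = pkt
    for number, r in enumerate(rules, start=1):
        if (ip_matches(r.src, src) and ip_matches(r.dst, dst)
                and r.sport[0] <= sport <= r.sport[1]
                and r.dport[0] <= dport <= r.dport[1]
                and r.proto in ("*", proto)):
            return number, r.action
    return None


# The four rules of the example classifier, in priority order.
RULES = [
    Rule("*", "10.112.*.*", (5001, 65535), ANY_PORT, "TCP", "deny"),
    Rule("32.75.226.153", "*", ANY_PORT, (1001, 2000), "UDP", "deny"),
    Rule("199.36.184.*", "*", (49152, 65535), ANY_PORT, "UDP", "deny"),
    Rule("*", "*", ANY_PORT, ANY_PORT, "*", "permit"),
]

result = classify(RULES, ("32.75.226.153", "198.35.180.5", 80, 1040, "UDP"))
print(result)  # (2, 'deny')
```

The example packet matches both rule 2 and rule 4; rule 2 wins because it comes first in priority order.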
Packet classification schemes
- Software-based schemes
– Tradeoff between memory usage and speed
– Examples: HiCuts, HyperCuts, EffiCuts, etc.
- Hardware (TCAM)-based schemes
– Popular for high-throughput packet classification
Problem Statement
- TCAMs are power-hungry
- Design a TCAM-based method that:
– Greatly reduces power consumption of TCAMs, especially for large classifiers
– Uses commodity TCAMs
– Is easy to implement
Outline
- Introduction and motivation
- Design of SmartPC
– Algorithms to manage two-stage classification
- Evaluation methods and results
- Conclusion
Packet classification system for SmartPC
- Two-stage classification
– First stage: pre-classifier
– Second stage: two parallel searches
[Figure: index TCAM (pre-classifier entries) -> match index -> index SRAM -> TCAM (classifier rules), with "general" blocks and a "specific" block -> associated SRAM (priorities + actions) -> priority resolution -> action]
How to build an efficient pre-classifier?
Pre-classifier
- How to build a pre-classifier?
– Built on two dimensions: source and destination IP addresses
– By expanding and combining two-dimensional rules recursively
- Also shuffle the original rules into different TCAM blocks accordingly
Why is reducing 5-D to 2-D a good choice?
[Figure: maximum number of overlapping rules in the two-dimensional space]
- Analyzed more than 200 real classifiers ranging in size from 3 to 15,181 rules
- The maximum number of overlapping rules is an order of magnitude smaller than the classifier size.
An example classifier containing 14 rules
[Figures: two views of the same example classifier]
SmartPC
[Figure: the example rules (labelled 1 to 13, with 3/4 and 11/12/13 grouped) in the Src_addr / Dst_addr plane, with pre-classifier entries P0 and P1]
- Pre-classifier: P0, P1
- Specific blocks (TCAM): P0 holds rules 0, 1, 5, 6, 8; P1 holds rules 2, 3, 4, 9, 10
- General block (TCAM): rules 7, 11, 12, 13
Example: how to build a pre-classifier
[Figure: step-by-step animation on the same rules in the Src_addr / Dst_addr plane. Entry P0 is built up one rule at a time, rules 7, 11, 12, 13 are set aside for the general block, and then entry P1 is created.]
Final state:
- Pre-classifier: P0, P1
- Specific blocks: P0 holds rules 0, 1, 5, 6, 8; P1 holds rules 2, 3, 4, 9, 10
- General block: rules 7, 11, 12, 13
Packet classification system for SmartPC
[Figure: an incoming packet is looked up in the index TCAM (pre-classifier entries P0, P1); the match index goes through the index SRAM to the TCAM holding the classifier rules. In the example, the specific block (rules 0, 1, 5, 6, 8) returns "1, accept" and the general block (rules 7, 11, 12, 13) returns "7, deny" from the associated SRAM (priorities + actions); priority resolution outputs "accept". The specific block for P1 (rules 2, 3, 4, 9, 10) is also shown.]
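The two-stage lookup in this slide can be sketched in software as follows. This is only an illustration under stated assumptions: rules are reduced to their two address dimensions, the prefixes and actions in `PRE`, `SPECIFIC` and `GENERAL` are invented, a lower rule number means higher priority, and the two "parallel" searches are run one after the other. All function names are hypothetical.

```python
from ipaddress import ip_address, ip_network


def covers(pair, src, dst):
    """True if the (src, dst) addresses fall inside a 2-D prefix pair."""
    return (ip_address(src) in ip_network(pair[0])
            and ip_address(dst) in ip_network(pair[1]))


def search_block(block, src, dst):
    """Best hit in one TCAM block: lowest rule number wins, None if no match."""
    hits = [(number, action) for number, pair, action in block
            if covers(pair, src, dst)]
    return min(hits) if hits else None


def two_stage_classify(pre_classifier, specific_blocks, general_blocks, src, dst):
    hits = []
    # Stage 1: pre-classifier entries do not overlap, so at most one matches.
    for name, pair in pre_classifier.items():
        if covers(pair, src, dst):
            # Stage 2a: search only the specific block that entry points to.
            hits.append(search_block(specific_blocks[name], src, dst))
            break
    # Stage 2b: the general block(s) are searched for every packet.
    for block in general_blocks:
        hits.append(search_block(block, src, dst))
    # Priority resolution between the specific and general results.
    hits = [h for h in hits if h is not None]
    return min(hits) if hits else None


# Hypothetical contents, loosely following the slide (P0, P1, rules 1 and 7).
PRE = {"P0": ("10.0.0.0/8", "20.0.0.0/8"), "P1": ("30.0.0.0/8", "40.0.0.0/8")}
SPECIFIC = {
    "P0": [(0, ("10.1.0.0/16", "20.1.0.0/16"), "deny"),
           (1, ("10.2.0.0/16", "20.0.0.0/8"), "accept")],
    "P1": [(2, ("30.1.0.0/16", "40.0.0.0/8"), "deny")],
}
GENERAL = [[(7, ("0.0.0.0/0", "20.0.0.0/8"), "deny")]]

result = two_stage_classify(PRE, SPECIFIC, GENERAL, "10.2.3.4", "20.5.6.7")
print(result)  # (1, 'accept')
```

As in the slide, the specific block returns rule 1 (accept), the general block returns rule 7 (deny), and priority resolution keeps rule 1. The block for P1 is never searched for this packet.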
Properties of pre-classifiers
- Entries in a pre-classifier are non-overlapping
- Each rule in a classifier is either covered by only one pre-classifier entry, or marked as general
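A small check of these two properties might look like the sketch below. It assumes that entries and rules are pairs of IPv4 prefixes (source, destination) and that "covered" means the rule's two prefixes are contained in the entry's two prefixes; both are assumptions for illustration, and the helper names and addresses are hypothetical.

```python
from ipaddress import ip_network
from itertools import combinations


def pair(src, dst):
    """Build a 2-D prefix pair from two CIDR strings."""
    return (ip_network(src), ip_network(dst))


def overlap(a, b):
    """Two 2-D prefix pairs overlap only if they overlap in both dimensions."""
    return a[0].overlaps(b[0]) and a[1].overlaps(b[1])


def non_overlapping(entries):
    """Property 1: no two pre-classifier entries overlap."""
    return not any(overlap(a, b) for a, b in combinations(entries, 2))


def covering_entries(rule, entries):
    """Entries whose rectangle fully contains the rule's rectangle."""
    return [e for e in entries
            if rule[0].subnet_of(e[0]) and rule[1].subnet_of(e[1])]


def is_general(rule, entries):
    """Property 2: a rule not covered by exactly one entry is marked general."""
    return len(covering_entries(rule, entries)) != 1


P0 = pair("10.0.0.0/8", "20.0.0.0/8")
P1 = pair("30.0.0.0/8", "40.0.0.0/8")
entries = [P0, P1]

ok = non_overlapping(entries)
specific_rule = pair("10.2.0.0/16", "20.0.0.0/8")   # inside P0 only
wildcard_rule = pair("0.0.0.0/0", "20.0.0.0/8")     # wider than any entry
```

With non-overlapping entries, a rule can be contained in at most one of them, so `covering_entries` returns either one entry or none.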
Rule update
- Rule update overhead of SmartPC is generally smaller
than that of regular TCAMs
- The ordering of TCAM entries is kept within one specific
block or within a small number of general blocks, rather than throughout all the blocks
- Rule update
– Insert a rule
– Delete a rule
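The claim about update overhead can be illustrated with a deliberately simplified model: each block is a priority-sorted list, and the cost of an insert is the number of existing entries that must shift. This is an assumption for illustration only, not the update algorithm of SmartPC or of any real TCAM; the function names and priority values are hypothetical.

```python
from bisect import bisect_left


def insert_rule(block, priority):
    """Insert into a priority-sorted block; return how many entries shifted."""
    pos = bisect_left(block, priority)
    block.insert(pos, priority)
    return len(block) - 1 - pos


def delete_rule(block, priority):
    """Remove a rule; entries after it move up by one slot."""
    pos = block.index(priority)
    del block[pos]
    return len(block) - pos


# One specific block versus a single flat TCAM holding every rule.
specific_block = [0, 10, 50, 60, 80]
flat_tcam = list(range(0, 140, 10))      # 14 rules: 0, 10, ..., 130

moved_in_block = insert_rule(specific_block, 45)
moved_in_flat = insert_rule(flat_tcam, 45)
print(moved_in_block, moved_in_flat)  # 3 9
```

In this model the insert disturbs only the entries of the affected block, which is the intuition behind keeping the ordering within one specific block or a few general blocks.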
Outline
- Introduction and motivation
- Design of SmartPC
– Algorithms to manage two-stage classification
- Evaluation methods and results
- Conclusion
Experimental setup (1)
- Summary of classifiers
10 synthetic classifiers:
Name  Size   MaxOverlaps  Wildcard
S1    9802   22           4
S2    9416   126          57
S3    9497   76           18
S4    9624   82           12
S5    7255   28
S6    99823  27           5
S7    87039  249          79
S8    99836  89           47
S9    99866  81           38
S10   99220  10
10 real classifiers:
Name  Size   MaxOverlaps  Wildcard
R1    5233   49           18
R2    5626   63           32
R3    5874   98           48
R4    6339   47           16
R5    7356   38           5
R6    8063   64           35
R7    8475   31           4
R8    10054  1
R9    11574  334          271
R10   15181  177          143
Experimental setup (2)
- Block size of TCAMs
– Evaluated block sizes: 32, 64, 128, 256, 512 and 1024
- Metrics
– Power reduction
  - Percentage reduction in the number of activated blocks
– Storage overhead of pre-classifier entries
  - Pre-classifier size as a percentage of the size of the whole classifier
- Schemes
– SmartPC
– Default TCAM (without SmartPC)
– A naïve scheme named Naive-divide
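The two metrics can be written down as simple arithmetic. The sketch below assumes that a default TCAM activates every block holding classifier rules and that the reduction is measured against that baseline; the exact accounting used in the evaluation is not given on the slide, and all numbers here are made up for illustration.

```python
from math import ceil


def blocks_needed(entries, block_size):
    """Number of TCAM blocks required to hold a given number of entries."""
    return ceil(entries / block_size)


def power_reduction_pct(active_blocks, total_blocks):
    """Percentage reduction in activated blocks relative to searching all blocks."""
    return 100.0 * (total_blocks - active_blocks) / total_blocks


def storage_overhead_pct(pre_classifier_entries, classifier_size):
    """Pre-classifier size as a percentage of the whole classifier."""
    return 100.0 * pre_classifier_entries / classifier_size


# Hypothetical classifier: 10,000 rules, block size 128, 120 pre-classifier entries.
total = blocks_needed(10_000, 128)
# Hypothetical per-packet activity: 1 index block + 1 specific + 2 general blocks.
active = 1 + 1 + 2

reduction = power_reduction_pct(active, total)
overhead = storage_overhead_pct(120, 10_000)
print(total, round(reduction, 1), round(overhead, 1))
```

The reduction grows with classifier size because the number of activated blocks stays small while the total number of blocks increases.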
Power reductions
With block size 128, the median and average power reductions are 91% and 88%, respectively
[Figure: percentage of power reductions vs. TCAM block size; panels: real classifiers, synthetic classifiers]
Storage overhead
Small storage overhead: less than 4% for every classifier.
[Figure: fraction of storage overhead vs. TCAM block size; panels: real classifiers, synthetic classifiers]
Comparison of SmartPC with Naive-divide
SmartPC outperforms Naive-divide by more than 20% on average.
[Figure: percentage of power reductions with block size 128; panels: real classifiers, synthetic classifiers]
Discussion
- Effect of prefix distribution and prefix length
- Power reduction on small classifiers
- Power reduction on IPv6 classifiers
Conclusion
- Propose SmartPC, which:
– Uses commodity TCAMs
– Is easy to implement
– Greatly reduces power consumption of TCAMs, especially for larger classifiers
Questions
Thanks
Backup slides
Prior work on Packet Classification
- Software-based approaches
– Examples: HiCuts, HyperCuts, EffiCuts, etc.
- TCAM-based approaches
– High speed, but suffer from deficiencies such as high power consumption
– Schemes for power efficiency:
  - CoolCAMs (INFOCOM 2003): reduces power consumption of TCAMs, but is limited to IP forwarding
  - Extended TCAMs (ICNP 2003): requires a new type of TCAM that returns multiple matches
  - Significant recent work has been done within companies, but it is proprietary
Number of blocks activated vs. block size
[Figure: results for classifiers R1, R9, S4 and S10]
Observations
- TCAMs
– The main component of power consumption in TCAMs is proportional to the number of searched entries
– Hardware supports turning on a small number of blocks
– Hardware supports multiple searches simultaneously, such as Cisco’s TCAM4
- Classifiers
– For each incoming packet, often only a small number of matching rules in a classifier need to be searched
http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps4324/prod_white_paper0900aecd806dc821.html
Some stats
- A 2006 report found:
– Data centers in the U.S. consume about 61 billion kWh (1.5% of total U.S. electricity consumption), for a total electricity cost of about $4.5 billion
– National energy consumption by servers and data centers could nearly double by 2011, to more than 100 billion kWh
- According to a SIGCOMM CCR 2009 paper, the network consumes 10-20% of a data center's total power.
- With the growing sizes of classifiers and the transition from IPv4 to IPv6, the high power consumption of TCAMs increases both power supply cost and cooling cost
References:
- U.S. Environmental Protection Agency, "Report to Congress on Server and Data Center Energy Efficiency."
- "The cost of a cloud: research problems in data center networks," SIGCOMM CCR, 2009.
Properties of real classifiers
[Figures: maximum number of overlapping rules in the two-dimensional space; number of wildcard rules in the two-dimensional space]
- Analyzed more than 200 real classifiers ranging in size from 3 to 15,181 rules
- Reduce the five-dimensional problem to a two-dimensional one!
Pre-process a classifier
- Given a multi-dimensional classifier C containing a number of rules:
– The two-dimensional space is divided into non-overlapping rectangles. Each rectangle covers a cluster of rules and represents an entry in the pre-classifier P for C
– Shuffle rules in C such that each pre-classifier entry is associated with a TCAM block, named a specific block
– If the number of rules that intersect a pre-classifier entry exceeds the TCAM block size, the extra rules are stored in TCAM blocks named general block(s)
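The shuffling step can be sketched as below, assuming the pre-classifier entries are already known. This is an illustration under assumptions rather than the authors' algorithm: rules are reduced to (source, destination) IPv4 prefix pairs, a rule belongs to an entry only if the entry fully contains it, and overflow beyond the block size goes to general blocks. All names and addresses are hypothetical.

```python
from ipaddress import ip_network


def net_pair(src, dst):
    """Build a 2-D prefix pair from two CIDR strings."""
    return (ip_network(src), ip_network(dst))


def contains(entry, rule):
    """True if the rule's rectangle lies entirely inside the entry's rectangle."""
    return rule[0].subnet_of(entry[0]) and rule[1].subnet_of(entry[1])


def shuffle_rules(entries, rules, block_size):
    """Assign rule numbers to specific blocks (one per entry) or general blocks."""
    specific = {name: [] for name in entries}
    general = []
    for rule_no, rule in enumerate(rules):      # rules arrive in priority order
        owners = [name for name, e in entries.items() if contains(e, rule)]
        if len(owners) == 1 and len(specific[owners[0]]) < block_size:
            specific[owners[0]].append(rule_no)
        else:
            # Not covered by exactly one entry, or the specific block is full.
            general.append(rule_no)
    general_blocks = [general[i:i + block_size]
                      for i in range(0, len(general), block_size)]
    return specific, general_blocks


entries = {
    "P0": net_pair("10.0.0.0/8", "20.0.0.0/8"),
    "P1": net_pair("30.0.0.0/8", "40.0.0.0/8"),
}
rules = [
    net_pair("10.1.0.0/16", "20.1.0.0/16"),   # rule 0: inside P0
    net_pair("10.2.0.0/16", "20.0.0.0/8"),    # rule 1: inside P0
    net_pair("30.1.0.0/16", "40.0.0.0/8"),    # rule 2: inside P1
    net_pair("0.0.0.0/0", "20.0.0.0/8"),      # rule 3: wider than any entry
    net_pair("10.3.0.0/16", "20.3.0.0/16"),   # rule 4: inside P0, but P0 is full
]

specific, general_blocks = shuffle_rules(entries, rules, block_size=2)
print(specific, general_blocks)  # {'P0': [0, 1], 'P1': [2]} [[3, 4]]
```

Because rules are visited in priority order, each block keeps its rules in priority order, which the per-block TCAM search relies on.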
Given a classifier which contains 19 rules, block size = 5
[Figure: the 19 rules in the Src_addr / Dst_addr plane with pre-classifier entries P1, P2 and P3. Rule groups shown: 2, 3, 4, 16; 5, 6, 7, 8, 9; 11, 12, 13, 14, 15; and 1, 10, 17, 18, 19.]
Pre-process a classifier
[Figure: a key is looked up in the pre-classifier (2-dimensional pre-classifier entries in TCAM block(s)); the 5-dimensional classifier rules are stored in TCAM blocks, split into specific blocks and general blocks, and produce the result.]
- Expect huge power reduction on large classifiers