Efficient Packet Classification for Intrusion Detection Using FPGA

Haoyu Song, John W. Lockwood
Applied Research Lab: Reconfigurable Network Group
Department of Computer Science and Engineering


FPGA - 2/20/2005

Washington University in St. Louis
http://www.arl.wustl.edu/arl/projects/fpx/reconfig.htm

The research was funded by a grant from Global Velocity.

http://www.globalvelocity.com/


Network Intrusion Detection System (NIDS)

  • Device that detects network activity symptomatic of an attack on network and computer systems
  • Critical part of a unified threat management system
  • Performs
      • Protocol Processing
      • Packet Header Classification
      • Content Inspection (string matching)
  • Traditionally implemented as software on a PC, but can be implemented as hardware in an FPGA


Motivation & Challenges

FPGAs have proven effective for content scanning & string matching

  • High throughput
  • Great flexibility

FPGAs can be effective for header processing as well

There is a need for efficient packet header classification

  • Needed to block Denial of Service (DoS) attacks
  • Integral part of an Intrusion Detection System

  • Linear search is not practical for a large header rule set
  • Software-based systems can't keep up with high-speed networks
  • Brute-force TCAM implementations are inefficient on FPGAs

Desirable properties of header processing circuits

  • Avoid use of off-chip memory
  • Prefer simple algorithm and architecture


Architecture of FPGA-based NIDS

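[Figure: block diagram of the FPGA hardware. An Internet Packet enters through the Layered Internet Protocol Wrappers and is split into Packet Header and Packet Payload. The header fields (Source IP, Dest IP, Protocol, Source Port, Destination Port) feed Packet Header Classification, the focus of this presentation, which outputs a Bit Vector over rule IDs ID1, ID2, ..., IDk. The payload feeds Payload String Matching (NFA, DFA, Reg Ex, Bloom Filters, …). The output is an Alert.]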


Packet Classification

ID  Source IP         Destination IP    Protocol  Source Port  Destination Port
1   any               192.168.0.0/16    TCP       ≥ 1024       2589
2   any               192.168.0.0/16    TCP       10101        any
3   128.252.158.203   192.168.50.2      TCP       any          443
4   192.158.0.0/16    any               UDP       49230        60000
5   any               any               TCP       any          < 110
6   any               any               TCP       146          1000:1300

  • Header Rule
      • IP fields are specified as prefixes
      • Protocol field is specified as an exact value or wildcard
      • Port fields are specified as an arbitrary range or exact value
      • Rules may share the same specification for some fields
      • Rules may overlap
  • Rule Matching
      • A rule is matched by a packet if all the corresponding fields are matched in the specified way
      • One packet can match multiple rules
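The matching semantics above can be sketched as a plain linear search. This is a minimal illustration and not the circuit from the talk: the `Rule` class, the `matching_rules` helper, and the sample packet are hypothetical, the six rules are transcribed from the table on this slide, and `any`, `≥ 1024`, and `< 110` are encoded here as a zero-length prefix or an inclusive port range.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple

ANY_IP = "0.0.0.0/0"        # assumption: "any" encoded as a zero-length prefix
ANY_PORT = (0, 65535)       # assumption: "any" encoded as the full port range


@dataclass(frozen=True)
class Rule:
    rule_id: int
    src_ip: str                 # prefix
    dst_ip: str                 # prefix
    proto: Optional[str]        # exact value, or None for wildcard
    src_port: Tuple[int, int]   # inclusive range
    dst_port: Tuple[int, int]   # inclusive range

    def matches(self, src_ip, dst_ip, proto, sport, dport) -> bool:
        """A rule matches only if every field matches in its own way."""
        return (
            ip_address(src_ip) in ip_network(self.src_ip)
            and ip_address(dst_ip) in ip_network(self.dst_ip)
            and self.proto in (None, proto)
            and self.src_port[0] <= sport <= self.src_port[1]
            and self.dst_port[0] <= dport <= self.dst_port[1]
        )


RULES = [
    Rule(1, ANY_IP, "192.168.0.0/16", "TCP", (1024, 65535), (2589, 2589)),
    Rule(2, ANY_IP, "192.168.0.0/16", "TCP", (10101, 10101), ANY_PORT),
    Rule(3, "128.252.158.203/32", "192.168.50.2/32", "TCP", ANY_PORT, (443, 443)),
    Rule(4, "192.158.0.0/16", ANY_IP, "UDP", (49230, 49230), (60000, 60000)),
    Rule(5, ANY_IP, ANY_IP, "TCP", ANY_PORT, (0, 109)),
    Rule(6, ANY_IP, ANY_IP, "TCP", (146, 146), (1000, 1300)),
]


def matching_rules(packet):
    """Linear search: return the IDs of every rule the packet matches."""
    return [r.rule_id for r in RULES if r.matches(*packet)]


# Hypothetical packet chosen so that two overlapping rules match.
print(matching_rules(("10.0.0.1", "192.168.1.1", "TCP", 10101, 80)))
```

A search of this kind visits every rule for every packet, so it is useful only as a reference for what a correct classifier has to return.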


Existing Packet Classification Schemes

Algorithmic solutions (e.g., HyperCuts, Aggregated Bit Vector):

  • Poor worst-case performance, or
  • Excessive memory usage

Hardware (TCAM) solutions (e.g., Extended TCAM, Parallel Packet Classification):

  • Lower density of entries
  • High power consumption
  • Cannot directly represent arbitrary ranges
      • Converting ranges to prefixes expands the rule set

A hybrid architecture is more efficient


Characteristics of Snort NIDS Rule Set

Snort

  • Open source Network Intrusion Detection System

Characteristics of Version 2.3.0 (September, 2004)

  • 2464 rules
  • 274 unique header rules

Trends in growth

  • The number of rules has increased fourfold in 4 years
  • However, the number of unique header rules stays relatively constant


Illustration of Bit Vector (BV) Algorithm

Given input packet with {S.IP, D.IP, Proto, S.Port, D.Port} = {128.252.160.245, 192.168.50.2, TCP, 146, 1200}

ID  Source Port  Source IP        Protocol  Destination IP   Destination Port
1   ≥ 1024       any              TCP       192.168.0.0/16   2589
2   10101        any              TCP       166.158.0.0/16   any
3   any          any              TCP       192.168.50.2     443:444
4   49230        192.168.0.0/16   UDP       any              60000
5   any          any              TCP       any              < 110
6   146          any              TCP       any              1000:1300

[Figure: a 1 marks each rule field matched by the input packet; the resulting Bit Vector has one bit per rule.]
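The Bit Vector idea on this slide can be sketched in software as follows: each field is looked up on its own and yields one bit per rule, and the per-field vectors are ANDed. This is an illustrative sketch only, not the hardware design. The rule tuples are my transcription of the slide's table, `any` is encoded as `0.0.0.0/0` or the full port range, and the function names are hypothetical.

```python
import ipaddress

ANY_IP = "0.0.0.0/0"       # assumption: encoding of "any" for IP fields
ANY_PORT = (0, 65535)      # assumption: encoding of "any" for port fields

# (id, src_ip, dst_ip, proto, src_port_range, dst_port_range)
RULES = [
    (1, ANY_IP, "192.168.0.0/16", "TCP", (1024, 65535), (2589, 2589)),
    (2, ANY_IP, "166.158.0.0/16", "TCP", (10101, 10101), ANY_PORT),
    (3, ANY_IP, "192.168.50.2/32", "TCP", ANY_PORT, (443, 444)),
    (4, "192.168.0.0/16", ANY_IP, "UDP", (49230, 49230), (60000, 60000)),
    (5, ANY_IP, ANY_IP, "TCP", ANY_PORT, (0, 109)),
    (6, ANY_IP, ANY_IP, "TCP", (146, 146), (1000, 1300)),
]


def field_bit_vectors(packet):
    """Return five lists of booleans, one list per field, one bit per rule."""
    src_ip, dst_ip, proto, sport, dport = packet
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    tests = [
        lambda r: src in ipaddress.ip_network(r[1]),
        lambda r: dst in ipaddress.ip_network(r[2]),
        lambda r: r[3] == proto,
        lambda r: r[4][0] <= sport <= r[4][1],
        lambda r: r[5][0] <= dport <= r[5][1],
    ]
    return [[test(rule) for rule in RULES] for test in tests]


def classify(packet):
    """AND the per-field vectors; a rule matches if its bit is set in all."""
    vectors = field_bit_vectors(packet)
    final = [all(bits) for bits in zip(*vectors)]
    return [rule[0] for rule, hit in zip(RULES, final) if hit]


packet = ("128.252.160.245", "192.168.50.2", "TCP", 146, 1200)
names = ("S.IP", "D.IP", "Proto", "S.Port", "D.Port")
for name, bits in zip(names, field_bit_vectors(packet)):
    print(f"{name:7s}", "".join("1" if b else "0" for b in bits))
print("matched rule IDs:", classify(packet))
```

For the slide's input packet only rule 6 has its bit set in all five field vectors.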


Our solution – BV-TCAM

  • Utilizes Xilinx Coregen TCAM component
      • Unencoded output is exactly the Bit Vector we want
  • Avoids rule set expansion by excluding source and destination port fields from TCAM
      • All other fields generate a unified Bit Vector from TCAM
  • Uses Tree Bitmap to implement the Bit Vector algorithm that classifies the port fields
  • Matches rules using results from both TCAM & Tree Bitmap lookup engines


Original Header Rule Table

ID  Source IP        Destination IP    Protocol  Source Port  Destination Port
1   any              192.168.0.0/16    TCP       ≥ 1024       2589
2   any              192.168.0.0/16    TCP       10101        any
3   any              192.168.50.2      TCP       any          443
4   192.158.0.0/16   any               UDP       49230        60000
5   any              any               TCP       any          < 110
6   any              any               TCP       146          1000:1300

Compressed TCAM

Source IP        Destination IP     Protocol
any              192.168.0.0/16     tcp
any              192.168.50.2/32    tcp
192.158.0.0/16   any                udp
any              any                tcp

Bit Vector: one bit per rule (1 2 3 4 5 6)
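The compression step on this slide amounts to keeping only the distinct {Source IP, Destination IP, Protocol} combinations once the port fields are set aside. A minimal sketch under that reading follows; the `compress` function and the per-entry rule mask are my own illustration (the slide does not spell out how entries map back to rules), and the tuples are transcribed from the original header rule table.

```python
# (rule_id, src_ip, dst_ip, proto) with the two port fields already removed
HEADER_RULES = [
    (1, "any", "192.168.0.0/16", "tcp"),
    (2, "any", "192.168.0.0/16", "tcp"),
    (3, "any", "192.168.50.2/32", "tcp"),
    (4, "192.158.0.0/16", "any", "udp"),
    (5, "any", "any", "tcp"),
    (6, "any", "any", "tcp"),
]


def compress(rules):
    """Map each distinct (src_ip, dst_ip, proto) to a mask of the rules using it.

    The mask is an integer whose leftmost bit stands for rule 1
    (illustrative convention, not taken from the slides).
    """
    n = len(rules)
    entries = {}
    for rule_id, src_ip, dst_ip, proto in rules:
        key = (src_ip, dst_ip, proto)
        entries[key] = entries.get(key, 0) | (1 << (n - rule_id))
    return entries


tcam = compress(HEADER_RULES)
for (src_ip, dst_ip, proto), mask in tcam.items():
    print(f"{src_ip:15s} {dst_ip:16s} {proto:4s} rules={mask:06b}")
print(len(HEADER_RULES), "rules ->", len(tcam), "TCAM entries")
```

In this small example six rules collapse to four entries.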


Compressed TCAM Implementation

Built with Xilinx TCAM core

  • Utilizes SRL16E components
  • Performs lookup in one clock cycle
  • Content can be updated in only a few clock cycles

For the Snort rule set, only 33 distinct entries, each of 72 bits, need to be programmed

  • Coregen TCAM core uses 1188 SRL16Es
  • Only 3% of SRL16Es in XCV2000E


Store Port Field Bit Vectors in a Binary Trie

Port ranges expand to prefixes, as they did with a TCAM

e.g., expanding port number ≥ 1024 would have required 6 TCAM entries

  • 0000 01** **** **** (1024~2047)
  • 0000 1*** **** **** (2048~4095)
  • 0001 **** **** **** (4096~8191)
  • 001* **** **** **** (8192~16383)
  • 01** **** **** **** (16384~32767)
  • 1*** **** **** **** (32768~65535)

But each prefix is just inserted in a prefix tree

  • Each valid prefix node now contains a Bit Vector
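The range-to-prefix expansion listed above can be reproduced with a short greedy routine: repeatedly take the largest aligned power-of-two block that starts at the low end of the range and still fits. This is a standard technique sketched for illustration; the function name `range_to_prefixes` is hypothetical and nothing here is taken from the authors' implementation.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] into a minimal list of binary prefixes.

    Each prefix is returned as a bit string; the remaining low-order bits
    are wildcards. The empty string stands for the full-wildcard prefix.
    """
    prefixes = []
    while lo <= hi:
        # Largest block size allowed by the alignment of lo.
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        prefix_len = width - (size.bit_length() - 1)
        prefixes.append(format(lo, f"0{width}b")[:prefix_len])
        lo += size
    return prefixes


for p in range_to_prefixes(1024, 65535):
    print(p.ljust(16, "*"))
```

For the range 1024 to 65535 the routine yields the six prefixes shown on the slide.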


Retrieving Bitmap Vector through Longest Prefix Matching

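[Figure: binary trie over the source port field. Each valid prefix node stores a 6-bit Bit Vector (Rule 1 .. Rule 6), for example 001010, 101010, 111010, 001011, and 101110. The annotations "(and 3,5)" and "(and 1, 3,5)" on the nodes mark the rules with wider port ranges that are also set in a node's vector. Example lookup: Source Port: 49230 → 1100000001001110.]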
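A software sketch of this lookup follows. It assumes the source-port column of the example rule table (rule 1: ≥ 1024, rule 2: 10101, rules 3 and 5: any, rule 4: 49230, rule 6: 146) and uses a dictionary keyed by prefix string as a stand-in for the binary trie. The helper names are hypothetical. Each stored prefix also carries the bits of every shorter prefix that covers it, so a single longest-prefix match returns the complete vector.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split [lo, hi] into binary prefixes (bit strings, '' = wildcard)."""
    prefixes = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append(format(lo, f"0{width}b")[: width - size.bit_length() + 1])
        lo += size
    return prefixes


# Source-port range per rule ID, as read from the example rule table.
SRC_PORT_RANGES = {1: (1024, 65535), 2: (10101, 10101), 3: (0, 65535),
                   4: (49230, 49230), 5: (0, 65535), 6: (146, 146)}


def build_table(rule_ranges, width=16):
    """Return {prefix: bit vector}; the leftmost bit of a vector is rule 1."""
    n = len(rule_ranges)
    own = {}
    for rule_id, (lo, hi) in rule_ranges.items():
        for prefix in range_to_prefixes(lo, hi, width):
            own[prefix] = own.get(prefix, 0) | (1 << (n - rule_id))
    # Fold in the bits of every covering (shorter or equal) prefix.
    table = {}
    for prefix in own:
        vector = 0
        for other, bits in own.items():
            if prefix.startswith(other):
                vector |= bits
        table[prefix] = vector
    return table


def lookup(table, port, width=16):
    """Longest-prefix match on the port's binary representation."""
    key = format(port, f"0{width}b")
    for length in range(width, -1, -1):
        if key[:length] in table:
            return table[key[:length]]
    return 0


table = build_table(SRC_PORT_RANGES)
for port in (49230, 146, 10101, 2000, 80):
    print(port, format(lookup(table, port), "06b"))
```

Under these assumptions, port 49230 returns 101110, that is, rule 4 together with rules 1, 3, and 5.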


Efficient Tree Bitmap Implementation

[Figure: multi-bit trie node with the Bit Vectors stored at its prefixes]

Internal Prefix Bitmap: 1 01 0100 01000000
Extending Paths Bitmap: 1110000000001000

  • Multi-bit Trie enables traversing multiple bits per memory access
  • Tree Bitmap is an efficient hardware implementation of Multi-bit Trie
  • Performed with one BlockRAM memory lookup
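The two bitmaps of a stride-4 Tree Bitmap node can be sketched as below. The sketch assumes the conventional level-by-level ordering of the internal bitmap (the empty prefix first, then 0 and 1, then 00 to 11, then 000 to 111, 15 bits in total) and one extending-path bit per 4-bit value. The function names and the particular prefixes and children are my own choices, picked so that the output reproduces the bit patterns printed on the slide; they are not stated in the talk.

```python
STRIDE = 4


def internal_index(prefix_bits):
    """Position of a prefix of length 0..STRIDE-1 in the internal bitmap."""
    value = int(prefix_bits, 2) if prefix_bits else 0
    return (1 << len(prefix_bits)) - 1 + value


def build_node(internal_prefixes, child_chunks):
    """Return (internal prefix bitmap, extending paths bitmap) as bit strings."""
    internal = ["0"] * ((1 << STRIDE) - 1)      # 1 + 2 + 4 + 8 = 15 bits
    for prefix in internal_prefixes:
        internal[internal_index(prefix)] = "1"
    extending = ["0"] * (1 << STRIDE)           # one bit per 4-bit value
    for chunk in child_chunks:
        extending[int(chunk, 2)] = "1"
    return "".join(internal), "".join(extending)


def longest_internal_match(internal, chunk):
    """Longest prefix of the 4-bit chunk stored in this node, or None."""
    for length in range(STRIDE - 1, -1, -1):
        prefix = chunk[:length]
        if internal[internal_index(prefix)] == "1":
            return prefix
    return None


# Hypothetical node contents chosen to reproduce the slide's bitmaps.
internal, extending = build_node(["", "1", "01", "001"],
                                 ["0000", "0001", "0010", "1100"])
print(internal[0], internal[1:3], internal[3:7], internal[7:15])
print(extending)
print(repr(longest_internal_match(internal, "0011")))
```

Because both bitmaps fit in one memory word, a whole 4-bit step of the trie walk needs only a single memory read, which is the property the slide highlights.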


Tree Bitmap Implementation Results (with 4-bit stride)

Statistics

  • 56 distinct source port ranges → 87 distinct prefixes
      • 143 tree nodes for source port
  • 124 distinct destination port ranges → 177 distinct prefixes
      • 400 tree nodes for destination port

Resource Usage

  • Data structures for both tries use < 100 Kbits of Block RAM
      • ≤ 15% of total available memory
  • The control logic uses less than 2% of resources

Worst-case Lookup Time

  • Compressed TCAM: 1 clock cycle
  • Trie lookup: 4 memory lookups in 8 clock cycles

BV-TCAM Architecture

[Figure: BV-TCAM architecture. Parse Pkt extracts the IP Header. {SIP, DIP, Protocol} go to the 33x72 TCAM (built from SRL16Es), followed by a Decompress stage. Source Port and Destination Port each go to a Tree Bitmap Bit Vector engine (stride 4) backed by Block RAM. The engines produce Bit Vectors #1, #2, and #3 (bits 1 2 3 … n), which together yield the Multiple Matches output. Control Logic allows rules to be modified on the fly.]


Results on Snort Rule Set

  • Synthesis results on Xilinx Virtex XCV2000E-8
  • Throughput
      • OC48 (2.5 Gigabit/second link speed)
      • Circuit runs at 100 MHz clock rate with a 32-bit data width
      • Tree Bitmap algorithm requires 4 memory lookups to classify a packet in the worst case
      • Circuit uses 8 clock cycles to classify a packet in the worst case
      • Lookup engine can perform 12.5M lookups/second
      • FPX platform can process 8M minimum-length packets per second
  • Resource Usage
      • < 10% of XCV2000E logic resources
      • < 20% of XCV2000E block RAMs
  • Projected speed up
      • Utilize multiple tree-bitmap lookup engines
          • Dual engines could interleave the accesses to BlockRAMs to perform 25M lookups/second
      • Deploy circuit on Virtex 4 rather than VirtexE
          • Assuming 4x speedup (Virtex4 / VirtexE), achieve 100M lookups/second
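The throughput figures above follow from simple arithmetic on the stated clock rate and worst-case cycle count. The snippet below only restates that arithmetic; the function and variable names are illustrative, and the 4x device speedup is the slide's own assumption rather than a measured value.

```python
def lookups_per_second(clock_hz, cycles_per_lookup, engines=1, speedup=1):
    """Worst-case lookup rate for a given clock, cycle count, and scaling."""
    return clock_hz * engines * speedup // cycles_per_lookup


CLOCK_HZ = 100_000_000      # 100 MHz clock, from the slide
WORST_CASE_CYCLES = 8       # 8 clock cycles per packet, from the slide

single = lookups_per_second(CLOCK_HZ, WORST_CASE_CYCLES)
dual = lookups_per_second(CLOCK_HZ, WORST_CASE_CYCLES, engines=2)
projected = lookups_per_second(CLOCK_HZ, WORST_CASE_CYCLES, engines=2, speedup=4)

print(f"single engine:        {single / 1e6:.1f}M lookups/s")
print(f"dual engines:         {dual / 1e6:.1f}M lookups/s")
print(f"dual engines, 4x dev: {projected / 1e6:.1f}M lookups/s")

# Raw datapath bandwidth: 32 bits per cycle at 100 MHz.
print(f"datapath bandwidth:   {CLOCK_HZ * 32 / 1e9:.1f} Gbit/s")
```

The 3.2 Gbit/s datapath bandwidth exceeds the 2.5 Gbit/s OC48 line rate, and the single-engine rate of 12.5M lookups/second exceeds the 8M minimum-length packets per second that the FPX platform can process.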


Conclusion

BV-TCAM architecture designed to provide efficient packet classification in NIDS

  • High Throughput
      • Performs matching with parallel circuits
      • Utilizes both TCAM & Multibit Trie LPM
  • Low Resource Consumption
      • Reduces resource usage by using Compressed TCAM
      • Minimizes memory usage by using Tree Bitmap Algorithm for port field Bit Vector retrieval

BV-TCAM circuit evaluated with Snort ruleset

  • Occupies < 20% of a Virtex 2000E FPGA
  • Classifies 8M packets/second on FPX platform
  • Scales to classify 100M packets/second