Optimization of Pattern Matching Algorithm for Memory Based - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Optimization of Pattern Matching Algorithm for Memory Based Architecture

Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang
National Tsing Hua University, Taiwan, R.O.C.

slide-2
SLIDE 2

Outline

Memory architecture for string matching
Basic idea
Novel algorithm for memory architecture
Experimental results and conclusions

slide-3
SLIDE 3

Introduction

A Network Intrusion Detection System (NIDS) detects network attacks by identifying attack patterns.

Software-only approaches can no longer meet the high throughput of today’s networks.

Hardware approaches are used for acceleration:

– Logic architecture
– Memory architecture

slide-4
SLIDE 4

Advantage of Memory Architecture

The memory architecture has attracted a lot of attention because of its easy reconfigurability and scalability.

  • Young H. Cho and William H. Mangione-Smith, “A Pattern Matching Co-processor for Network Security,” in Proc. 42nd IEEE/ACM Design Automation Conference, Anaheim, CA, June 13-17, 2005.
  • M. Aldwairi, T. Conte, and P. Franzon, “Configurable String Matching Hardware for Speeding up Intrusion Detection,” ACM SIGARCH Computer Architecture News, 33(1):99–107, 2005.
  • S. Dharmapurikar and J. Lockwood, “Fast and Scalable Pattern Matching for Content Filtering,” in Proc. Symposium on Architectures for Networking and Communications Systems (ANCS), Oct. 2005.

slide-5
SLIDE 5

Memory Architecture

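[Figure: memory architecture. The FSM of the attack patterns “bcdf” and “pcdg” is stored in memory. A decoder driven by the current state selects one memory row; each row holds 256 next-state entries (NS1, NS2, ……, NS256, 8 bits each) and a 16-bit match vector (MV). The input selects the next state through a 256:1 MUX.]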

slide-6
SLIDE 6

Major Issue of Memory Architecture

Due to the increasing number of attacks, the required memory increases tremendously.

– The performance, cost, and power consumption are related to the memory size.
– Reducing the memory size has become imperative.
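
To make the growth concrete, the following minimal sketch estimates the size of a dense state table. It assumes the row layout labeled in the Memory Architecture figure (256 next-state fields of 8 bits plus a 16-bit match vector); the function name `table_bytes` and its parameters are hypothetical, and the sketch is not meant to reproduce the memory figures reported in the experiments.

```python
def table_bytes(num_states, ns_bits=8, mv_bits=16, alphabet=256):
    """Bytes needed for a dense table with one row per state.

    Each row holds `alphabet` next-state fields of `ns_bits` bits plus
    one match vector of `mv_bits` bits. The default widths are
    assumptions taken from the figure labels <8> and <16>.
    """
    bits_per_row = alphabet * ns_bits + mv_bits
    total_bits = num_states * bits_per_row
    return (total_bits + 7) // 8  # round up to whole bytes


for states in (9, 100, 200):
    print(states, "states ->", table_bytes(states), "bytes")
```

Under these assumptions the table grows linearly with the number of states, so every state removed saves one full row.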

slide-7
SLIDE 7

Outline

Memory architecture for string matching
Basic idea
Novel algorithm for memory architecture
Experimental results and conclusions

slide-8
SLIDE 8

Review of Aho-Corasick Algorithm

The Aho-Corasick (AC) algorithm can greatly reduce the number of state transitions and the memory size.

– Solid lines represent valid transitions.
– Dotted lines represent failure transitions.
– Failure transitions are introduced to reduce the number of outgoing transitions.

[Figure: AC state machine of “bcdf” and “pcdg”, with states 1 to 8 and edge labels b, c, d, f and p, c, d, g.]
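
The valid and failure transitions described above can be sketched as follows. This is a textbook-style Aho-Corasick construction written for illustration, not the authors' implementation; the names `build_ac` and `ac_search` are hypothetical, and state 0 is assumed to be the start state.

```python
from collections import deque


def build_ac(patterns):
    """Build valid (goto) transitions, failure transitions, and outputs."""
    goto = [{}]    # goto[s][ch] -> next state (valid transitions)
    out = [set()]  # out[s] -> indices of patterns recognized in state s
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(idx)

    fail = [0] * len(goto)            # failure transitions
    queue = deque(goto[0].values())   # depth-1 states fail to the start state
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out


def ac_search(text, goto, fail, out):
    """Return (end position, pattern index) pairs, positions counted from 1."""
    s, hits = 0, []
    for i, ch in enumerate(text, start=1):
        while s and ch not in goto[s]:
            s = fail[s]               # follow failure transitions
        s = goto[s].get(ch, 0)
        hits.extend((i, p) for p in sorted(out[s]))
    return hits


goto, fail, out = build_ac(["bcdf", "pcdg"])
print(len(goto) - 1, "states besides the start state")
print(ac_search("xbcdfpcdg", goto, fail, out))
```

For these two patterns every failure transition leads back to the start state, because no suffix of one pattern is a prefix of the other.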
slide-9
SLIDE 9

Observation

Many string patterns are similar because they share common substrings.

This similarity does not lead to a small state machine.

[Figure: AC state machine of “bcdf” and “pcdg”, with states 1 to 8 and edge labels b, c, d, f and p, c, d, g.]

slide-10
SLIDE 10

Merge Similar States

The merg_FSM is a different machine with

– a smaller number of states and transitions.
– a smaller memory in the memory architecture.

[Figure: the AC state machine (states 1 to 8) and the merg_FSM (states 1, 26, 37, 4, 5, 8), in which states 2 and 6 and states 3 and 7 are merged.]

slide-11
SLIDE 11

Problem of merg_FSM

Directly merging similar states results in an erroneous state machine.

[Figure: the AC state machine (states 1 to 8) next to the merg_FSM (states 1, 26, 37, 4, 5, 8). For the input stream {p, c, d, f}, the merg_FSM reports a false positive.]
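
The false positive can be reproduced with a few lines. The sketch below hard-codes the merg_FSM of “bcdf” and “pcdg”, using the state names 26 and 37 from the figure and assuming state 0 as the start state; the table `MERGED_GOTO` and the function `naive_merged_match` are hypothetical names. Because every failure transition in this small example leads to the start state, a miss simply restarts from state 0.

```python
# Valid transitions of the merged machine; 26 and 37 are the merged states.
MERGED_GOTO = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}
FINAL_STATES = {4, 8}


def naive_merged_match(text):
    """Walk the merged machine with no memory of the path taken."""
    state = 0
    for ch in text:
        nxt = MERGED_GOTO.get((state, ch))
        if nxt is None:                       # miss: retry from the start state
            nxt = MERGED_GOTO.get((0, ch), 0)
        state = nxt
        if state in FINAL_STATES:
            return True
    return False


print(naive_merged_match("bcdf"))  # a real occurrence of "bcdf"
print(naive_merged_match("pcdf"))  # contains neither pattern, yet it is accepted
```

The second call is accepted because state 26 no longer records whether it was entered through b or through p.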

slide-12
SLIDE 12

Outline

Memory architecture for string matching
Basic idea
Novel algorithm for memory architecture
Experimental results and conclusions

slide-13
SLIDE 13

State Traversal Mechanism

Store the merg_FSM table in memory.

A state traversal mechanism is used to memorize the precedent state and differentiate merged states.

[Figure: the merg_FSM with the state traversal mechanism, next to the AC state machine. In merged state 26, the mechanism must tell whether the machine is in state 2 or state 6.]

slide-14
SLIDE 14

New State Information

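The AC state machine stores a match vector in each state.

The new state machine stores pathVec and ifFinal in each state.

– pathVec stores path information.
– ifFinal indicates whether the state is a final state.

[Figure: AC state machine of “bcdf” and “pcdg” labeled with match vectors: 00 on every state except the two final states, which carry 01 and 10.]

[Figure: new state machine labeled with pathVec_ifFinal: 11_0 on the start state; 01_0 on three states and 01_1 on the final state of one pattern; 10_0 on three states and 10_1 on the final state of the other pattern.]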
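
A minimal sketch of how such labels could be derived from the pattern set is shown below. It assumes that bit k of pathVec is set on every state lying on the path of pattern k, that the start state is state 0, and that ifFinal marks only the last state of each pattern (outputs inherited through failure transitions are ignored). The function name `build_path_info` is hypothetical.

```python
def build_path_info(patterns):
    """Return (children, path_vec, is_final) for the pattern trie."""
    children = [{}]     # children[s][ch] -> next state
    path_vec = [0]      # bit k set: the state lies on the path of pattern k
    is_final = [False]
    for k, pat in enumerate(patterns):
        s = 0
        path_vec[0] |= 1 << k          # the start state belongs to every path
        for ch in pat:
            if ch not in children[s]:
                children.append({})
                path_vec.append(0)
                is_final.append(False)
                children[s][ch] = len(children) - 1
            s = children[s][ch]
            path_vec[s] |= 1 << k
        is_final[s] = True
    return children, path_vec, is_final


_, path_vec, is_final = build_path_info(["bcdf", "pcdg"])
labels = [f"{v:02b}_{int(f)}" for v, f in zip(path_vec, is_final)]
print(labels)
```

For “bcdf” and “pcdg” this yields 11_0 for the start state, 01_0 and 01_1 along the first pattern, and 10_0 and 10_1 along the second, which agrees with the labels listed on this slide.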

slide-15
SLIDE 15

Pseudo-Equivalent States

Definition: Two states are pseudo-equivalent if they have

– identical input transitions
– identical failure transitions
– identical ifFinal
– but different next states.

[Figure: state machine of “bcdf” and “pcdg” with states 1 to 8 labeled pathVec_ifFinal (01_0, 01_1, 10_0, 10_1, and 11_0 on the start state).]
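
The definition can be checked mechanically. The sketch below hard-codes the “bcdf” and “pcdg” machine (states 1 to 4 and 5 to 8, all failure transitions assumed to lead to start state 0) and groups states by input symbol, failure state, and ifFinal; the table `STATES` and the function `pseudo_equivalent_groups` are hypothetical names introduced for this illustration.

```python
from collections import defaultdict

# state -> (input symbol, failure state, ifFinal, set of next states)
STATES = {
    1: ("b", 0, 0, {2}), 2: ("c", 0, 0, {3}),
    3: ("d", 0, 0, {4}), 4: ("f", 0, 1, set()),
    5: ("p", 0, 0, {6}), 6: ("c", 0, 0, {7}),
    7: ("d", 0, 0, {8}), 8: ("g", 0, 1, set()),
}


def pseudo_equivalent_groups(states):
    """Group states that agree on input symbol, failure state, and ifFinal."""
    buckets = defaultdict(list)
    for s, (symbol, fail, final, _next) in sorted(states.items()):
        buckets[(symbol, fail, final)].append(s)
    groups = []
    for members in buckets.values():
        # Keep a bucket only if its members have pairwise different next states.
        distinct_next = {frozenset(states[s][3]) for s in members}
        if len(members) > 1 and len(distinct_next) == len(members):
            groups.append(members)
    return groups


print(pseudo_equivalent_groups(STATES))
```

For this machine the groups are states 2 and 6 and states 3 and 7, which are the pairs merged into 26 and 37.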

slide-16
SLIDE 16

Merge Pseudo-Equivalent States

Pseudo-equivalent states are merged.

pathVec and ifFinal are updated by a union of the merged states.

[Figure: before merging, states 1 to 8 carry the labels 01_0, 01_1, 10_0, and 10_1, and the start state carries 11_0. After merging, the machine has states 1, 26, 37, 4, 5, 8, and the merged states 26 and 37 carry 11_0.]
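
The union can be written as a bitwise OR. This is a minimal sketch with a hypothetical helper `merge_labels`; the labels are the pathVec and ifFinal values of states 2, 6, 3, and 7 from the “bcdf” and “pcdg” example.

```python
def merge_labels(labels, group):
    """Union the (pathVec, ifFinal) labels of the states in `group`."""
    vec, final = 0, 0
    for s in group:
        vec |= labels[s][0]      # union of the path bits
        final |= labels[s][1]    # identical by definition, so OR is harmless
    return vec, final


# state -> (pathVec, ifFinal) before merging
LABELS = {2: (0b01, 0), 6: (0b10, 0), 3: (0b01, 0), 7: (0b10, 0)}

for group in ([2, 6], [3, 7]):
    vec, final = merge_labels(LABELS, group)
    print(group, "->", f"{vec:02b}_{final}")
```

Both merged states end up with the label 11_0, since each one now lies on the path of both patterns.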
slide-17
SLIDE 17

State Traversal Mechanism

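PreReg traces the precedent pathVec in each state.

[Figure: the merg_FSM (states 1, 26, 37, 4, 5, 8) labeled with pathVec_ifFinal: 11_0 on the start state and on the merged states 26 and 37, and 01_0, 01_1, 10_0, 10_1 on the unmerged states. The machine is traced on the input stream {p, c, d, f}; the table rows are next state, pathVec, ifFinal, and preReg.]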
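
The trace can be recomputed from the pathVec labels alone. The sketch below assumes that preReg starts with all bits set and is ANDed with the pathVec of each state visited; the values are recomputed from the labels on the New State Information slide rather than copied from the preReg row of the figure, and the names `PATH_VEC` and `pre_reg_trace` are hypothetical.

```python
# pathVec of each merg_FSM state (0 is assumed to be the start state).
PATH_VEC = {0: 0b11, 1: 0b01, 5: 0b10, 26: 0b11, 37: 0b11, 4: 0b01, 8: 0b10}


def pre_reg_trace(visited_states, width=2):
    """AND preReg with the pathVec of every visited state, in order."""
    pre_reg = (1 << width) - 1           # all bits initialized to 1
    trace = []
    for s in visited_states:
        pre_reg &= PATH_VEC[s]
        trace.append(format(pre_reg, f"0{width}b"))
    return trace


print(pre_reg_trace([0, 5, 26, 37, 4]))  # path followed by p, c, d, f
print(pre_reg_trace([0, 1, 26, 37, 4]))  # path followed by b, c, d, f
```

For {p, c, d, f} the register ends at 00, so the final state reached is not reported as a match; for {b, c, d, f} it stays at 01.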

slide-18
SLIDE 18

Outline

Memory architecture for string matching
Basic idea
Novel algorithm for memory architecture
Experimental results and conclusions

slide-19
SLIDE 19

Experiment I

Perform experiments on Snort rule sets.

Compare our approach with the Aho-Corasick algorithm.

  • A. V. Aho and M. J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, 1975.

slide-20
SLIDE 20

Compare with Traditional AC

                                    Traditional AC [24]              Our algorithm
Rule sets       # of    # of    # of    # of     Memory    # of    # of     Memory   Memory
            patterns   char.  trans.  states    (bytes)  trans.  states    (bytes)  reduct.
Oracle           138   4,674   2,180   2,185    880,009   1,389   1,221    452,533      49%
Sql               44   1,089     421     422    129,290     321     284     87,011      33%
Backdoor          57     599     563     565    191,253     523     497    152,268      20%
Web-iis          113   2,047   1,533   1,537    569,651   1,273   1,155    428,072      25%
Web-php          115   2,455   1,670   1,675    620,797   1,295   1,142    423,254      32%
Web-misc         310   4,711   3,576   3,587  1,444,664   3,031   2,734  1,101,119      24%
Web-cgi          347   5,339   3,407   3,419  1,377,002   2,672   2,358    949,685      31%
Total rules    1,595  20,921  17,472  17,522  8,745,668  14,704  13,381  6,248,927      29%
Ratio                              1       1          1     84%     76%        71%      29%

slide-21
SLIDE 21

Experiment II

Enhance the bit-split algorithm with our method.

– The results are compared with the original bit-split algorithm.

  • L. Tan and T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” in ISCA’05.

slide-22
SLIDE 22

Compare with Traditional Bit-Split

                                       Bit-split [8]           Bit-split + our algorithm
Rule sets       # of    # of    # of    # of     Memory    # of    # of     Memory   Memory
            patterns   char.  trans.  states    (bytes)  trans.  states    (bytes)  reduct.
Oracle           138   4,674   6,645   6,665    633,175   4,146   3,603    358,499      43%
Sql               44   1,089   1,211   1,215    110,565     866     769     72,671      34%
Backdoor          57     599   1,697   1,705    155,155   1,441   1,305    126,585      18%
Web-iis          113   2,047   4,869   4,885    464,075   3,844   3,374    335,713      28%
Web-php          115   2,455   4,991   5,011    476,045   3,871   3,345    332,828      30%
Web-misc         310   4,711  10,959  11,003  1,067,291   8,861   7,816    797,232      25%
Web-cgi          347   5,339   9,901   9,949    965,053   7,875   6,957    709,614      26%
Total rules    1,595  20,921  53,930  54,130  5,467,130  43,550  38,701  4,237,760      22%
Ratio                              1       1          1     81%     71%        78%      22%

slide-23
SLIDE 23

Conclusion

Provide a concept of merging pseudo-equivalent states to reduce the number of states and transitions.

Propose a state traversal mechanism that works with the merg_FSM without producing false positive matches.

Experimental results demonstrate a significant reduction in memory requirement: 29% over the traditional AC algorithm and 22% over the bit-split algorithm on the total Snort rule set.

slide-24
SLIDE 24

Thank You!

slide-25
SLIDE 25

Backup

slide-26
SLIDE 26

Cycle Problem

Merging out-of-order sections of pseudo-equivalent states creates a cycle problem.

[Figure: state machine with states 1 to 12; the edge labels are a, b, c, d, e, f on one path and w, d, e, b, c, g on the other.]

slide-27
SLIDE 27

Cycle Problem

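For example, the input string “abcdebcdef” will be mistaken as a match of the pattern “abcdef”.

[Figure: the merged state machine (states 1, 2, 3, 4, 5, 6, 7, 12) with edge labels a, b, c, d, e, f, g, and w; the additional edges labeled d and b close a cycle.]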

slide-28
SLIDE 28

Construction of State Traversal Machine

Construction of the state traversal machine consists of two steps:

– Step 1: Construct the valid transitions, failure transitions, pathVec, and ifFinal function.
– Step 2: Merge the pseudo-equivalent states.

slide-29
SLIDE 29

Example

Consider three patterns: “abcdef”, “apcdeg”, and “awcdeh”.

[Figure: state machine of the three patterns before merging, with states 1 to 16 labeled pathVec_ifFinal. Non-final states carry labels such as 001_0, 010_0, and 100_0; the three final states carry 001_1, 010_1, and 100_1.]

16 states

slide-30
SLIDE 30

Merging Pseudo-equivalent States

– Merging the failure transitions.
– Performing the union on the pathVec of the merged states.

[Figure: a merging step in the three-pattern example; merged states carry the pathVec_ifFinal label 111_0.]

slide-31
SLIDE 31

Merging Pseudo-equivalent States

[Figure: a further merging step in the three-pattern example; merged states carry the pathVec_ifFinal label 111_0.]

slide-32
SLIDE 32

Merging Pseudo-equivalent States

[Figure: the three-pattern state machine after all pseudo-equivalent states are merged; merged states carry the pathVec_ifFinal label 111_0.]

10 states
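
The state counts of this example (16 before merging, 10 after) can be reproduced with the following sketch of the two construction steps. It groups states by input symbol, failure state, and ifFinal, which is a simplification: it does not check the "different next states" clause and does not guard against the cycle problem. The start state 0 is assumed and is not counted, and the names `build` and `merge` are hypothetical.

```python
from collections import defaultdict, deque


def build(patterns):
    """Step 1: valid transitions, failure transitions, pathVec, ifFinal."""
    child, sym, final, vec = [{}], [None], [0], [0]
    for k, pat in enumerate(patterns):
        s = 0
        vec[0] |= 1 << k
        for ch in pat:
            if ch not in child[s]:
                child.append({})
                sym.append(ch)      # input symbol that leads into the state
                final.append(0)
                vec.append(0)
                child[s][ch] = len(child) - 1
            s = child[s][ch]
            vec[s] |= 1 << k
        final[s] = 1
    fail = [0] * len(child)
    queue = deque(child[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in child[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in child[f]:
                f = fail[f]
            fail[t] = child[f].get(ch, 0)
    return sym, fail, final, vec


def merge(sym, fail, final, vec):
    """Step 2 (simplified): one merged state per (symbol, failure, ifFinal)."""
    merged = defaultdict(int)
    for s in range(1, len(sym)):                        # skip the start state
        merged[(sym[s], fail[s], final[s])] |= vec[s]   # union of pathVec
    return merged


sym, fail, final, vec = build(["abcdef", "apcdeg", "awcdeh"])
merged = merge(sym, fail, final, vec)
print(len(sym) - 1, "states before merging,", len(merged), "after")
print(format(merged[("c", 0, 0)], "03b"))  # pathVec of the merged c state
```

The three c states, three d states, and three e states each collapse into one state whose pathVec is 111.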

slide-33
SLIDE 33

State Traversal Algorithm

Algorithm: State traversal pattern matching algorithm

Input: A text string x = a1 a2 … an, where each ai is an input symbol, and a state traversal machine M with valid transition function g, failure transition function f, path function pathVec, and final function ifFinal.

Output: Locations at which keywords occur in x.

Method:
begin
    state ← 0
    preReg ← 1…1    // all bits are initialized to 1
    for i ← 1 until n do
    begin
        preReg ← preReg & pathVec(state)
        while g(state, ai) == fail || preReg == 0 do
        begin
            state ← f(state)
            preReg ← 1…1
        end
        state ← g(state, ai)
        if ifFinal(state) = 1 then
        begin
            print i
            print preReg
        end
    end
end
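
A runnable sketch in the spirit of this algorithm is given below for the merg_FSM of “bcdf” and “pcdg”. It is an illustration under stated assumptions, not the authors' implementation: the tables are hard-coded, state 0 is the start state, and all names are hypothetical. One deliberate deviation is labeled in the comments: preReg is ANDed with the pathVec of the state being entered before ifFinal is tested, following the walk-through on the PreReg slide; treat that ordering as an assumption.

```python
GOTO = {
    (0, "b"): 1, (0, "p"): 5,
    (1, "c"): 26, (5, "c"): 26,
    (26, "d"): 37,
    (37, "f"): 4, (37, "g"): 8,
}
FAIL = {1: 0, 5: 0, 26: 0, 37: 0, 4: 0, 8: 0}   # all failures go to the start
PATH_VEC = {0: 0b11, 1: 0b01, 5: 0b10, 26: 0b11, 37: 0b11, 4: 0b01, 8: 0b10}
IS_FINAL = {4, 8}
ALL_ONES = 0b11


def state_traversal_search(text):
    """Return (end position, preReg) for every match, positions from 1."""
    state, pre_reg = 0, ALL_ONES
    hits = []
    for i, ch in enumerate(text, start=1):
        while True:
            nxt = GOTO.get((state, ch))
            # Assumption: a transition counts as valid only if preReg stays
            # nonzero after ANDing with the pathVec of the entered state.
            if nxt is not None and pre_reg & PATH_VEC[nxt]:
                state = nxt
                pre_reg &= PATH_VEC[nxt]
                break
            if state == 0:               # the start state absorbs the symbol
                pre_reg = ALL_ONES
                break
            state = FAIL[state]          # failure transition
            pre_reg = ALL_ONES           # reset, as in the pseudocode
        if state in IS_FINAL:
            hits.append((i, format(pre_reg, "02b")))
    return hits


print(state_traversal_search("xxbcdfpcdg"))
print(state_traversal_search("pcdf"))
```

In this sketch the input {p, c, d, f} produces no match, while genuine occurrences are reported together with a preReg value that identifies the matched pattern.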