Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia



SLIDE 1

Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia

Sailesh Kumar, Balakrishnan Chandrasekaran, Jonathan Turner, George Varghese

SLIDE 2

Sailesh Kumar - 1/9/2008

Regular Expressions in Security

A signature-based network intrusion detection system (NIDS) is a popular device for enforcing network security.

Attack patterns are specified as regular expressions.

» [ \t]*[Cc][Ww][Dd][ \t]+[~]root represents an attempt to change the working directory to root.

Regular expression matching is expensive.

» Thousands of signatures.
» A high-speed implementation requires gigabytes of memory, which is often impractical.

[Figure: NIDS operation. Labels: scan traffic, attack, alarm, drop packet, reset connection.]
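To make the signature above concrete, here is a minimal sketch that compiles it with Python's standard `re` module and checks individual lines of traffic. The function name `is_suspicious` and the sample inputs are illustrative assumptions, not part of the slides.

```python
import re

# The change-directory-to-root signature from the slide, compiled unchanged.
CWD_ROOT = re.compile(r"[ \t]*[Cc][Ww][Dd][ \t]+[~]root")


def is_suspicious(line):
    """Return True if the signature occurs anywhere in the line."""
    return CWD_ROOT.search(line) is not None
```

For example, `is_suspicious("CWD ~root")` is true, while `is_suspicious("CWD /home/alice")` is false. A real NIDS evaluates thousands of such signatures against every packet, which is where the cost comes from.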

SLIDE 3

Traditional Implementation

NIDS implementation.

DFA: fast, but requires large memory.
NFA: compact, but slow.
D2FA: trades off memory against performance.

[Figure: performance plotted against 1/memory for DFA, NFA, D2FA, etc.]

A traditional implementation attempts to match traffic against the entire virus signature.

Signatures:
r1 = .*[gh]d[^g]*ge
r2 = .*fag[^i]*i[^j]*j
r3 = .*a[gh]i[^l]*[ae]c

Complex signatures lead to this trade-off.

SLIDE 4

Insomnia

NIDS implementation.

Observation: typical traffic rarely matches beyond the first few symbols of any virus signature.

[Figure: the signatures .*[gh]d[^g]*ge, .*fag[^i]*i[^j]*j and .*a[gh]i[^l]*[ae]c, each divided into a frequently matched head and a rarely matched tail.]

The NIDS keeps the entire signature active.

(Unvisited tail portions can be put to sleep.)

We refer to this problem as Insomnia.

SLIDE 5

Cure to Insomnia

Solve Insomnia with a three-way trade-off.

Smaller matching signature prefixes => high performance and low memory.

[Figure: the performance versus 1/memory plot for DFA, NFA, D2FA, etc., extended to a three-way trade-off among memory, performance, and traffic characteristics.]

In practice, frequently matching prefixes are very small in length.

SLIDE 6

Cure to Insomnia

Insomnia cure: select prefixes such that

» The prefixes are small
» Few packets match them and go to the slow path

Fast path (frequent match): .*[gh]   .*f   .*a

Slow path (rare match): d[^g]*ge   ag[^i]*i[^j]*j   [gh]i[^l]*[ae]c

Only the prefixes of signatures are matched in the fast path.
The suffixes of signatures whose prefix matched are matched in the slow path.
Packets that do not match a prefix never go to the slow path.
Packets that match a prefix go to the slow path.

A fast prefix implementation (e.g. a DFA) requires less memory and becomes feasible.
The suffixes do not require a fast implementation, so they can use a compact one that needs less memory and is also feasible.

Result: high performance, less memory.
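The fast path / slow path split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the standard `re` module stands in for both engines (the slides suggest a DFA for the fast path), the leading `.*` is dropped because `finditer` already scans every position, and the names `SIGNATURES` and `scan` are hypothetical.

```python
import re

# (prefix, suffix) pairs for the three example signatures, using the cut
# shown on the slide. The leading ".*" is implicit in the scan below.
SIGNATURES = [
    (r"[gh]", r"d[^g]*ge"),
    (r"f", r"ag[^i]*i[^j]*j"),
    (r"a", r"[gh]i[^l]*[ae]c"),
]

FAST = [re.compile(prefix) for prefix, _ in SIGNATURES]
SLOW = [re.compile(suffix) for _, suffix in SIGNATURES]


def scan(payload):
    """Return (set of matching signature indices, number of slow path calls)."""
    matched = set()
    slow_calls = 0
    for i, (fast, slow) in enumerate(zip(FAST, SLOW)):
        for m in fast.finditer(payload):        # fast path: prefix only
            slow_calls += 1
            # Slow path: try the suffix exactly where the prefix ended.
            if slow.match(payload, m.end()):
                matched.add(i)
                break
    return matched, slow_calls
```

With these one-symbol prefixes, the slow path call count shows how often traffic leaves the fast path. Payloads that contain none of the prefix symbols never reach the slow path at all.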

How to select the prefixes?

SLIDE 7

Prefix Generation

Construct the NFA.
Run the NFA on an input trace.
Count the number of times each state is active.
Find the probability of state activity.
Make a cut (limit the total slow path state probability).

[Figure: NFA for the three signatures, with each state annotated with its activity probability on the trace (from 1.0 at the start states down to about 0.001 at the deepest states) and a CUT line separating fast path states from slow path states.]
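The last two steps can be illustrated with a small Python sketch. It assumes a simplified setting that the slides do not spell out: a single signature whose NFA states are listed in order of depth, with one activity count per state. The function names and the `budget` parameter are hypothetical.

```python
def activity_probabilities(active_counts, trace_length):
    """Turn per-state activity counts into probabilities over the trace."""
    return [count / trace_length for count in active_counts]


def choose_cut(probs, budget):
    """Return the smallest k such that states k, k+1, ... (the slow path)
    have a total activity probability of at most `budget`."""
    for k in range(len(probs) + 1):
        if sum(probs[k:]) <= budget:
            return k
    return len(probs)
```

States before the returned index stay in the fast path; the rest form the slow path suffix. A tighter budget pushes the cut deeper, which lengthens the prefixes and raises fast path memory.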

SLIDE 8

DoS Attacks

An attacker can send traffic that matches the prefixes "too often".

» This overloads the slow path
» Well-behaved flows will suffer

Use a per-flow anomaly counter.

» It counts the number of packets sent to the slow path
» Flows with a high anomaly counter value are attack flows
» Send them to a low-priority queue

[Figure: fast path (.*[gh], .*f, .*a; frequent match) feeding the slow path (d[^g]*ge, ag[^i]*i[^j]*j, [gh]i[^l]*[ae]c; rare match), with a per-flow anomaly counter C_k.]
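A per-flow anomaly counter of this kind might look like the following Python sketch. The class name, the `threshold` parameter, and the queue labels are assumptions made for illustration; the slides do not give concrete values or an API.

```python
from collections import defaultdict


class AnomalyTracker:
    """Counts, per flow, the packets that were diverted to the slow path."""

    def __init__(self, threshold):
        self.threshold = threshold            # hypothetical cut-off value
        self.counters = defaultdict(int)      # flow id -> anomaly counter

    def record_slow_path(self, flow_id):
        """Call once for each packet of the flow sent to the slow path."""
        self.counters[flow_id] += 1

    def queue_for(self, flow_id):
        """Return 'low' for suspected attack flows, 'normal' otherwise."""
        if self.counters.get(flow_id, 0) > self.threshold:
            return "low"
        return "normal"
```

Flows that rarely hit the slow path keep their normal priority, so an attacker who floods the slow path mostly degrades only the attacker's own flows.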

SLIDE 9

Simulation of DoS Mitigation

[Figure: three plots against time (seconds): flow throughput with no DoS protection, slow path load relative to the slow path's ε threshold, and flow throughput with DoS protection. The run starts with 50 well-behaved flows (no overloading); 10 flows then become anomalous (moderate overloading), and later 20 become anomalous (extreme overloading).]

SLIDE 10

Results of Splitting Prefix/Suffix

Source         # of     Regular expressions before split    Prefixes after split
               rules    ASCII     Number    Total           ASCII     Number    Total
                        length    of DFA    memory          length    of DFA    memory
Cisco          68       44.1      6         973 MB          19.8      1         152 MB
Linux          70       67.2      4         30.7 MB         21.4      2         15.8 MB
Bro            648      23.64     1         3.77 MB         16.1      1         1.23 MB
Snort rule 1   22       59.4      5         114.6 MB        36.9      3         32.1 MB
Snort rule 2   10       43.72     2         64.2 MB         16        1         6.5 MB
Snort rule 3   19       30.72     N/A       N/A             13.8      2         2.42 MB

Slow path probability set to less than 0.01%.

SLIDE 11

Second Contribution - HFA

NFAs are compact but slow

» Multiple active states

DFAs are a fast representation

» State explosion is a serious problem
» State explosion mainly occurs due to the presence of closures

Three patterns: (ab.*c) | (ac.*b) | (ba.*a)

» 3 separate DFAs create 12 states

– 3 active states

» The NFA has only 9 states

– Up to 6 active states

» A single DFA creates 20 states

– 1 active state

[Figure: one of the 3 DFAs (12 states in total) and the NFA for the three patterns.]

SLIDE 12

State Explosion in DFA

State explosion occurs primarily because

» A DFA has a single active state
» It does not remember anything but the current active state (amnesia)

A DFA therefore requires a separate state for every situation that may occur during the NFA parse.

Example: (ab.*z) | (cd.*z) | (ef.*z)

Active NFA states after each input:

» Input abcd: 0, 2, 5
» Input abef: 0, 2, 8
» Input cdef: 0, 5, 8
» Input abcdef: 0, 2, 5, 8

k closures => the number of DFA states is exponential in k.

[Figure: NFA for the three patterns, with states 0 to 9; states 2, 5 and 8 are the closure states.]
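The exponential growth can be checked with a short Python sketch that runs the textbook subset construction on an NFA for k patterns of the same shape (ab.*z, cd.*z, ef.*z, and so on) and counts the reachable DFA states. The state names, the `?` placeholder symbol, and the function name are assumptions for this illustration, and the result is not minimized, so the counts need not equal those of an optimized DFA.

```python
def count_dfa_states(k):
    """Count subset-construction DFA states for k patterns of the form xy.*z."""
    if not 1 <= k <= 12:
        raise ValueError("k must be between 1 and 12 (two distinct letters per pattern)")
    # Pattern i uses the letter pair (x, y): ab, cd, ef, ...
    pairs = [(chr(97 + 2 * i), chr(98 + 2 * i)) for i in range(k)]
    # '?' stands for every symbol that no pattern mentions.
    alphabet = [c for pair in pairs for c in pair] + ["z", "?"]

    def step(states, ch):
        nxt = {"start"}                       # unanchored match: start stays active
        for s in states:
            if s == "start":
                nxt.update(("first", i) for i, (x, _) in enumerate(pairs) if ch == x)
            elif s[0] == "first" and ch == pairs[s[1]][1]:
                nxt.add(("closure", s[1]))
            elif s[0] == "closure":
                nxt.add(s)                    # .* keeps the closure state active
                if ch == "z":
                    nxt.add(("accept", s[1]))
        return frozenset(nxt)

    start = frozenset({"start"})
    seen = {start}
    frontier = [start]
    while frontier:                           # explore all reachable subsets
        current = frontier.pop()
        for ch in alphabet:
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)
```

Evaluating `[count_dfa_states(k) for k in range(1, 7)]` shows the count more than doubling with each added closure, while the NFA itself grows by only three states per pattern.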

SLIDE 13

HFA

Our solution is History based Finite Automata (HFA).

» Enable a single state of execution
» Use a bit to represent the condition that a closure has been reached
» Certain transitions depend upon the bit values
» The bits are also updated as the HFA makes its transitions

Example: (ab.*z) | (cd.*z) | (ef.*z)

» b1: set if state 2 is reached
» b2: set if state 5 is reached
» b3: set if state 8 is reached

[Figure: the NFA for the three patterns (states 0 to 9) next to the HFA with history bits b1, b2 and b3.]
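The history-bit idea can be sketched for this example in Python. This is a simplified illustration, not the authors' HFA construction: the single automaton state is reduced to "the previous symbol", and the names `PAIRS` and `hfa_match` are hypothetical.

```python
PAIRS = ["ab", "cd", "ef"]   # the two-symbol heads of ab.*z, cd.*z, ef.*z


def hfa_match(text):
    """Return the indices of the patterns (xy.*z) that match somewhere in text."""
    history = 0     # bit i is set once PAIRS[i] has been seen (closure reached)
    prev = ""       # the single state of execution: the previous symbol
    matched = set()
    for ch in text:
        for i, pair in enumerate(PAIRS):
            if prev == pair[0] and ch == pair[1]:
                history |= 1 << i             # update the history bit
        if ch == "z":
            # The transition on 'z' depends on the history bits.
            matched |= {i for i in range(len(PAIRS)) if (history >> i) & 1}
        prev = ch
    return matched
```

For instance, `hfa_match("abcdz")` returns `{0, 1}`. Adding a pattern costs one more bit rather than doubling the number of states.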

SLIDE 14

Benefits of HFA

Single state of execution: high performance.
Few bits are required (16, 32): they can be stored in registers.
Avoids state explosion: memory efficient.

[Figure: the same example, (ab.*z) | (cd.*z) | (ef.*z), with the NFA and the HFA; bits b1, b2 and b3 are set when states 2, 5 and 8 are reached.]

SLIDE 15

Results

Source     # of       DFA                       H-FA                                % space reduction
           closures   # of        total # of    # of        # of     total # of     with H-FA
                      automata    states        automata    flags    states
Cisco64    14         1           132784        1           6        3597           94.69
Cisco64    14         1           132784        1           13       1861           96.77
Cisco68    19         1           328664        1           17       2956           97.03
Snort 1    6          3           62589         1           5        583            97.40
Snort 2    1          1           12703         1           1        71             98.58
Snort 3    5          2           4737          1           5        116            93.48
Linux70    11         2           20662         1           9        1304           81.63

H-FA parsing rate speedup (row assignment not recoverable from the slide text): 3x, 2x, 2x.

SLIDE 16

Thank you and Questions???