Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia
Sailesh Kumar, Balakrishnan Chandrasekaran, Jonathan Turner, George Varghese
2 - Sailesh Kumar - 1/9/2008
Signature-based NIDS (network intrusion detection systems) are popular devices for scanning network traffic for attacks.
Attack patterns are specified as regular expressions.
» [ \t]*[Cc][Ww][Dd][ \t]+[~]root represents an attempt to change the working directory to ~root (root's home directory).
Regular expression matching is expensive.
» Thousands of signatures.
» High-speed implementations require gigabytes of memory (often impractical).
[Figure: the NIDS scans traffic; when it detects an attack it can raise an alarm, drop the packet, or reset the connection.]
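A minimal sketch of the example signature in Python's standard re module. The helper is_attack and the sample payloads are hypothetical; a real NIDS compiles thousands of signatures into an automaton instead of running a backtracking matcher once per rule.

```python
import re

# The slide's signature, verbatim, as a raw string:
# optional blanks, "CWD" in any letter case, at least one blank, then "~root".
CWD_ROOT = re.compile(r"[ \t]*[Cc][Ww][Dd][ \t]+[~]root")

def is_attack(payload: str) -> bool:
    """Hypothetical helper: True if the signature occurs anywhere in the payload.
    re.search is unanchored, which mirrors the implicit leading .* of NIDS rules."""
    return CWD_ROOT.search(payload) is not None

if __name__ == "__main__":
    for payload in ["CWD ~root\r\n", "cwd\t~root", "CWD /pub", "USER anonymous"]:
        print(repr(payload), is_attack(payload))
```

The first two payloads match (the character classes make "CWD" case-insensitive and accept a tab as the separator); the last two do not.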
NIDS implementation.
DFA: fast, but requires large memory.
NFA: compact, but slow.
D2FA: trades off memory against performance.
[Figure: performance versus 1/memory, with DFA, NFA, D2FA, etc. at different points of the trade-off.]
Traditional implementations attempt to match traffic against the entire virus signature.
Signatures:
r1 = .*[gh]d[^g]*ge
r2 = .*fag[^i]*i[^j]*j
r3 = .*a[gh]i[^l]*[ae]c
Complex signatures lead to a memory-performance trade-off.
NIDS implementation.
OBSERVATION: typical traffic rarely matches more than the first few symbols of any virus signature.
[Figure: the first few symbols of each signature match frequently; the remaining symbols match rarely.]
(Unvisited tail portions can be put to sleep.)
Solve Insomnia with a three-way trade-off.
Smaller matching signature prefixes => high performance and low memory.
[Figure: the memory-performance trade-off of DFA, NFA, D2FA, etc., extended with traffic characteristics as a third dimension.]
In practice, frequently matching prefixes are very short.
Insomnia cure: select prefixes such that
» the prefixes are small, and
» few packets match them (only those packets go to the slow path).
[Figure: frequently matching prefixes are handled in the fast path; rarely matching suffixes are handled in the slow path.]
Only signature prefixes are matched in the fast path.
Suffixes of signatures whose prefix matched are matched in the slow path.
Packets that don't match any prefix never go to the slow path; packets that match a prefix do.
A fast prefix implementation (e.g., a DFA) requires less memory and is therefore feasible.
Suffixes don't need a fast implementation, so they also use less memory and are feasible.
How to select the prefixes?
[Figure: NFA for r1, r2, r3, each state annotated with its activation probability on an input trace (1.0 at the start states, falling to between 0.2 and 0.001 for deeper states), with a CUT separating fast-path states from slow-path states.]
» Construct the NFA.
» Run the NFA on an input trace.
» Count the number of times each state is active.
» Convert the counts to probabilities.
» MAKE A CUT that limits the total slow-path state probability.
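The profiling steps above can be sketched in Python for a single signature. Everything here is an illustrative assumption: the hand-built NFA for r1 (with our own state numbers, not the slide's), the toy trace, and the cut rule, which keeps the shortest chain of states in the fast path such that the deeper states' total activation probability does not exceed eps.

```python
from collections import Counter

# Hypothetical hand-built NFA for r1 = .*[gh]d[^g]*ge. State numbers are our own.
# Each state maps to a list of (predicate, next_state) transitions.
NFA = {
    0: [(lambda c: True, 0), (lambda c: c in "gh", 1)],     # .* self-loop, then [gh]
    1: [(lambda c: c == "d", 2)],                            # d
    2: [(lambda c: c != "g", 2), (lambda c: c == "g", 3)],   # [^g]* then g
    3: [(lambda c: c == "e", 4)],                            # e
    4: [],                                                   # accepting state
}
ORDER = [0, 1, 2, 3, 4]  # states in increasing depth along the signature

def state_probabilities(nfa, start, trace):
    """Run the NFA over a trace; for each state, return the fraction of
    input symbols after which that state is active."""
    counts = Counter()
    active = {start}
    for c in trace:
        active = {nxt for s in active for pred, nxt in nfa[s] if pred(c)}
        counts.update(active)
    return {s: counts[s] / len(trace) for s in nfa}

def choose_cut(probs, order, eps):
    """Smallest k such that the slow-path states order[k:] have total
    probability <= eps; states order[:k] stay in the fast path."""
    for k in range(1, len(order) + 1):
        if sum(probs[s] for s in order[k:]) <= eps:
            return k
    return len(order)

if __name__ == "__main__":
    probs = state_probabilities(NFA, 0, "xyzg" * 25)
    print(probs)                     # state 0 is always active; state 1 after each 'g'
    print(choose_cut(probs, ORDER, 0.01))
```

On the toy trace, state 0 is active after every symbol, state 1 after each g (probability 0.25), and the deeper states never, so with eps = 0.01 the cut keeps states 0 and 1 in the fast path, which corresponds to the prefix .*[gh] used for r1 in the talk.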
Fast path (frequent matches): .*[gh], .*f, .*a
Slow path (rare matches): d[^g]*ge, ag[^i]*i[^j]*j, [gh]i[^l]*[ae]c
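A minimal fast-path/slow-path sketch for these three splits, using Python's re module for both halves (the talk proposes a DFA for the fast path; re is only a stand-in here). It assumes each prefix, without its leading .*, has a fixed length of one symbol, so trying the prefix at every start position and then the suffix right after it finds exactly what the full signature would. The scan function and its counters are hypothetical.

```python
import re

# (prefix, suffix) pairs from the split above. The prefixes' leading ".*"
# is dropped because the scan below tries every start position instead.
SPLITS = [
    (r"[gh]", r"d[^g]*ge"),          # r1 = .*[gh]d[^g]*ge
    (r"f", r"ag[^i]*i[^j]*j"),       # r2 = .*fag[^i]*i[^j]*j
    (r"a", r"[gh]i[^l]*[ae]c"),      # r3 = .*a[gh]i[^l]*[ae]c
]
COMPILED = [(re.compile(p), re.compile(s)) for p, s in SPLITS]

def scan(payload):
    """Hypothetical scanner. Returns (indices of matched signatures,
    number of slow-path invocations)."""
    matched, slow_calls = set(), 0
    for i, (prefix, suffix) in enumerate(COMPILED):
        for start in range(len(payload)):
            m = prefix.match(payload, start)      # fast path: prefix only
            if m is None:
                continue
            slow_calls += 1                       # prefix hit: hand off to the slow path
            if suffix.match(payload, m.end()):    # slow path: suffix right after the prefix
                matched.add(i)
                break
    return matched, slow_calls

if __name__ == "__main__":
    print(scan("xxhdabge"))   # r1 matches; the slow path runs twice
```

For "xxhdabge", only r1 matches, and the slow path runs twice: once for the h that begins r1's suffix check and once for the a that could begin r3's.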
Attack: an attacker sends traffic that matches the prefixes “too often”, overloading the slow path, so well-behaving flows suffer.
Defense: a per-flow anomaly counter C_k counts the number of packets of flow k sent to the slow path.
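One way the per-flow anomaly counter could be kept, as a sketch. The flagging rule (a flow is anomalous once more than a fraction eps of its packets have gone to the slow path, after at least min_packets packets) and both default values are assumptions, not taken from the talk.

```python
from collections import defaultdict

class AnomalyCounter:
    """Hypothetical per-flow anomaly counter. slow[k] plays the role of C_k:
    the number of flow k's packets sent to the slow path. The flagging rule
    below is an assumption for illustration."""

    def __init__(self, eps=0.01, min_packets=50):
        self.eps = eps                    # assumed tolerated slow-path fraction
        self.min_packets = min_packets    # assumed warm-up before flagging
        self.total = defaultdict(int)     # packets seen per flow
        self.slow = defaultdict(int)      # C_k: packets diverted to the slow path

    def record(self, flow, to_slow_path):
        self.total[flow] += 1
        if to_slow_path:
            self.slow[flow] += 1

    def is_anomalous(self, flow):
        n = self.total.get(flow, 0)
        if n == 0 or n < self.min_packets:
            return False
        return self.slow.get(flow, 0) / n > self.eps
```

A slow-path scheduler could then serve packets of unflagged flows before those of flagged flows, so that an attacker cannot starve well-behaving flows; this scheduling policy is likewise an assumption.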
[Figure, three panels versus time (seconds): slow-path load against the slow path's ε threshold; throughput with no DoS mitigation; flow throughput with DoS mitigation. Scenario: 50 well-behaving flows with no overloading, then 10 flows become anomalous (moderate overloading), then 20 flows become anomalous (extreme overloading).]
Source | # of rules | Before split: ASCII length | Before split: number | Before split: total memory | Prefixes after split: ASCII length | After split: number | After split: total memory
Cisco | 68 | 44.1 | 6 | 973 MB | 19.8 | 1 | 152 MB
Linux | 70 | 67.2 | 4 | 30.7 MB | 21.4 | 2 | 15.8 MB
Bro | 648 | 23.64 | 1 | 3.77 MB | 16.1 | 1 | 1.23 MB
Snort rule 1 | 22 | 59.4 | 5 | 114.6 MB | 36.9 | 3 | 32.1 MB
Snort rule 2 | 10 | 43.72 | 2 | 64.2 MB | 16 | 1 | 6.5 MB
Snort rule 3 | 19 | 30.72 | N/A | N/A | 13.8 | 2 | 2.42 MB
Slow path probability set to less than 0.01%
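As a quick check of the table's arithmetic, the memory reduction factor of the split (total memory before divided by total memory after) can be computed directly from the reported figures. Snort rule 3 is left out because its pre-split memory is N/A.

```python
# Total memory (MB) before and after the split, copied from the table.
# Snort rule 3 is omitted because its "before" memory is N/A.
MEMORY_MB = {
    "Cisco": (973, 152),
    "Linux": (30.7, 15.8),
    "Bro": (3.77, 1.23),
    "Snort rule 1": (114.6, 32.1),
    "Snort rule 2": (64.2, 6.5),
}

def reduction_factors(table):
    """Memory before / memory after, rounded to one decimal place."""
    return {src: round(before / after, 1) for src, (before, after) in table.items()}

if __name__ == "__main__":
    for src, factor in reduction_factors(MEMORY_MB).items():
        print(f"{src}: {factor}x less memory after the split")
```

This gives 6.4x for Cisco, 1.9x for Linux, 3.1x for Bro, 3.6x for Snort rule 1, and 9.9x for Snort rule 2.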
NFAs are compact but slow
» Multiple active states
DFAs are a fast representation
» State explosion is a serious problem.
» State explosion mainly occurs due to the presence of closures.
Three patterns: (ab.*c) | (ac.*b) | (ba.*a)
» 3 separate DFAs create 12 states
– 3 active states
» The NFA has only 9 states
– Up to 6 active states
» A single DFA creates 20 states
– 1 active state
[Figure: one of the 3 separate DFAs (12 states in total across all three) and the NFA for the three patterns.]
State explosion occurs primarily because
» the DFA has a single active state, and
» it doesn't remember anything but the current active state (amnesia).
This requires a separate DFA state for every combination of active NFA states.
Example: (ab.*z) | (cd.*z) | (ef.*z)
[Figure: NFA for these patterns, with start state 0 and closure states 2, 5, and 8.]
Active states:
» Input abcd: 0, 2, 5
» Input abef: 0, 2, 8
» Input cdef: 0, 5, 8
» Input abcdef: 0, 2, 5, 8
k closures => the number of DFA states is exponential in k.
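The state explosion can be reproduced with a short subset construction. This sketch builds an unanchored NFA for k patterns of the form (x y .* z) with distinct letters (k = 3 gives the example above, with the same state numbers), counts the reachable sets of active NFA states, each of which is one unminimized DFA state, and replays the four inputs listed above. All function names are our own.

```python
from collections import defaultdict

LETTERS = "abcdefghijklmnopqrstuvwxy"  # 'z' is reserved as the shared final symbol

def build_nfa(k):
    """NFA for (x1 y1 .* z) | ... | (xk yk .* z), unanchored, distinct letters.
    Pattern i uses states 3i+1 (after x), 3i+2 (the .* closure), 3i+3 (accept),
    so k = 3 reproduces the numbering above (closures at states 2, 5, 8)."""
    alphabet = set(LETTERS[:2 * k]) | {"z"}
    delta = defaultdict(set)
    for ch in alphabet:
        delta[(0, ch)].add(0)                  # start state loops on every symbol
    for i in range(k):
        x, y = LETTERS[2 * i], LETTERS[2 * i + 1]
        first, closure, final = 3 * i + 1, 3 * i + 2, 3 * i + 3
        delta[(0, x)].add(first)
        delta[(first, y)].add(closure)
        for ch in alphabet:
            delta[(closure, ch)].add(closure)  # the .* closure
        delta[(closure, "z")].add(final)
    return delta, alphabet

def step(delta, states, ch):
    """Set of NFA states active after reading ch."""
    return frozenset(n for s in states for n in delta.get((s, ch), ()))

def active_after(delta, text):
    states = frozenset({0})
    for ch in text:
        states = step(delta, states, ch)
    return states

def count_dfa_states(k):
    """Subset construction: each reachable set of active NFA states
    is one (unminimized) DFA state."""
    delta, alphabet = build_nfa(k)
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for ch in sorted(alphabet):
            nxt = step(delta, cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, count_dfa_states(k))
```

For k = 1, 2, 3 the construction yields 5, 15, and 39 states. For this family the count is 2^k(k+2) - 1: any subset of the k closures can be active, combined either with at most one partially matched pattern start or with the accepting states reached just after a z.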
Our solution: History-based Finite Automata (HFA)
» Enable a single state of execution.
» Use a bit to record that a closure has been reached.
» Certain transitions depend on the bit values.
» The bits are also updated as the HFA makes its transitions.
Example: (ab.*z) | (cd.*z) | (ef.*z)
» b1: set if NFA state 2 is reached
» b2: set if NFA state 5 is reached
» b3: set if NFA state 8 is reached
[Figure: the NFA for this example and the corresponding HFA.]
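A hand-built sketch of the HFA idea for this example, in Python. It is not the general HFA construction: the small state encoding and the PATTERNS table are assumptions made for this one pattern set. The automaton keeps one state of execution plus a flag word; reaching a closure sets a bit instead of creating new DFA states, and the transition on z checks the bits to decide which patterns match.

```python
# Illustrative, hand-built HFA for (ab.*z) | (cd.*z) | (ef.*z).
# PATTERNS and the state encoding are assumptions for this example only.
PATTERNS = [("a", "b"), ("c", "d"), ("e", "f")]  # first and second symbol of each pattern

def hfa_scan(text):
    """Return (list of (pattern_index, position) matches, final flag word)."""
    state = 0   # 0: idle; i + 1: just saw the first symbol of pattern i
    flags = 0   # bit i (b1, b2, b3 above) set once pattern i's closure is reached
    matches = []
    for pos, ch in enumerate(text):
        if state and ch == PATTERNS[state - 1][1]:
            flags |= 1 << (state - 1)            # transition that sets a bit
        if ch == "z":
            for i in range(len(PATTERNS)):       # transitions conditioned on the bits
                if flags >> i & 1:
                    matches.append((i, pos))
        # single next state: the pattern whose first symbol was just read, if any
        state = next((i + 1 for i, (x, _) in enumerate(PATTERNS) if ch == x), 0)
    return matches, flags

if __name__ == "__main__":
    print(hfa_scan("abcdz"))
```

For "abcdz" the flag word ends as 0b011 (b1 and b2 set), and both (ab.*z) and (cd.*z) are reported at the z, without ever tracking more than one state of execution.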
Single state of execution: high performance.
Few bits are required (16 or 32), so they can be stored in registers.
Avoids state explosion: memory efficient.
Source | # of closures | DFA: # of automata | DFA: total # of states | H-FA: # of automata | H-FA: # of flags | H-FA: total # of states | % space reduction with H-FA | H-FA parsing rate speedup
Cisco64 | 14 | 1 | 132784 | 1 | 6 | 3597 | 94.69 | 2x
Cisco64 | 14 | 1 | 132784 | 1 | 13 | 1861 | 96.77 |
Cisco68 | 19 | 1 | 328664 | 1 | 17 | 2956 | 97.03 |
Snort 1 | 6 | 3 | 62589 | 1 | 5 | 583 | 97.40 |
Snort 2 | 1 | 1 | 12703 | 1 | 1 | 71 | 98.58 |
Snort 3 | 5 | 2 | 4737 | 1 | 5 | 116 | 93.48 |
Linux70 | 11 | 2 | 20662 | 1 | 9 | 1304 | 81.63 |
Thank you. Questions?