Scalable String Matching on the Cell BE Processor

Daniele Scarpazza, Oreste Villa, Fabrizio Petrini
Applied Computer Science Group, Pacific Northwest National Laboratory
fabrizio.petrini@pnl.gov
Georgia Tech, Sony/Toshiba/IBM Workshop on Software and Applications for the Cell BE Processor
Atlanta, GA, June 19 2007

Outline
The problem
- Network Intrusion Detection Systems (NIDS) are becoming an essential part of data centers
- At the heart of a NIDS there is a string matching algorithm
The Aho-Corasick algorithm
- A Deterministic Finite Automaton (DFA)
Multicore processors
- An interesting opportunity to accelerate keyword scanning
- Most existing work has been done on FPGAs/specialized processors
Goals and challenges
- Scalability of the dictionary and the network speed
DFAs with very high speed
- Two SPEs can handle a 10 Gbit/sec rate with a transition table of less than 200KB

The advent of teraflop-scale, many-core processors
[Figure: threads (axis ticks 1, 10, 100) versus year (2003 to 2013). Era labels: Medieval Times, Renaissance Period, Industrial Age. Series labels: small number of traditional cores, SMT, arrays of throughput cores. Courtesy of Doug Carmean, Intel.]

Set Pattern Matching Problem
Find all occurrences of the patterns P = {P_1, P_2, ..., P_q} in a text T
Aho and Corasick proposed an interesting algorithm for multi-pattern string matching
Uses a state machine
Important problem in a number of fields
- Text processing, biology, network security, etc.

Aho-Corasick - Example
P = {her, iris, he, is}
T = “the iris for her”
[Figure: keyword tree for P, animated over three slides. Nodes: root (ϖ), h, i, he, ir, is, her, iri, iris.]

First Step: Keyword Tree
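The figure for this slide did not survive extraction. As a minimal sketch of the first step, the keyword tree (trie) for the example dictionary P = {her, iris, he, is} could be built as follows. The function names `build_keyword_tree` and `prefixes` are hypothetical, chosen for illustration and not taken from the authors' code.

```python
def build_keyword_tree(patterns):
    """Return (children, outputs).

    children[s] maps a character to the child state of s;
    outputs[s] is the set of patterns that end exactly at state s.
    State 0 is the root (the empty prefix).
    """
    children = [{}]
    outputs = [set()]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in children[state]:
                # Allocate a new state for a prefix not seen before.
                children.append({})
                outputs.append(set())
                children[state][ch] = len(children) - 1
            state = children[state][ch]
        outputs[state].add(pattern)
    return children, outputs


def prefixes(children):
    """Label every state with the prefix that leads to it from the root."""
    labels = {0: ""}
    stack = [0]
    while stack:
        state = stack.pop()
        for ch, nxt in children[state].items():
            labels[nxt] = labels[state] + ch
            stack.append(nxt)
    return labels


children, outputs = build_keyword_tree(["her", "iris", "he", "is"])
labels = prefixes(children)
print(sorted(labels.values()))
```

The nine states correspond to the nodes named on the example slide: the root plus h, he, her, i, ir, iri, iris, and is.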

Second Step: Failed Transitions (Non-deterministic Finite Automaton, NFA)
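A sketch of the second step under the textbook formulation of Aho-Corasick (not necessarily the exact construction shown on the lost figure): failure links are computed breadth-first over the keyword tree, and the scanner follows them whenever the current state has no transition for the input character. The names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build the keyword tree plus failure links and merged output sets."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    # Depth-1 states fail to the root; deeper states are handled in BFS order.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # A match at the failure target is also a match at t.
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out


def search(text, patterns):
    """Return sorted (start_index, pattern) pairs for every occurrence."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]          # failed transition
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


print(search("the iris for her", ["her", "iris", "he", "is"]))
```

On the example text this reports "he" at offsets 1 and 13, "iris" at 4, "is" at 6, and "her" at 13. Note that a single input character may trigger several failed transitions, which is what the DFA construction removes.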

Extend Failed Transitions for Each Character

Build an Optimized Deterministic Finite Automaton (DFA)
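One way to picture the resulting DFA is as a dense state transition table with one row per state and one column per input symbol, so that each input character costs exactly one table lookup. The sketch below uses the standard construction (resolve every failed transition ahead of time) and does not reproduce any optimization specific to the authors' implementation. The names `build_dfa` and `scan` and the 27-symbol alphabet are assumptions for illustration.

```python
from collections import deque


def build_dfa(patterns, alphabet):
    """Return (table, out, col) where table[s][col[ch]] is the next state.

    Every pattern character must belong to `alphabet`.
    """
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    col = {ch: j for j, ch in enumerate(alphabet)}
    table = [[0] * len(alphabet) for _ in goto]
    fail = [0] * len(goto)

    # Root row: tree edges where they exist, self-loop to the root otherwise.
    for ch, t in goto[0].items():
        table[0][col[ch]] = t

    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = table[fail[s]][col[ch]]
                table[s][col[ch]] = t
                queue.append(t)
            else:
                # Pre-resolve the failed transition into a direct one.
                table[s][col[ch]] = table[fail[s]][col[ch]]
    return table, out, col


def scan(text, table, out, col):
    """One table lookup per input character; no failure loop at scan time."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = table[s][col[ch]]
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


alphabet = "abcdefghijklmnopqrstuvwxyz "
table, out, col = build_dfa(["her", "iris", "he", "is"], alphabet)
print(scan("the iris for her", table, out, col))
```

The table has 9 rows and 27 columns here. Its size grows as states times symbols, which is the speed versus dictionary size trade-off the next slide refers to.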

Design Challenges: Speed vs Size of the Dictionary

Mapping the Aho-Corasick Algorithm on the Cell Processor: Data Streaming and SIMD parallelism
[Figure: Cell BE block diagram with the PPE, SPE0 to SPE7, MIC, BIF, IOIF0, and IOIF1 around the data arbiter.]

Aho-Corasick: A Multi-level Parallelization
General approach
- Multithreaded parallelism within a Synergistic Processing Unit (SPU), using multiple segments/connections of the input stream
- SIMD parallelism, pipeline parallelism (even/odd pipelines of the SPU)
- An arsenal of techniques: loop unrolling, removing speculation, restricted pointers, etc.
Using multiple SPUs to increase processing bandwidth/dictionary size
Dynamic loading of dictionaries

Aggregate Main Memory Bandwidth: Memory Access Traffic Explicitly Orchestrated at User-Level

SIMD and Pipeline Parallelism
[Figure: data flow for one transition step of 16 DFAs over 16 interleaved input streams. Recoverable labels, in order: 16 input characters; SIMD shl << 3; 16 input symbols; SIMD shr >> 1; 16 offsets to the transition table cells; current state pointers for the 16 DFAs; 16 adds producing the addresses of the cells containing the next state pointers; 16 loads from the state transition table; split with masks 0xFFFFFFFE and 0x00000001 (ands); next state pointers for the 16 DFAs; final state flags for the 16 DFAs.]
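The following scalar emulation is one possible reading of the labels on this slide, not the authors' SPU code. It assumes 32 input symbols and 32-bit table cells, that the two shifts act on 8-bit lanes (so that shl 3 followed by shr 1 yields the low five bits of the character times four, the byte offset of a 4-byte cell), and that each cell stores the address of the next state's row with the final-state flag packed into bit 0. The names `symbol_offset`, `step`, and `pack` are hypothetical.

```python
CELL_BYTES = 4    # assumption: one 32-bit cell per (state, symbol) pair
SYMBOLS = 32      # 32 input symbols per state
ROW_BYTES = SYMBOLS * CELL_BYTES


def symbol_offset(byte):
    """Byte offset of a symbol's cell within a state row.

    On an 8-bit lane, shifting left by 3 discards the top three bits and
    shifting right by 1 leaves (byte & 0x1F) * 4.
    """
    return ((byte << 3) & 0xFF) >> 1


def step(state_ptrs, chars, memory):
    """Advance all lanes by one character (emulated lane by lane)."""
    next_ptrs, flags = [], []
    for ptr, ch in zip(state_ptrs, chars):
        cell = memory[ptr + symbol_offset(ch)]   # address add, then load
        next_ptrs.append(cell & 0xFFFFFFFE)      # next state pointer
        flags.append(cell & 0x00000001)          # final state flag
    return next_ptrs, flags


def pack(table, finals):
    """Lay out a dense table as {address: cell}; row s starts at s * ROW_BYTES."""
    memory = {}
    for s, row in enumerate(table):
        for sym, nxt in enumerate(row):
            flag = 1 if nxt in finals else 0
            memory[s * ROW_BYTES + sym * CELL_BYTES] = (nxt * ROW_BYTES) | flag
    return memory


# Toy DFA: state 1 (final) is entered whenever symbol 5 is read.
table = [[1 if sym == 5 else 0 for sym in range(SYMBOLS)] for _ in range(2)]
memory = pack(table, finals={1})

ptrs = [0] * 16
chars = [0x65 if lane % 2 == 0 else 0x61 for lane in range(16)]  # 'e' or 'a'
ptrs, flags = step(ptrs, chars, memory)
print(ptrs)
print(flags)
```

Because every row address is a multiple of 128 bytes, bit 0 of a pointer is always free, which is what makes it possible to carry the flag in the same word and separate the two with a pair of ands.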

Local Storage Usage
Total size of the local store: 256 k

                             Case 1    Case 2    Case 3
DFA state transition table   190 k     206 k     214 k
  states                     1520      1648      1712
  input symbols              32        32        32
Input buffer 0               16 k      8 k       4 k
Input buffer 1               16 k      8 k       4 k
Code and stack               34 k      34 k      34 k
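The three configurations can be checked with a few lines of arithmetic. This sketch assumes 4 bytes per transition table cell, an assumption that is consistent with the 32-bit masks on the previous slide and with the stated table sizes; the name `stt_kb` is hypothetical.

```python
LOCAL_STORE_KB = 256
CODE_AND_STACK_KB = 34
SYMBOLS = 32
CELL_BYTES = 4  # assumption: 32-bit cells


def stt_kb(states):
    """Size in kilobytes of a dense table with `states` rows."""
    return states * SYMBOLS * CELL_BYTES / 1024


# (number of states, size of each of the two input buffers in kilobytes)
cases = {"Case 1": (1520, 16), "Case 2": (1648, 8), "Case 3": (1712, 4)}

totals = {}
for name, (states, buffer_kb) in cases.items():
    totals[name] = stt_kb(states) + 2 * buffer_kb + CODE_AND_STACK_KB
    print(f"{name}: STT {stt_kb(states):.0f} k, total {totals[name]:.0f} k")
```

Under that assumption each case fills the 256 k local store exactly: shrinking the input buffers frees room for more states, at the cost of shorter DMA transfers.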

Overlapping Computation with Communication
[Figure: timeline, time running downward]

Computation                    Data transfer
                               Load buffer 0 (5.94 us)
Process buffer 0 (25.64 us)    Load buffer 1 (5.94 us)
Process buffer 1 (25.64 us)    Load buffer 0 (5.94 us)
Process buffer 0 (25.64 us)    Load buffer 1 (5.94 us)
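A small simulation of the double-buffering schedule, using the two timings from the slide. It is a sketch under stated assumptions: a single transfer channel, a buffer that can be refilled only after its previous contents have been processed, and a buffer size of 16 kbytes (the Case 1 configuration), which the slide itself does not state. The name `schedule` is hypothetical.

```python
LOAD_US = 5.94            # time to load one input buffer
PROCESS_US = 25.64        # time to process one input buffer
BUFFER_BYTES = 16 * 1024  # assumption: 16 kbyte buffers


def schedule(n_blocks):
    """Return (load_end, proc_end) completion times in microseconds."""
    load_end = [0.0] * n_blocks
    proc_end = [0.0] * n_blocks
    dma_free = 0.0
    for k in range(n_blocks):
        # Block k reuses buffer k % 2, which is free once block k-2 is done.
        buffer_free = proc_end[k - 2] if k >= 2 else 0.0
        load_end[k] = max(dma_free, buffer_free) + LOAD_US
        dma_free = load_end[k]
        # Processing needs the data and the previous block to be finished.
        previous_done = proc_end[k - 1] if k >= 1 else 0.0
        proc_end[k] = max(load_end[k], previous_done) + PROCESS_US
    return load_end, proc_end


load_end, proc_end = schedule(10)
steady_period_us = proc_end[-1] - proc_end[-2]
throughput_gbps = BUFFER_BYTES * 8 / steady_period_us / 1000.0
print(f"period {steady_period_us:.2f} us, throughput {throughput_gbps:.2f} Gbit/s")
```

Since each load is shorter than the processing step it overlaps, only the very first load is exposed and the steady-state period equals the processing time. With the assumed 16 kbyte buffers this gives a little over 5 Gbit/s per SPE, in line with the outline's figure of two SPEs for a 10 Gbit/sec rate.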

Schedule of a Dynamic State Transition Table (STT) Replacement
[Figure: timeline, time running downward. Each computation step takes 25.64 us and each input load takes 5.94 us.]

Step  Computation                              Data transfer
0                                              Load input to buffer 0
1     Process buffer 0 (match against STT 0)   Load input to buffer 1; load next STT into STT 1, chunk 1/2 (48 kbyte, 17.83 us)
2     Process buffer 1 (match against STT 0)   Load input to buffer 0; load next STT into STT 1, chunk 2/2 (47 kbyte, 17.46 us)
3     Process buffer 0 (match against STT 1)   Load input to buffer 1; load next STT into STT 0, chunk 1/2 (48 kbyte, 17.83 us)
4     Process buffer 1 (match against STT 1)   Load input to buffer 0; load next STT into STT 0, chunk 2/2 (47 kbyte, 17.46 us)
5     Process buffer 0 (match against STT 0)   Load input to buffer 1; load next STT into STT 1, chunk 1/2 (48 kbyte, 17.83 us)
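The timings on this slide can be checked for overlap with simple arithmetic. The sketch assumes the worst case in which the input load and the STT chunk load within one step are serialized on the same channel; the name `transfer_slack` is hypothetical.

```python
PROCESS_US = 25.64
INPUT_LOAD_US = 5.94
# (chunk size in kbytes, load time in microseconds), as given on the slide
STT_CHUNKS = [(48, 17.83), (47, 17.46)]


def transfer_slack(chunk_load_us):
    """Idle transfer time left in one processing step (negative means a stall)."""
    return PROCESS_US - (INPUT_LOAD_US + chunk_load_us)


for index, (kbytes, load_us) in enumerate(STT_CHUNKS, start=1):
    rate = kbytes * 1024 / load_us  # bytes per microsecond
    print(f"chunk {index}/2: slack {transfer_slack(load_us):.2f} us, "
          f"rate {rate:.0f} bytes/us")

stt_kbytes = sum(kbytes for kbytes, _ in STT_CHUNKS)
swap_us = len(STT_CHUNKS) * PROCESS_US
print(f"a {stt_kbytes} kbyte STT is replaced every {swap_us:.2f} us")
```

Both slack values are positive, so even when serialized the transfers fit inside the processing time and the table replacement does not stall the scanner. A complete STT is swapped in once every two processing steps.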

Throughput Provided by the STT Replacement with a Variable Number of Tiles (1 to 8)

Conclusion
Multi-core processors are competitive with FPGAs and specialized network processors
Multiple data streaming options to perform string matching
Performance from 40 Gbits/sec to 5 Gbits/sec
- With small dictionaries
Future work includes
- Addressing larger dictionaries
- Compression of the STT
Paper available at http://hpc.pnl.gov/people/fabrizio/papers/smtps07.pdf
