ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching

Marco Paolieri, Ivano Bonesana
ALaRI, Faculty of Informatics, University of Lugano, Lugano, Switzerland
{paolierm, bonesani}@alari.ch

Marco D. Santambrogio
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
marco.santambrogio@polimi.it

ABSTRACT

Text pattern matching is one of the main and most computation-intensive parts of systems such as Network Intrusion Detection Systems and DNA Sequencing Matching. Software solutions are available, but they often do not satisfy the performance requirements. This paper presents a new hardware approach for regular expression matching: ReCPU. The proposed solution is a parallel and pipelined architecture able to deal with the common regular expression semantics. This implementation, based on several parallel units, achieves a throughput of more than one character per clock cycle (the maximum performance of the currently proposed solution) while requiring just O(n) memory locations (where n is the length of the regular expression). Performance has been evaluated by synthesizing the VHDL description. Area and time constraints have been analyzed. Experimental results are obtained by simulating the architecture.

1. INTRODUCTION

Searching for a set of strings that match a given pattern is a well-known computation-intensive task, exploited in several different application fields. Software solutions cannot always meet the requirements in terms of speed. Nowadays there is an increasing need for high-performance computing, as in the case of biological sciences. Matching a DNA pattern among millions of sequences is a very common and computationally expensive task in the Human Genome Project. In Network Intrusion Detection Systems, where regular expressions are used to identify network attack patterns, software solutions are not acceptable because they would slow down the entire system. Such applications require a different approach. Moving towards a full hardware implementation, which overcomes the performance achievable with software, is reasonable for these application domains.

Several research groups have been studying hardware architectures for regular expression matching, mostly based on Non-deterministic Finite Automata (NFA), as described in [1] and [2].

In [1] an FPGA implementation is proposed. It requires O(n^2) memory space and processes a text character in O(1) time (one clock cycle). The architecture is based on a hardware implementation of a Non-deterministic Finite Automaton (NFA); additional time and space are necessary to build the NFA structure starting from the given regular expression. The time required is not constant: it can be linear in the best cases and exponential in the worst ones. We do not face these limitations because we are able to store regular expressions using O(n) memory locations. We do not require any additional time to start processing the regular expressions (from now on RE). In [2] an architecture is presented that allows extracting and sharing common sub-regular expressions in order to reduce the area of the circuit. It is necessary to re-generate the HDL description to change the regular expression. It is clear that this approach generates an implementation dependent on the pattern. In [3] software that translates an RE into a circuit description has been developed. A Non-deterministic Finite Automaton has been utilized to dynamically create efficient circuits for pattern matching (for patterns specified with a standard rule language).

The work proposed in [4] focuses on RE pattern matching engines implemented with reconfigurable hardware. A Non-deterministic Finite Automaton based implementation is used, and a tool for automatic generation of the VHDL description has been developed. All these approaches ([2], [3], [4]) require a new generation of the HDL description whenever a new regular expression needs to be processed. In our solution we only need to update the instruction memory with the new RE. In [5] a parallel FPGA implementation is described: multiple comparators increase the throughput for parallel matching of multiple patterns. In [6] a DNA sequence matching processor using an FPGA and a Java interface is presented. Parallel comparators are used for the pattern matching. They do not implement the regular expression semantics (i.e. complex operators) but just simple text search based on exact string matching.

To the best of our knowledge this paper presents a different approach to the pattern matching problem: REs are considered the programming language for a dedicated CPU. We do not build either a Deterministic or a Non-deterministic Finite Automaton of the RE, hence not requiring additional setup time as in [1]. ReCPU, the proposed architecture, is a processor able to fetch an RE from the instruction memory and perform the matching with the text stored in the data memory. The architecture is optimized to execute computations in a parallel and pipelined way. This approach has several advantages: on average it compares more than one character per clock cycle, and it requires less memory: for a given RE of size n the memory required is just O(n). In our solution it is easy to change the pattern at run-time by just updating the content of the instruction memory, without modifying the underlying hardware. Considering the CPU-like approach, a small compiler is necessary to obtain the machine code from the given RE (i.e.
