StriD²FA: Scalable Regular Expression Matching for Deep Packet Inspection

Xiaofei Wang†, Junchen Jiang‡, Yi Tang‡, Yi Wang‡, Bin Liu‡, Xiaojun Wang†
† School of Electronic Engineering, Dublin City University, Dublin, Ireland
‡ Department of Computer Science and Technology, Tsinghua University, Beijing, China

This paper is supported by NSFC (60625201, 60873250, 61073171), 973 project (2007CB310702), Tsinghua University Initiative Scientific Research Program, the Specialized Research Fund for the Doctoral Program of Higher Education of China and Dublin City University Research Collaboration Program.

Abstract—Deep packet inspection (DPI) has become one of the key components of a Network Intrusion Detection System (NIDS); it compares packet content against a set of rules written as regular expressions. The need to keep up with ever-increasing line speeds has forced NIDS designers to move to hardware-based implementations, where memory resources are limited. In this paper, we present LBM, a novel acceleration scheme for regular expression matching that converts the original byte stream into a much shorter integer stream and then matches it with a variant of DFA called Stride-DFA (StriD²FA). In the instance of LBM that we realize, a speedup of 10 to 15 times is achievable, while the required memory is much smaller than that of a traditional DFA.

Index Terms—Regular Expression Matching, DPI, DFA

I. INTRODUCTION

DPI technologies have been increasingly deployed in NIDS to detect attacks or viruses. To this end, state-of-the-art systems, including Snort [1], ClamAV [2] and security applications from Cisco Systems [3], compare packet content to a set of rules. Rules written as strings were initially popular, but have limited expressiveness. To support increasingly complex services, these systems have replaced strings with regular expressions (regexes) because of their higher expressiveness and flexibility. The need to keep up with ever-increasing line speeds has forced NIDS designers to move to hardware or high-speed memory, where memory resources are limited. Thus, designing regex matching that achieves both time and space efficiency is a significant challenge.

A novel length-based matching (LBM) scheme is presented for accelerating regex matching. Like traditional methods, LBM has a DFA-like matcher, called Stride-DFA (StriD²FA). However, LBM differs from traditional methods in two key ways:

• In LBM, a packet, as a byte stream, is first converted into a much shorter stride-length (SL) stream (i.e., an integer stream) before being sent to the StriD²FA. Therefore, the shorter the SL stream is, the higher the speedup that can be achieved (in our system, a 10 to 15 times speedup is achievable).
• Since it is the SL stream that the StriD²FA receives (rather than the original byte string, as in a DFA), the StriD²FA is not built directly from the regex, but according to the different kinds of SL streams. Therefore, the fundamental difference between a StriD²FA and a DFA is that in a DFA a transition records a byte, while in a StriD²FA it records a length (i.e., an integer).

The benefits of LBM are not limited to increased matching speed. As to memory consumption, a StriD²FA also costs less memory than DFA-based acceleration algorithms, for two reasons: 1) it has fewer states, since regexes are stored more compactly in a StriD²FA (Section IV), and 2) the upper bound of the SL is easily controlled (Subsection III-A), so each state has a smaller fan-out. Moreover, LBM can be readily applied to existing hardware/software platforms, as a StriD²FA shares the same I/O interfaces and logic structure as a traditional DFA built directly from the regex set.

LBM also leads to two key challenges. First, to preserve the expressiveness of regexes, it must be possible to transform any regex into a StriD²FA. This is achieved by a graph algorithm that transforms any DFA into a StriD²FA (Section IV). Second, since the SL stream is a compressed representation of the original stream, only part of the original stream is matched by the StriD²FA, which causes false positives (but no false negatives). An algorithm is proposed that keeps the false positive rate at an acceptably low level (details in Section V). A verification phase is used for accurate matching if a possible match is found by the StriD²FA. Because the majority of Internet traffic is not malicious, the probability of having to execute accurate matching is low, so quite high throughput is attainable [4].

In particular, the contributions are summarized as follows:

• Introduce the concept of LBM, a novel acceleration scheme for regex matching that converts the original byte stream into a much shorter integer stream and then matches it with a variant of DFA called StriD²FA.
• Give the formal construction of a StriD²FA, which transforms any set of regexes into a StriD²FA.
• Describe the method to extract the SL stream from the input stream so that the false positive rate can be reduced to a relatively low level.
• Realize a general instance of LBM. It is demonstrated that this instance achieves both space and time efficiency and can be readily migrated to existing platforms. A 10 to 15 times speedup is achievable, while the memory cost is smaller than that of a traditional DFA.

The rest of the paper is organized as follows. Section II discusses previous work related to pattern matching. Section III presents the overall structure of LBM and shows how it works with an example. Section IV gives the formal construction of a StriD²FA, and false positives are addressed in Section V. Section VI reports and analyzes the performance of LBM and StriD²FA. Section VII concludes the paper.
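The two-stage idea described in the introduction (match a short SL stream first, then verify possible matches against the original bytes) can be illustrated with a small Python sketch. It is not the construction from Sections III to V, which this section does not spell out. Everything specific in it is an assumption made for illustration: a stride is taken to be the distance between successive occurrences of one hypothetical tag byte (`TAG`), the rule is the single literal `reference`, the length-labelled transition table `SL_DFA` is written by hand for that rule, and verification simply re-scans the whole packet with Python's `re` module.

```python
import re

TAG = ord("e")                        # hypothetical tag byte, chosen for this sketch only
PATTERN = re.compile(rb"reference")   # exact rule, consulted only in the verification phase

# Length-labelled transitions (an integer per edge instead of a byte).
# In "reference" the tag byte sits at offsets 1, 3, 5, 8, so the rule's
# stride sequence is 2, 2, 3. Missing entries fall back to state 0.
SL_DFA = {
    0: {2: 1},
    1: {2: 2},
    2: {2: 2, 3: 3},
}
ACCEPT = 3


def stride_lengths(data: bytes, tag: int = TAG) -> list:
    """Convert a byte stream into the distances between successive tag bytes."""
    positions = [i for i, byte in enumerate(data) if byte == tag]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def possible_match(strides: list) -> bool:
    """Run the length-labelled automaton over the SL stream."""
    state = 0
    for length in strides:
        state = SL_DFA[state].get(length, 0)
        if state == ACCEPT:
            return True
    return False


def inspect(data: bytes) -> bool:
    """Fast SL-stream filter first; exact matching only on a possible match."""
    if not possible_match(stride_lengths(data)):
        return False                          # no false negatives: safe to stop here
    return PATTERN.search(data) is not None   # verification removes false positives


if __name__ == "__main__":
    for packet in (b"GET /reference HTTP/1.1", b"xexexexxe", b"hello world"):
        print(packet, stride_lengths(packet), inspect(packet))
```

In this sketch the 23-byte request is reduced to three integers before it reaches the automaton, which is where the speedup would come from. The packet `b"xexexexxe"` has the same stride sequence as `reference`, so the filter reports a possible match and only the verification step rejects it, which corresponds to the false-positive case the introduction mentions.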
