
Multi-Core Architecture on FPGA for Large Dictionary String Matching ∗

Qingbo Wang, Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089-2562
{qingbow, prasanna}@usc.edu


Abstract

FPGAs have long been considered an attractive platform for high performance implementations of string matching. However, as the size of pattern dictionaries continues to grow, such large dictionaries can be stored in external DRAM only. The increased memory latency and limited bandwidth pose new challenges to FPGA-based designs, and the lack of spatial and temporal locality in data access also leads to low utilization of memory bandwidth. In this paper, we propose a multi-core architecture on FPGA to address these challenges. We adopt the popular Aho-Corasick (AC-opt) algorithm for our string matching engine. Utilizing the data access feature in this algorithm, we design a specialized BRAM buffer for the cores to exploit the data reuse that exists in such applications. Several design optimization techniques are utilized to realize a simple design with a high clock rate for the string matching engine. An implementation of a 2-core system with one shared BRAM buffer on a Virtex-5 LX155 achieves up to 3.2 Gbps throughput on a 64 MB state transition table stored in DRAM. Performance of systems with more cores is also evaluated for this architecture, and a throughput of over 5.5 Gbps can be obtained for some application scenarios.

1 Introduction

String matching looks for all occurrences of the patterns from a dictionary in a stream of input data. It is the key operation in search engines, and is a core function of network monitoring, intrusion detection systems (IDS), virus scanners, and spam/content filters [3, 4, 15]. For example, the open-source IDS Snort [15] has thousands of content-based rules, many of which require string matching against entire network packets, i.e., deep packet inspection. To support heavy network traffic, high performance algorithms are required to prevent an IDS from becoming a network bottleneck.

FPGAs have been attractive for high performance implementations of string matching due to their high I/O bandwidth and computational parallelism. Application-specific optimizations for string matching algorithms have been proposed for FPGA-based designs [18]. They typically use a small dictionary, on the order of a few thousand patterns (e.g., see [3, 4]). Thus the state transition table (STT) generated from a Deterministic Finite Automaton (DFA) representation of the pattern dictionary, or the pattern signatures themselves, can be stored in the on-chip memory or in the logic of FPGAs.

However, the size of dictionaries has increased greatly. A dictionary can now have 10,000 patterns or more [14, 15], resulting in an STT tens of megabytes in size. Such large tables can be stored only in external memory and incur long access latency. Since every character searched requires a memory reference, this increase in latency degrades string matching performance. The problem is worsened by the fact that string matching presents little memory access locality and that access to the STT is irregular.

In this paper, we propose a multi-core architecture on FPGA for large dictionary string matching. We use the Aho-Corasick algorithm (AC-opt) for design verification, but the architecture can be applied to any algorithm that employs a DFA stored in DRAM for pattern matching [16]. Our study shows, using the AC-opt algorithm, that a small number of frequently visited states exist in the process of string matching, and the majority of memory references during string matching go to these "hot" states. When we allocate these states on the FPGA to enable on-chip access to them, not only can the traffic to external memory be significantly reduced, but the throughput of the string matching engine is also improved due to fast on-chip access. Our major contributions are:

• To the best of our knowledge, our architecture is the first multi-core architecture on FPGA for large dictionary string matching to address the challenge of

∗ Supported by the United States National Science Foundation under grant No. CCR-0702784. Equipment grant from Xilinx Inc. is gratefully acknowledged.
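
The introduction's two central observations (one STT lookup per input character, and a few "hot" states receiving most lookups) can be illustrated in software. The following is a minimal Python sketch, not the hardware design described in this paper: it assumes a byte alphabet of 256 symbols, and the names `build_stt`, `scan`, and `hot_state_coverage` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter, deque


def build_stt(patterns, alphabet_size=256):
    """Build a full DFA state transition table for a list of bytes patterns."""
    stt = [[0] * alphabet_size]  # stt[state][byte] -> next state; state 0 is the root
    outputs = [set()]            # patterns recognized on entering each state
    # Phase 1: insert every pattern into a trie (0 means "no edge yet").
    for pat in patterns:
        state = 0
        for b in pat:
            if stt[state][b] == 0:
                stt.append([0] * alphabet_size)
                outputs.append(set())
                stt[state][b] = len(stt) - 1
            state = stt[state][b]
        outputs[state].add(pat)
    # Phase 2: breadth-first pass that fills each missing edge with the
    # failure state's transition, so matching never follows failure links.
    fail = [0] * len(stt)
    queue = deque(s for s in stt[0] if s != 0)
    while queue:
        state = queue.popleft()
        for b in range(alphabet_size):
            nxt = stt[state][b]
            if nxt != 0:
                fail[nxt] = stt[fail[state]][b]
                outputs[nxt] |= outputs[fail[nxt]]
                queue.append(nxt)
            else:
                stt[state][b] = stt[fail[state]][b]
    return stt, outputs


def scan(stt, outputs, data):
    """Scan bytes, returning per-state visit counts and (start, pattern) matches."""
    state, visits, matches = 0, Counter(), []
    for pos, b in enumerate(data):
        state = stt[state][b]  # exactly one table lookup per input character
        visits[state] += 1
        for pat in outputs[state]:
            matches.append((pos - len(pat) + 1, pat))
    return visits, matches


def hot_state_coverage(visits, k):
    """Fraction of all state visits that land in the k most visited states."""
    total = sum(visits.values())
    return sum(count for _, count in visits.most_common(k)) / total
```

As a usage example, `build_stt([b"he", b"she", b"his", b"hers"])` followed by `scan(stt, outputs, b"ushers")` reports `she`, `he`, and `hers`. Calling `hot_state_coverage(visits, k)` on a longer trace gives a rough estimate of how many lookups a small on-chip buffer holding the top `k` states might absorb; the actual figures depend on the dictionary and the input.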
