Skip to main content

Research

Go Search
Department of Computer Science & Engineering
About the Department
People
Research
Graduate Programs
Undergraduate Programs
Student Services
Wustl Engineering
  
Department of Computer Science & Engineering > Research > Technical Reports > Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation  

Technical Reports: Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation

Title

Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation 

File Date

9/11/2009 

Abstract

Regular expression matching is a crucial task in several networking applications, where packet payloads need to be inspected at line rate over large sets of complex patterns. Modern, payload-scanning firewalls and intrusion detection systems are perhaps the most notable applications that rely on this mechanism. When targeting memory-based architectures, in which the pattern data structure is stored in off-chip memory such as DRAM, the design of efficient regular expression matching engines is subject to two basic resource requirements: memory size and memory bandwidth. When designing Field Programmable Gate Array implementations, in which the pattern data structure is stored in on-chip memories or logic gates, the focus is on minimizing logic cell utilization and allowing high operational clock frequency. Keeping these requirements feasible is a challenge as pattern sets grow in size and expressiveness.

In this work, we describe novel data structures and algorithms for high-speed regular expression evaluation and map the proposed schemes onto suitable parallel architectures. We aim to provide a comprehensive solution that is scalable with respect to both the size of the rule-set and the complexity of the individual rules. Our main contributions are the following. First, we propose an efficient and low complexity compression scheme for deterministic finite automata (DFAs), leading to a compression factor in excess of 98% across a set of representative data sets. Second, we identify and address problematic subpatterns, such as bounded and unbounded repetitions of wildcards and large character sets and back-references. Specifically, we propose a novel automaton with a limited memory footprint and a bounded memory bandwidth requirement that overcomes the limitations of pure DFA-based solutions. Third, we describe how to utilize our proposed techniques when targeting FPGA implementations. Finally, our tools for constructing and analyzing a range of alternative automata for regular-expression evaluation have been released as open-source, and are used actively by many researchers from around the world.

Authors

Michela Becchi

E-Mail

mbecchi@cse.wustl.edu 

Notes

 

Web Page

http://www.arl.wustl.edu/~mbecchi/ 

Type of Report

PhD Dissertation 
Approval Status Approved 
 
Attachments
MichelaBecchi_Dissertation.pdf    
Content Type: Item
Created at 9/11/2009 10:56 AM  by Becchi, Michela 
Last modified at 9/11/2009 11:04 AM  by Hawkins, Madeline