Turn on more accessible mode
Skip to main content
Turn off more accessible mode
Department of Computer Science & Engineering
Sign In
|
Research
This Site: Research
All Sites
People
Advanced Search
Department of Computer Science & Engineering
About the Department
People
Research
Graduate Programs
Undergraduate Programs
Student Services
Wustl Engineering
About the Department
News
Calendar
Computer Science & Engineering Colloquia Series
Facts
Entrepreneurial Activities
Multidisciplinary Studies
Committees & Assignments
External Advisory Board
Student Organizations
Internal Documents
About WUSTL & St. Louis
Department Leadership
Faculty
Other Appointments
Department Staff
Collaborating Faculty
Doctoral Students
Masters Students
Alumni
Technical Reports
Theses & Dissertations
Research News
Collaborating Laboratories and Centers
Course Schedules & Descriptions
Colloquia Series
Doctoral Student Seminars
Admission Requirements
Financial Assistance & Fellowships
Admission Requirements
Course Schedules & Descriptions
Undergraduate Research Opportunities
Department of Computer Science & Engineering
>
Research
>
Technical Reports
>
Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation
Technical Reports
: Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation
Title
Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation
File Date
9/11/2009
Abstract
Regular expression matching is a crucial task in several networking applications, where packet payloads need to be inspected at line rate over large sets of complex patterns. Modern, payload-scanning firewalls and intrusion detection systems are perhaps the most notable applications that rely on this mechanism. When targeting memory-based architectures, in which the pattern data structure is stored in off-chip memory such as DRAM, the design of efficient regular expression matching engines is subject to two basic resource requirements: memory size and memory bandwidth. When designing Field Programmable Gate Array implementations, in which the pattern data structure is stored in on-chip memories or logic gates, the focus is on minimizing logic cell utilization and allowing high operational clock frequency. Keeping these requirements feasible is a challenge as pattern sets grow in size and expressiveness.
In this work, we describe novel data structures and algorithms for high-speed regular expression evaluation and map the proposed schemes onto suitable parallel architectures. We aim to provide a comprehensive solution that is scalable with respect to both the size of the rule-set and the complexity of the individual rules. Our main contributions are the following. First, we propose an efficient and low complexity compression scheme for deterministic finite automata (DFAs), leading to a compression factor in excess of 98% across a set of representative data sets. Second, we identify and address problematic subpatterns, such as bounded and unbounded repetitions of wildcards and large character sets and back-references. Specifically, we propose a novel automaton with a limited memory footprint and a bounded memory bandwidth requirement that overcomes the limitations of pure DFA-based solutions. Third, we describe how to utilize our proposed techniques when targeting FPGA implementations. Finally, our tools for constructing and analyzing a range of alternative automata for regular-expression evaluation have been released as open-source, and are used actively by many researchers from around the world.
Authors
Michela Becchi
E-Mail
mbecchi@cse.wustl.edu
Notes
Web Page
http://www.arl.wustl.edu/~mbecchi/
Type of Report
PhD Dissertation
Approval Status
Approved
Attachments
MichelaBecchi_Dissertation.pdf
Content Type:
Item
Created at 9/11/2009 10:56 AM by Becchi, Michela
Last modified at 9/11/2009 11:04 AM by Hawkins, Madeline
Use this page to add attachments to an item.
Name