REmatch: a novel regex engine for finding all matches
School authors:
External authors:
- Nicolas Van Sint Jan (Pontificia Universidad Catolica de Chile, IMFD Chile)
- Domagoj Vrgoc (Pontificia Universidad Catolica de Chile, IMFD Chile)
Abstract:
In this paper, we present the REmatch system for information extraction. REmatch is based on a recently proposed enumeration algorithm for evaluating regular expressions with capture variables supporting the all-match semantics. The paper tells the story of what it takes to make a theoretically optimal algorithm work in practice. As we show here, a naive implementation of the original algorithm would have a hard time dealing with realistic workloads. We thus develop a new algorithm and a series of optimizations that make REmatch as fast as or faster than many popular RegEx engines while at the same time being able to return all the outputs: a task that most other engines tend to struggle with.
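The abstract does not define the all-match semantics, so the following is only an illustrative sketch, not REmatch's algorithm or API. It assumes the simplest reading: report every span of the text whose substring matches the pattern, including overlapping and nested spans. The helper name `all_matches` is hypothetical, capture variables are left out, and the brute-force loop over all substrings is exactly the kind of naive approach that would not scale to realistic workloads.

```python
import re


def all_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Hypothetical helper: brute-force all-match semantics.

    Returns every non-empty span (i, j) such that text[i:j] matches
    the whole pattern. Illustration only; not how REmatch works.
    """
    compiled = re.compile(pattern)
    spans = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly the slice text[i:j]
            if compiled.fullmatch(text, i, j):
                spans.append((i, j))
    return spans


text = "aaa"

# Standard engine behaviour: leftmost, non-overlapping matches only.
standard = [m.span() for m in re.finditer(r"a+", text)]

# All-match semantics: every matching span, overlaps included.
everything = all_matches(r"a+", text)

print(standard)    # [(0, 3)]
print(everything)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

On the three-character input, Python's `re.finditer` reports a single match, while the all-match reading yields six spans. The number of outputs can grow quadratically with the text length, which is why output enumeration cost matters for this task.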
| UT | WOS:001059181900010 |
|---|---|
| Number of Citations | 5 |
| Type | |
| Pages | 2792-2804 |
| Issue | 11 |
| Volume | 16 |
| Month of Publication | JUL |
| Year of Publication | 2023 |
| DOI | https://doi.org/10.14778/3611479.3611488 |
| ISSN | |
| ISBN | |