Go bindings to RUst's REgex engine

This package provides cgo bindings to Rust's regex engine using its C API.

Dual-licensed under MIT or the UNLICENSE.

Documentation

The primary documentation for the Rust library, including a definition of the syntax, can be found here: https://docs.rs/regex/#syntax

Install

You'll need to install Rust (you'll need at least Rust 1.9, which is the current beta release) and have a Go compiler handy. To run tests for rure-go, we'll need to compile Rust's regex library and then tell the Go compiler where to find it. These commands should do it:

$ git clone git://github.com/rust-lang/regex
$ cargo build --release --manifest-path ./regex/regex-capi/Cargo.toml
$ export CGO_LDFLAGS="-L$(pwd)/regex/target/release"
$ export LD_LIBRARY_PATH="$(pwd)/regex/target/release"
$ go get -t github.com/BurntSushi/rure-go
$ go test github.com/BurntSushi/rure-go

And to run benchmarks:

$ go test github.com/BurntSushi/rure-go/bench/rust -cpu 1 -run / -bench .

Replace rust with go in the package path to run the same set of benchmarks using Go's regexp package.

Example usage

This shows how to compile a regex, iterate over successive matches and extract a capturing group using its name:

func ExampleRegex() {
	re := MustCompile(`\w*(?P<last>\w)`)
	haystack := "foo bar baz quux"
	it := re.Iter(haystack)
	caps := re.NewCaptures()
	for it.Next(caps) {
		// Print the last letter of each word matched.
		start, end, _ := caps.GroupName("last")
		fmt.Println(haystack[start:end])
	}
	// Output:
	// o
	// r
	// z
	// x
}

Motivation

I wrote these bindings primarily as a test case for the C API of Rust's regex library. In particular, I wanted to be sure that it was feasible to write low overhead bindings to another language. For the most part, I think that was a success.

Secondarily, Rust's regex engine is pretty fast and also provides the same algorithmic guarantees that Go's regex engine provides. Therefore, it may prove useful as an alternative under some limited circumstances. First, you must be willing to tolerate cgo. Second, because of cgo function call overhead, your haystacks probably need to be a bit bigger to realize big performance wins. For example, if your use case is matching a regex a lot on very small strings, then cgo will probably prevent you from doing it faster with rure. (You could, however, write your matching loop as a C helper function, but now we're just digging ourselves in deeper.)

Benchmarks

These are the benchmarks as they are defined in Go's regexp package. Two important comments:

Rust's regex engine hits a performance ceiling because of cgo function call overhead. As a result, Go is faster on some benchmarks.
Several of the benchmarks are dramatically distorted because of optimizations in Rust's regex engine. In fact, none of these benchmarks give an accurate depiction of Rust's regex engine overall speed. For example, the Hard1 benchmark doesn't even use the regex engine at all, because Rust's library will detect it as a pure alternation of literals and fall back to Aho-Corasick. The other {easy,medium,hard} benchmarks run the regex engine in reverse from the end of the string since the regexes are anchored at the end. As a result, the time it takes to execute the search doesn't vary with the length of the haystack.

$ benchcmp ~/clones/go/src/regexp/regex-go ./regex-rust
benchmark                                 old ns/op       new ns/op     delta
BenchmarkLiteral                          140             229           +63.57%
BenchmarkNotLiteral                       2099            328           -84.37%
BenchmarkMatchClass                       3252            295           -90.93%
BenchmarkMatchClass_InRange               2957            243           -91.78%
BenchmarkAnchoredLiteralShortNonMatch     118             222           +88.14%
BenchmarkAnchoredLiteralLongNonMatch      133             223           +67.67%
BenchmarkAnchoredShortMatch               189             226           +19.58%
BenchmarkAnchoredLongMatch                364             226           -37.91%
BenchmarkOnePassShortA                    808             255           -68.44%
BenchmarkNotOnePassShortA                 867             254           -70.70%
BenchmarkOnePassShortB                    571             256           -55.17%
BenchmarkNotOnePassShortB                 621             255           -58.94%
BenchmarkOnePassLongPrefix                131             272           +107.63%
BenchmarkOnePassLongNotPrefix             485             272           -43.92%
BenchmarkMatchParallelShared              332             279           -15.96%
BenchmarkMatchEasy0_32                    104             222           +113.46%
BenchmarkMatchEasy0_1K                    648             222           -65.74%
BenchmarkMatchEasy0_32K                   11524           222           -98.07%
BenchmarkMatchEasy0_1M                    423966          222           -99.95%
BenchmarkMatchEasy0_32M                   14098625        213           -100.00%
BenchmarkMatchEasy0i_32                   2323            228           -90.19%
BenchmarkMatchEasy0i_1K                   68757           228           -99.67%
BenchmarkMatchEasy0i_32K                  2711922         228           -99.99%
BenchmarkMatchEasy0i_1M                   86969502        228           -100.00%
BenchmarkMatchEasy0i_32M                  2775970393      233           -100.00%
BenchmarkMatchEasy1_32                    86.0            229           +166.28%
BenchmarkMatchEasy1_1K                    1455            229           -84.26%
BenchmarkMatchEasy1_32K                   52058           229           -99.56%
BenchmarkMatchEasy1_1M                    1825699         229           -99.99%
BenchmarkMatchEasy1_32M                   58742205        211           -100.00%
BenchmarkMatchMedium_32                   1361            242           -82.22%
BenchmarkMatchMedium_1K                   40203           242           -99.40%
BenchmarkMatchMedium_32K                  1713513         242           -99.99%
BenchmarkMatchMedium_1M                   54703006        242           -100.00%
BenchmarkMatchMedium_32M                  1755415300      214           -100.00%
BenchmarkMatchHard_32                     2183            229           -89.51%
BenchmarkMatchHard_1K                     66067           229           -99.65%
BenchmarkMatchHard_32K                    3113396         229           -99.99%
BenchmarkMatchHard_1M                     99616880        229           -100.00%
BenchmarkMatchHard_32M                    3189585261      214           -100.00%
BenchmarkMatchHard1_32                    12668           272           -97.85%
BenchmarkMatchHard1_1K                    395644          2333          -99.41%
BenchmarkMatchHard1_32K                   16404742        68659         -99.58%
BenchmarkMatchHard1_1M                    525035229       2186555       -99.58%
BenchmarkMatchHard1_32M                   16667981723     70463061      -99.58%

benchmark                    old MB/s     new MB/s         speedup
BenchmarkMatchEasy0_32       306.08       143.69           0.47x
BenchmarkMatchEasy0_1K       1578.82      4605.20          2.92x
BenchmarkMatchEasy0_32K      2843.36      147213.66        51.77x
BenchmarkMatchEasy0_1M       2473.25      4719689.39       1908.29x
BenchmarkMatchEasy0_32M      2379.98      156878555.03     65915.91x
BenchmarkMatchEasy0i_32      13.77        139.75           10.15x
BenchmarkMatchEasy0i_1K      14.89        4475.04          300.54x
BenchmarkMatchEasy0i_32K     12.08        143235.97        11857.28x
BenchmarkMatchEasy0i_1M      12.06        4582492.43       379974.50x
BenchmarkMatchEasy0i_32M     12.09        143626597.90     11879784.77x
BenchmarkMatchEasy1_32       372.19       139.49           0.37x
BenchmarkMatchEasy1_1K       703.69       4467.93          6.35x
BenchmarkMatchEasy1_32K      629.44       143010.92        227.20x
BenchmarkMatchEasy1_1M       574.34       4575198.42       7966.01x
BenchmarkMatchEasy1_32M      571.22       158747658.13     277909.84x
BenchmarkMatchMedium_32      23.50        132.01           5.62x
BenchmarkMatchMedium_1K      25.47        4231.03          166.12x
BenchmarkMatchMedium_32K     19.12        135348.39        7078.89x
BenchmarkMatchMedium_1M      19.17        4331561.73       225955.23x
BenchmarkMatchMedium_32M     19.11        156575833.69     8193397.89x
BenchmarkMatchHard_32        14.66        139.50           9.52x
BenchmarkMatchHard_1K        15.50        4466.01          288.13x
BenchmarkMatchHard_32K       10.52        142758.39        13570.19x
BenchmarkMatchHard_1M        10.53        4574748.55       434449.06x
BenchmarkMatchHard_32M       10.52        156417047.69     14868540.65x
BenchmarkMatchHard1_32       2.53         117.23           46.34x
BenchmarkMatchHard1_1K       2.59         438.78           169.41x
BenchmarkMatchHard1_32K      2.00         477.25           238.62x
BenchmarkMatchHard1_1M       2.00         479.56           239.78x
BenchmarkMatchHard1_32M      2.01         476.20           236.92x

Here are the benchmarks ported from Rust's regex library. These benchmarks do a better job of characterizing the performance difference. In particular, they use FindAll() on a single 0.5MB haystack, which iterates over successive matches in C to avoid cgo function call overhead.

benchmark                               old MB/s     new MB/s     speedup
BenchmarkSherlock                       3805.25      7861.29      2.07x
BenchmarkHolmes                         1339.82      9515.28      7.10x
BenchmarkSherlockHolmes                 2900.11      14011.42     4.83x
BenchmarkSherlockNoCase                 9.76         509.87       52.24x
BenchmarkHolmesNoCase                   10.66        534.47       50.14x
BenchmarkSherlockHolmesNoCase           9.84         516.71       52.51x
BenchmarkNameWhitespace                 2591.41      6870.12      2.65x
BenchmarkNameAlt1                       1556.22      12787.94     8.22x
BenchmarkNameAlt2                       10.20        2789.56      273.49x
BenchmarkNameAlt3                       2.93         485.43       165.68x
BenchmarkNameAlt3NoCase                 1.56         429.32       275.21x
BenchmarkNameAlt4                       10.17        2311.07      227.24x
BenchmarkNameAlt4NoCase                 5.31         434.40       81.81x
BenchmarkNameAlt5                       7.08         1735.48      245.12x
BenchmarkNameAlt5NoCase                 3.55         443.53       124.94x
BenchmarkNoMatchUncommon                15335.81     21447.14     1.40x
BenchmarkNoMatchCommon                  487.44       21294.92     43.69x
BenchmarkNoMatchReallyCommon            468.76       1622.89      3.46x
BenchmarkTheLower                       100.82       755.81       7.50x
BenchmarkTheUpper                       1219.33      8116.70      6.66x
BenchmarkTheNoCase                      9.68         324.15       33.49x
BenchmarkTheWhitespace                  59.02        442.80       7.50x
BenchmarkEverythingGreedy               8.86         179.60       20.27x
BenchmarkEverythingGreedyNL             10.75        472.34       43.94x
BenchmarkLetters                        1.72         15.70        9.13x
BenchmarkLettersUpper                   13.99        247.74       17.71x
BenchmarkLettersLower                   1.64         16.43        10.02x
BenchmarkWords                          4.19         48.45        11.56x
BenchmarkBeforeHolmes                   8.58         517.72       60.34x
BenchmarkHolmesCocharWatson             10.73        2634.69      245.54x
BenchmarkHolmesCowordWatson             2.15         883.32       410.85x
BenchmarkQuotes                         12.49        1002.75      80.28x
BenchmarkLineBoundarySherlockHolmes     12.98        534.98       41.22x
BenchmarkWordEndingN                    8.82         288.50       32.71x
BenchmarkRepeatedClassNegation          4.68         1.09         0.23x
BenchmarkIngSuffix                      10.77        413.77       38.42x
BenchmarkIngSuffixLimitedSpace          6.96         437.96       62.93x

The principal reason for the performance difference is that Rust's regex engine (like RE2) has a lazy DFA. There is ongoing work to add a lazy DFA to Go's regexp package.

For fun, and since Rust's regex engine, RE2 (in C++) and Go's regexp library are very much related and share many of the same implementation details, here's a comparison between Rust's regex engine and RE2 using Rust's benchmark harness:

name                                     re2 ns/iter           rust ns/iter            diff ns/iter   diff %
sherlock::before_holmes                  1,443,662 (412 MB/s)  1,142,847 (520 MB/s)        -300,815  -20.84%
sherlock::everything_greedy              9,310,589 (63 MB/s)   2,516,121 (236 MB/s)      -6,794,468  -72.98%
sherlock::everything_greedy_nl           2,491,676 (238 MB/s)  1,204,517 (493 MB/s)      -1,287,159  -51.66%
sherlock::holmes_cochar_watson           1,175,610 (506 MB/s)  222,959 (2,668 MB/s)        -952,651  -81.03%
sherlock::holmes_coword_watson           1,347,865 (441 MB/s)  638,450 (931 MB/s)          -709,415  -52.63%
sherlock::ing_suffix                     3,028,970 (196 MB/s)  1,361,756 (436 MB/s)      -1,667,214  -55.04%
sherlock::ing_suffix_limited_space       1,957,296 (303 MB/s)  1,302,217 (456 MB/s)        -655,079  -33.47%
sherlock::letters                        111,006,069 (5 MB/s)  25,340,930 (23 MB/s)     -85,665,139  -77.17%
sherlock::letters_lower                  107,183,616 (5 MB/s)  24,525,566 (24 MB/s)     -82,658,050  -77.12%
sherlock::letters_upper                  4,709,166 (126 MB/s)  2,082,277 (285 MB/s)      -2,626,889  -55.78%
sherlock::line_boundary_sherlock_holmes  2,552,194 (233 MB/s)  1,114,865 (533 MB/s)      -1,437,329  -56.32%
sherlock::name_alt1                      77,602 (7,666 MB/s)   37,361 (15,923 MB/s)         -40,241  -51.86%
sherlock::name_alt2                      1,320,543 (450 MB/s)  191,247 (3,110 MB/s)      -1,129,296  -85.52%
sherlock::name_alt3                      1,439,985 (413 MB/s)  1,280,912 (464 MB/s)        -159,073  -11.05%
sherlock::name_alt3_nocase               2,756,402 (215 MB/s)  1,339,666 (444 MB/s)      -1,416,736  -51.40%
sherlock::name_alt4                      1,362,748 (436 MB/s)  231,035 (2,575 MB/s)      -1,131,713  -83.05%
sherlock::name_alt4_nocase               2,025,273 (293 MB/s)  1,313,315 (453 MB/s)        -711,958  -35.15%
sherlock::name_alt5                      1,347,991 (441 MB/s)  322,420 (1,845 MB/s)      -1,025,571  -76.08%
sherlock::name_alt5_nocase               2,115,018 (281 MB/s)  1,314,237 (452 MB/s)        -800,781  -37.86%
sherlock::name_holmes                    166,170 (3,580 MB/s)  47,617 (12,494 MB/s)        -118,553  -71.34%
sherlock::name_holmes_nocase             1,647,206 (361 MB/s)  1,114,830 (533 MB/s)        -532,376  -32.32%
sherlock::name_sherlock                  58,758 (10,125 MB/s)  69,917 (8,509 MB/s)           11,159   18.99%
sherlock::name_sherlock_holmes           60,607 (9,816 MB/s)   36,615 (16,248 MB/s)         -23,992  -39.59%
sherlock::name_sherlock_holmes_nocase    1,540,731 (386 MB/s)  1,160,475 (512 MB/s)        -380,256  -24.68%
sherlock::name_sherlock_nocase           1,536,512 (387 MB/s)  1,161,208 (512 MB/s)        -375,304  -24.43%
sherlock::name_whitespace                62,172 (9,569 MB/s)   78,795 (7,550 MB/s)           16,623   26.74%
sherlock::no_match_common                440,514 (1,350 MB/s)  25,933 (22,941 MB/s)        -414,581  -94.11%
sherlock::no_match_really_common         440,342 (1,351 MB/s)  363,675 (1,635 MB/s)         -76,667  -17.41%
sherlock::no_match_uncommon              23,726 (25,075 MB/s)  25,809 (23,051 MB/s)           2,083    8.78%
sherlock::quotes                         1,393,911 (426 MB/s)  557,791 (1,066 MB/s)        -836,120  -59.98%
sherlock::the_lower                      2,504,957 (237 MB/s)  620,410 (958 MB/s)        -1,884,547  -75.23%
sherlock::the_nocase                     3,695,971 (160 MB/s)  1,681,153 (353 MB/s)      -2,014,818  -54.51%
sherlock::the_upper                      232,581 (2,557 MB/s)  50,412 (11,801 MB/s)        -182,169  -78.32%
sherlock::the_whitespace                 2,279,605 (260 MB/s)  1,121,804 (530 MB/s)      -1,157,801  -50.79%
sherlock::word_ending_n                  3,330,263 (178 MB/s)  2,015,397 (295 MB/s)      -1,314,866  -39.48%
sherlock::words                          30,546,351 (19 MB/s)  9,588,968 (62 MB/s)      -20,957,383  -68.61%

Some possible explanations for the performance difference:

Rust's regex engine does quite a bit better with prefix literal detection. Instead of limiting itself to just a single byte (like RE2), it will expand alternates and character classes (within some limit) to come up with literals. It will then use memchr, Boyer-Moore or Aho-Corasick before diving into the regex engine.
Rust's inner DFA loop is explicitly unrolled a few times, which decreases the number of instructions it has to execute. (RE2's DFA loop is not unrolled.)

There's probably more, but my experience suggests the above are the most significant.

Bugs

Rust's regex engine uses the equivalent of C's size_t type to represent match offsets into a haystack. This FFI wrapper converts such offsets to Go's int type, which is not guaranteed to have the same size as a size_t, and of course, int is signed while size_t is not.

It's not clear what the right answer is here. Today, the conversions are unchecked, which means that callers will get wrong answers if the size of the haystack exceeds the size of Go's int type.

BurntSushi/rure-go

BurntSushi

Reviews

Repository Details