Poor performance in some cases compared to oniguruma #604
Comments
Compile times are also significantly slower, but that seems to be a known property: #452
Hiya! Thanks for doing an analysis and reporting this issue. I appreciate it. I have two high level thoughts off the bat:
With all that said, I can likely explain why the So, if you actually need Unicode support here, then you ought to make sure you're benchmarking with equivalent options. If the
It is known, yes, but #452 is the wrong thing to look at. The PR talks about parse time, which is typically a very small portion of the total compilation time of a regex. Your regexes aren't exactly small, and you're using some fairly large Unicode classes in your
@BurntSushi thank you for the detailed response. I have an easy way of answering how MRI:
Artichoke with onig:
I need to support non-ASCII word boundaries. Thinking about how I can work around this pathological case:
Does
No, there's no way to tell whether the regex contains a Unicode It sounds like there isn't anything to be done here on my end. Some day, I'd like to try and fix the Unicode
Thank you @BurntSushi for helping me to find a path forward.
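For readers following the Unicode word-boundary discussion above, here is a minimal sketch of why a Unicode-aware `\b` and an ASCII-only `\b` are not interchangeable, and therefore why benchmarks should compare engines under the same mode. It uses Python's standard `re` module purely as an illustration; it is not the regex crate or oniguruma, and the sample text and pattern are made up for this example. In the Rust regex crate, the ASCII-only boundary is spelled `(?-u:\b)`.

```python
import re

# Hypothetical sample text; "\u00e9" is the precomposed letter "é".
text = "caf\u00e9 au lait"

# Unicode-aware mode (the default for str patterns in Python 3):
# "é" counts as a word character, so there is no boundary after "caf".
unicode_boundary = re.compile(r"caf\b")

# ASCII-only mode: "é" is not a word character, so a boundary
# exists between "f" and "é" and the pattern matches.
ascii_boundary = re.compile(r"caf\b", re.ASCII)

unicode_hit = unicode_boundary.search(text) is not None
ascii_hit = ascii_boundary.search(text) is not None

print("Unicode \\b matches:", unicode_hit)
print("ASCII \\b matches:", ascii_hit)
```

The two patterns disagree on the same input, so switching to ASCII-only boundaries is only a valid workaround when the text being matched never needs non-ASCII word characters.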
I'm looking at replacing oniguruma with regex in some situations for the Ruby that I'm building.
I am benchmarking the following three `Regexp`s over this several megabyte text corpus:
For `Email`, regex is 10x faster than oniguruma. For `URI`, regex is 2x slower than oniguruma. For `IP`, regex is 20x slower than oniguruma.
regex performance
oniguruma performance (via rust-onig)
If you're interested in doing so, you can invoke this benchmark in Artichoke with:
The benchmark on master (with oniguruma) is different from the benchmark on this branch because I've tweaked the `Regexp`s to remove lookahead patterns.