implement fixed_regex_linter as plain R + regex #1032
Changed target to master temporarily to get GHA goodness. |
@MichaelChirico LMK if you would like me to adapt |
yes please do. agree about scope. |
Some patience here... my mirror has gathered some dust that's slow to remove :) |
No worries, thanks for the update 😊 |
On a sample of 2,000 packages, there are 8 false positives where the R version says They are all the same expression:
(also FWIW the two branches ran in almost indistinguishable time -- 13.18 minutes (C) vs 13.49 minutes (R)) |
Ah I see that must be because the special regex skips if a Great news on the performance side! |
Assuming we can fix the issue, let's (1) merge #1021 then quickly (2) merge this to |
Okay, we can do that. Or merge once and not squash the changes during merge? |
Found a fix and added it. Also added a test case for that to make sure it works. |
first I'll run again to make sure we didn't whack-a-mole any new issues. I think we do want to squash some commits but not others which will be a pain... easier to merge the two PRs with squash |
All right. I pre-approved the C implementation. Merge at your discretion; LMK if I need to take another look. |
During some manual testing, I noticed that at least the most recent R versions don't have all of the features mentioned at the link I provided. |
# Conflicts:
#   man/linters.Rd
@MichaelChirico All tests succeed on all platforms now 🥳 |
awesome!! thanks again for your patience/diligence here! I'll start another run tomorrow evening and hopefully we can (finally) merge |
Sounds good, thanks a lot for the extensive testing and feedback! |
Some issues... getting warnings running the linter on some packages, e.g.
(IINM that'll be an error in r-devel) Packages that don't seem to want to lint on this branch:
|
I think that may be throwing things off for the rest of the lints. The current results have a ton of false positives that I don't reproduce if I try and run the expressions individually. |
simple enough fix 😅 |
I was somehow convinced that str was a single string and not a vector 😅 |
OK another report. Ran on 1300 unique packages. Still a few cases where C & R disagree.
This looks like the C side is recognizing the
The ones that hinge on the value of So I believe we should fix the |
That's not a FP related to pipes, no?
So I'd suggest we assume The necessary fix would be to disallow some characters after |
ok that works for me, esp. since perl=TRUE is the engine for stringr functions too. this PR is teaching me way more about regex than I ever cared to know 🥲 |
thanks!! starting the merge 🚀🚀🚀🚀 |
# Conflicts:
#   R/fixed_regex_linter.R
#   tests/testthat/test-fixed_regex_linter.R
thanks again!! |
Thank you too! |
I think converting the regex to a whitelist of the patterns that should be linted would eliminate the false positives.
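To make the whitelist idea concrete, here is a minimal sketch. It is not the code from this PR: `looks_fixed()` is a hypothetical helper, and the set of metacharacters it uses is an assumption chosen for illustration. A pattern is flagged only when every piece of it is either a plain character or a backslash-escaped metacharacter; anything else is left alone.

```r
# Hypothetical helper, NOT lintr's actual implementation.
# Returns TRUE for patterns that could be passed with fixed = TRUE.
looks_fixed <- function(patterns) {
  # one character that has no special meaning in a regex
  literal <- "[^][\\\\^$.|?*+(){}]"
  # a backslash followed by a metacharacter, e.g. \. or \$
  escaped <- "\\\\[][\\\\^$.|?*+(){}]"
  # whitelist: the whole pattern is built only from the two pieces above
  whitelist <- paste0("^(", literal, "|", escaped, ")+$")
  # grepl() is vectorised, so `patterns` may hold many strings at once
  grepl(whitelist, patterns, perl = TRUE)
}

candidates <- c("abc", "a\\.b", "a.b", "^abc", "[ab]", "\\d")
result <- looks_fixed(candidates)
print(setNames(result, candidates))
```

Because only whitelisted shapes are flagged, an unfamiliar construct such as `\d` or a bracket class is skipped rather than reported, trading possible false negatives for fewer false positives. Single-character classes like `[.]` are deliberately out of scope for this sketch.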
… On 10.04.2022 at 06:26, Michael Chirico ***@***.***> wrote:
Mixed results -- some things caught by R only, some things caught by C only
Mainly it looks like on the C branch, I am too strict for the [$CHAR] case (e.g. $CHAR can be an escaped character, or a \u-escaped string), while the R branch is too loose (namely, the default regex allows []...] to be a single character class including ] and ...).
False positives in the R-only branch:
strsplit(rangeStr, "[][ ]")
gsub("[][]", "", line)
strsplit(colnames(obj)[-1], "[],[]")
gsub("[]}]", ")", transformed)
gsub("[]:]","",betainfo)
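A quick check of why these are false positives, as a small sketch with made-up input strings: a `]` placed first inside a bracket expression is a literal member of the class, so `[][ ]` is a multi-character class and cannot be replaced by a fixed string.

```r
# "[][ ]" is one bracket expression matching "]", "[" or a space.
x <- "a]b[c d"
as_regex <- strsplit(x, "[][ ]")[[1]]
as_fixed <- strsplit(x, "[][ ]", fixed = TRUE)[[1]]

print(as_regex)  # splits at each of "]", "[" and " "
print(as_fixed)  # the literal text "[][ ]" never occurs, so nothing is split

# "[]}]" likewise matches either "]" or "}".
closed <- gsub("[]}]", ")", "f(x]}")
print(closed)
```

Switching any of these calls to `fixed = TRUE` would change the result, which is why the linter should not flag them.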
False negatives in the C-only branch:
stringr::str_replace_all(lines, "[\u0451]", "\u0435")
gsub("\u{A0}", " ", out, useBytes = TRUE) (one other identical hit in roxygen2)
grep("[]]", x) (23 total hits like this or []])
gsub( '[\\]', '/', dirname( chname)) (10 total hits like this)
gsub("[\r]", "", config_char) (5 total hits like this or [\n])
strsplit(ls[1],"[\"]") (2 total hits like this)
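For contrast, a sketch (again with made-up inputs) showing that a class holding exactly one character behaves the same as the fixed-string form, which is why missing these counts as a false negative. Only the `[\r]` and `[\"]` shapes are checked here; the `[\\]` case is left out because its handling differs between the default engine and `perl = TRUE`.

```r
# "[\r]" is a class containing only a carriage return.
config_char <- "key=value\r\nother=1\r\n"
cr_regex <- gsub("[\r]", "", config_char)
cr_fixed <- gsub("\r", "", config_char, fixed = TRUE)

# "[\"]" is a class containing only a double quote.
quoted <- "a\"b"
q_regex <- strsplit(quoted, "[\"]")[[1]]
q_fixed <- strsplit(quoted, "\"", fixed = TRUE)[[1]]

print(identical(cr_regex, cr_fixed))
print(identical(q_regex, q_fixed))
```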
NB merge target is the original fixed_regex PR #1021 so we can separately review the implementation of is_not_regex and the PR in full.