Fix #441 : paren-brace, equals-NA and no-tab lints in strings or comments #467
The main question I guess is if these linters should remain using regexes. The no tabs linter I guess has to, as you don't have access to whitespace in the AST. But definitely the |
I think even for the whitespace linters, it is a better solution to create the AST, and then get all the positions that are not accounted for by terminal tokens, plus the trailing whitespace. Otherwise it is pretty difficult to handle this with regular expressions, especially with the new raw character constants. |
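A rough sketch of the idea for the tab case, assuming only base R's `utils::getParseData()`. The helper name `find_leading_tabs` is hypothetical and is not part of lintr. It is deliberately simpler than the full "positions not covered by terminal tokens" approach: it only looks at leading whitespace, and it works on line numbers rather than columns, because the parser's column numbers may not match character offsets when tabs are present.

```r
# Hypothetical helper, not lintr's implementation: report lines that are
# indented with a tab, unless the start of the line lies inside a
# multi-line string constant.
find_leading_tabs <- function(lines) {
  pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))

  # String constants that span more than one line.
  strs <- pd[pd$token == "STR_CONST" & pd$line2 > pd$line1, ]

  # Every line after the first line of such a string starts inside the string.
  inside <- unlist(lapply(seq_len(nrow(strs)), function(i) {
    seq(strs$line1[i] + 1L, strs$line2[i])
  }))

  # Lines whose leading whitespace contains a tab ("\t" is a literal tab here).
  tabbed <- grep("^ *\t", lines)

  setdiff(tabbed, inside)
}

example_lines <- c(
  "s <- \"lorem ipsum",
  "\tblah de blah\"",
  "f <- function() {",
  "\t1",
  "}"
)
find_leading_tabs(example_lines)
```

In this example only the tab-indented function body should be reported; the tab on the second line belongs to the string and is skipped.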
Totally agree. I originally tried (and struggled) to implement the paren-brace linter using xpath; I'll have a look at it again |
Let me know if you need help with the xpath way, it is really difficult to use it AFAIR. Unfortunately, apart from xpath, there isn't much software out there for attributed sub-graph isomorphism. |
Thanks |
The equals-NA linter has been reimplemented using xpath. I will try to do the same for the paren-brace linter (but will leave the tab linter as is). xpath seems like a multi-dimensional regex. Should we be concerned that making xpath searches the norm for linter implementations may raise the bar for new contributors? As such, I'd still like to shore up the code for building linters based on regexes in a way that can optionally exclude comments, string literals etc. Will continue to work on this. I'm not sure why the Travis run isn't being reported in the PR checks (that is, whether it's a GitHub or a Travis issue); it can be found here: https://travis-ci.org/github/jimhester/lintr/builds/668856106 The Travis failures on R 3.3 and R 3.2 occur because Rcpp compilation fails (something I've recently seen for my own lintr-dependent package; I just dropped the tests/support for R < 3.4 in dupree when that happened). |
I agree with you about using XPath making it more difficult for contributors. That is also the reason I didn't use C or C++ in lintr, I wanted to lower the bar to contributors. Unfortunately R is not terribly well suited to this particular task and the performance is not great, which drives us towards more esoteric solutions like XPath... Unfortunately I don't have any great solutions :/ |
Well, I would say that, yes, xpath is more difficult to write. But imo it is still easier to write a correct linter with xpath than with regexes. Especially with the new raw string that is coming in R 4.0.0. An alternative to the xpath search would be to walk the AST manually, but that is still quite error prone imo, except for maybe the simplest patterns. (Being the author of the xpath based search I am probably a bit (?) biased. :) |
I agree that xpath is definitely better than regexes |
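For comparison, the "walk the parse data manually" alternative mentioned above might look roughly like this for the equals-NA case. This is only a sketch in base R using `utils::getParseData()`. It is not the xpath implementation from this PR, and the function name `find_equals_na` is made up for illustration. It relies on the parser reporting `NA` as a `NUM_CONST` token wrapped in its own `expr` node.

```r
# Hypothetical helper, not lintr's equals_na_linter: find `==` / `!=`
# operators that have a literal NA as a direct operand.
find_equals_na <- function(lines) {
  pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))

  # Comparison operators; comments and strings never produce these tokens.
  ops <- pd[pd$token %in% c("EQ", "NE"), ]

  # Literal NA constants; each one is wrapped in its own `expr` node.
  na_const <- pd[pd$token == "NUM_CONST" & pd$text == "NA", ]

  is_hit <- vapply(ops$parent, function(p) {
    # The operands are the `expr` children of the operator's parent.
    operands <- pd$id[pd$parent == p & pd$token == "expr"]
    any(na_const$parent %in% operands)
  }, logical(1))

  ops[is_hit, c("line1", "col1", "text")]
}

example_lines <- c(
  "y <- x == NA",
  "# x == NA",
  "msg <- \"x != NA\"",
  "z <- is.na(x)",
  "w <- NA != x"
)
find_equals_na(example_lines)
```

Because the search runs over tokens rather than raw text, the comment and the string in the example are never candidates. The cost is that even this simple pattern needs explicit parent/child bookkeeping, which is the error-prone part an xpath query hides.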
Tests added to ensure paren-brace linter, equals-NA linter and no-tab linter do not flag lints in irrelevant sections of code (eg, strings / comments).
This can check if a regex-match is covered by an expression in a source_file and restrict analysis to tokens of a particular type, if required.
Also, rename file containing equals_na_linter to match format of other linter-files (`*_linter.R` not `*_lintr.R`)
d42bf64 to 998de4a
@jimhester
I have added tests that should capture all the original bugs (paren-braces in comments or strings, equals-NA in comments or strings, tab-prefixed lines in a multi-line string) and a couple of additional tests. The travis run is here: https://travis-ci.org/github/jimhester/lintr/builds/671546861 |
superseded by #620 (starts from the same history, just from a branch originating on the main repo) |
Some regex-based linters were throwing false-positives in irrelevant bits of source code (in strings, or in comments; see #441 ).
For example: `grepl("(iss){2}", "Mississippi")`, `is.na(x) # test whether x == NA`, `multi_line_tabbed_string <- "lorem ipsum\n\tblah de blah"`.
Tests based on these false-positives were added.
This PR abstracts some of the code that these regex-based linters use into a function, `make_linter_from_regex`; this function can be tuned to ignore regex matches that occur in comments and/or string definitions. It then redefines `paren_brace_linter`, `equals_na_linter` and `no_tabs_linter` to use `make_linter_from_regex`.
I have had some trouble understanding comment-parsing and would like to continue working on this PR for a bit (I'm shifting it off my work computer since I might not be able to access my office for a while).
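To illustrate the kind of filtering described for `make_linter_from_regex`, here is a minimal base-R sketch. It is not the code from this PR: the helper name `regex_matches_outside` and its arguments are assumptions made for the example. It also assumes ASCII source without tabs, since the column numbers from `utils::getParseData()` may otherwise differ from character offsets.

```r
# Hypothetical helper, not the PR's make_linter_from_regex: return the
# line and column of regex matches that start outside the given token types.
regex_matches_outside <- function(lines, pattern,
                                  ignore = c("COMMENT", "STR_CONST")) {
  pd <- utils::getParseData(parse(text = lines, keep.source = TRUE))
  skip <- pd[pd$terminal & pd$token %in% ignore, ]

  hits <- data.frame(line = integer(0), col = integer(0))
  for (i in seq_along(lines)) {
    starts <- gregexpr(pattern, lines[[i]], perl = TRUE)[[1]]
    for (s in starts[starts > 0]) {
      # Does some ignored token span the position (line i, column s)?
      after_start <- skip$line1 < i | (skip$line1 == i & skip$col1 <= s)
      before_end <- skip$line2 > i | (skip$line2 == i & skip$col2 >= s)
      if (!any(after_start & before_end)) {
        hits <- rbind(hits, data.frame(line = i, col = s))
      }
    }
  }
  hits
}

example_lines <- c(
  "x <- grepl(\"(iss){2}\", \"Mississippi\")",
  "# if (y){ z }",
  "if (y){",
  "  z",
  "}"
)
regex_matches_outside(example_lines, "\\)\\{")
```

With the default `ignore`, only the real `if (y){` should be reported; passing `ignore = character(0)` reproduces the false positives in the string and the comment. Note that a match is classified by its starting position only, so a match that begins in code and runs into a string would still be reported.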