Fix #441 : paren-brace, equals-NA and no-tab lints in strings or comments #467

russHyde · 2020-03-16T11:28:56Z

Some regex-based linters were throwing false positives in irrelevant parts of the source code (in strings or in comments; see #441).
For example:

- the paren-brace linter was flagging the regex in `grepl("(iss){2}", "Mississippi")`,
- the equals-na linter was flagging `is.na(x) # test whether x == NA`,
- the no-tabs linter was flagging `multi_line_tabbed_string <- "lorem ipsum\n\tblah de blah"`.
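
To make the failure mode concrete, here is a minimal sketch of how a purely line-based regex check produces these false positives. The function `naive_checks` and its two patterns are hypothetical and written only for illustration; they are not lintr's actual implementation.

```r
# Hypothetical line-based checks for illustration; not lintr's actual code.
naive_checks <- function(line) {
  c(
    paren_brace = grepl("){", line, fixed = TRUE),  # literal ")" followed by "{"
    equals_na   = grepl("== *NA", line)             # "==" then optional spaces, then "NA"
  )
}

code <- c(
  'grepl("(iss){2}", "Mississippi")',  # "){" occurs only inside a string
  'is.na(x) # test whether x == NA'    # "== NA" occurs only inside a comment
)

# One column per line of code, one row per check.
results <- vapply(
  code,
  naive_checks,
  c(paren_brace = FALSE, equals_na = FALSE)
)
print(unname(results["paren_brace", 1]))  # flagged although it is a string
print(unname(results["equals_na", 2]))    # flagged although it is a comment
```

Because the checks see only raw text, they cannot tell code apart from string or comment content.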

Tests based on these false-positives were added.

This PR abstracts some of the code that these regex-based linters use into a function, `make_linter_from_regex`; this function can be tuned to ignore regex matches that occur in comments and/or string definitions. It then redefines `paren_brace_linter`, `equals_na_linter` and `no_tabs_linter` to use `make_linter_from_regex`.

I have had some trouble understanding comment-parsing and would like to continue working on this PR for a bit (I'm shifting it off my work computer since I might not be able to access my office for a while)

jimhester · 2020-03-16T14:35:32Z

The main question I guess is if these linters should remain using regexes. The no tabs linter I guess has to, as you don't have access to whitespace in the AST. But definitely the equals_na linter could be implemented by looking at the AST and probably the paren brace linter as well, which would likely be more robust than relying on regexes for their behavior.

gaborcsardi · 2020-03-16T14:38:51Z

I think even for the whitespace linters, it is a better solution to create the AST, and then get all the positions that are not accounted for by terminal tokens, plus the trailing whitespace.

Otherwise it is pretty difficult to handle this with regular expressions, especially with the new raw character constants.
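
As a rough sketch of the token-aware idea, the following uses base R's `utils::getParseData()` to discard pattern matches that fall inside string or comment tokens. The helper name `matches_outside_strings_and_comments` is hypothetical and this is not how lintr implements it. The sketch assumes tab-free ASCII input, so that regex offsets and parse-data columns agree, and it only handles fixed patterns.

```r
# Hypothetical helper, not part of lintr: find fixed-pattern matches that are
# not inside string or comment tokens. Assumes tab-free ASCII code.
matches_outside_strings_and_comments <- function(code, pattern) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  parsed <- parse(text = lines, keep.source = TRUE)
  pd <- utils::getParseData(parsed)
  # Tokens whose content should never be linted as code.
  skip <- pd[pd$token %in% c("STR_CONST", "COMMENT"), ]

  hits <- list()
  for (i in seq_along(lines)) {
    cols <- gregexpr(pattern, lines[i], fixed = TRUE)[[1]]
    for (col in cols[cols > 0]) {
      # A match is ignored if its start lies within any skipped token,
      # including strings that span several lines.
      inside <- any(
        (skip$line1 < i | (skip$line1 == i & skip$col1 <= col)) &
          (skip$line2 > i | (skip$line2 == i & skip$col2 >= col))
      )
      if (!inside) hits[[length(hits) + 1]] <- c(line = i, col = col)
    }
  }
  hits
}

code <- paste(
  'grepl("(iss){2}", "Mississippi")',
  'if (x > 1){ y <- 2 } # not here ){',
  sep = "\n"
)
hits <- matches_outside_strings_and_comments(code, "){")
print(hits)
```

Only the `){` in the `if` statement on the second line is reported; the occurrences in the string and in the trailing comment are skipped.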

russHyde · 2020-03-16T14:49:18Z

Totally agree. I originally tried (and struggled) to implement the paren-brace linter using xpath; I'll have a look at it again.

gaborcsardi · 2020-03-16T14:54:38Z

> Totally agree. I originally tried (and struggled) to implement the paren-brace linter using xpath; I'll have a look at it again

Let me know if you need help with the xpath way, it is really difficult to use it AFAIR. Unfortunately, apart from xpath, there isn't much software out there for attributed sub-graph isomorphism.

russHyde · 2020-03-16T14:56:32Z

Thanks

russHyde · 2020-03-31T09:22:32Z

The equals-NA linter has been reimplemented using xpath. I will try to do the same for the paren-brace linter (but will leave the tab linter as is).

xpath seems like a multi-dimensional regex.

Should we be concerned that making xpath searches the norm for linter implementations may raise the bar for new contributors? As such, I'd still like to shore up the code for building linters based on regexes in a way that can optionally exclude comments, string literals etc. Will continue to work on this.

I'm not sure why the Travis run isn't being reported in the PR checks (that is, whether it's a github or a travis issue): it can be found here https://travis-ci.org/github/jimhester/lintr/builds/668856106

I observed failures in Travis on R 3.3 and R 3.2 because Rcpp compilation fails (something I've recently seen for my own lintr-dependent package; I just dropped the tests/support for R < 3.4 in dupree when that happened).

jimhester · 2020-03-31T13:12:54Z

I agree with you about using XPath making it more difficult for contributors. That is also the reason I didn't use C or C++ in lintr, I wanted to lower the bar to contributors. Unfortunately R is not terribly well suited to this particular task and the performance is not great, which drives us towards more esoteric solutions like XPath...

Unfortunately I don't have any great solutions :/

gaborcsardi · 2020-03-31T14:14:57Z

Well, I would say that, yes, xpath is more difficult to write. But imo it is still easier to write a correct linter with xpath than with regexes. Especially with the new raw strings that are coming in R 4.0.0.

An alternative to the xpath search would be to walk the AST manually, but that is still quite error prone imo, except for maybe the simplest patterns.
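
For comparison, a manual walk over the parsed expressions might look like the sketch below. It is a hypothetical illustration (the function `find_na_comparisons` is not part of lintr) that detects only `==` and `!=` comparisons against the plain logical literal `NA`, and it does not handle calls with empty arguments such as `x[, 1]`.

```r
# Hypothetical sketch, not lintr's implementation: recursively walk parsed
# calls and collect `==` / `!=` comparisons against the literal NA.
find_na_comparisons <- function(exprs) {
  found <- list()
  walk <- function(node) {
    if (is.call(node)) {
      fn <- node[[1]]
      if (is.name(fn) && as.character(fn) %in% c("==", "!=")) {
        args <- as.list(node)[-1]
        # Only the plain logical literal NA counts here (not NA_integer_ etc.).
        is_na_literal <- vapply(
          args,
          function(arg) is.logical(arg) && length(arg) == 1 && is.na(arg),
          logical(1)
        )
        if (any(is_na_literal)) found[[length(found) + 1]] <<- node
      }
      # Recurse into the function position and every argument.
      for (child in as.list(node)) walk(child)
    }
  }
  for (e in as.list(exprs)) walk(e)
  found
}

exprs <- parse(
  text = c(
    'is.na(x) # test whether x == NA',  # comment: dropped by the parser
    'msg <- "x == NA"',                 # string: a constant, not a call
    'if (y == NA) stop("bad")'          # genuine comparison with NA
  ),
  keep.source = FALSE
)
found <- find_na_comparisons(exprs)
print(found)
```

Comments and strings never appear as calls in the parsed tree, so they cannot be flagged; the cost is that every new pattern needs its own hand-written traversal logic.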

(Being the author of the xpath based search I am probably a bit (?) biased. :)

jimhester · 2020-03-31T14:20:32Z

I agree that xpath is definitely better than regexes


russHyde · 2020-04-06T09:45:06Z

@jimhester
I've updated the code.

No-Tab linter is still defined using a regex (and tabs within strings are ignored by this regex)
Both paren-brace and equals-NA linter are defined using xpath searches

I have added tests that should capture all the original bugs (paren-braces in comments or strings, equals-NA in comments or strings, tab-prefixed lines in a multi-line string) and a couple of additional tests.

The travis run is here: https://travis-ci.org/github/jimhester/lintr/builds/671546861
It still fails on R 3.2 and R 3.3 due to an Rcpp installation issue.

MichaelChirico · 2020-11-30T04:12:58Z

superseded by #620 (starts from the same history, just from a branch originating on the main repo)

russHyde requested a review from jimhester as a code owner March 16, 2020 11:28

russHyde added 8 commits April 6, 2020 10:11

add tests: code-bound linters should ignore strings / comments

b04bce6

Tests added to ensure paren-brace linter, equals-NA linter and no-tab linter do not flag lints in irrelevant sections of code (eg, strings / comments).

add function: convert a regex into a linter

559e425

This can check if a regex-match is covered by an expression in a source_file and restrict analysis to tokens of a particular type, if required.

regex-based linters can filter out strings but not comments

ed397a4

regex-based no_tab_linter

e57aa69

regex-based equals_na_linter

0439725

xpath-based equals_na_linter

dce0acb

Also, rename file containing equals_na_linter to match format of other linter-files (`*_linter.R` not `*_lintr.R`)

regex-based paren_brace_linter

bad7edc

xpath-based paren_brace_linter

998de4a

russHyde force-pushed the regex-based-linter-issue-441 branch from d42bf64 to 998de4a Compare April 6, 2020 09:32

russHyde changed the title ~~[WIP] Regex-based linter issue 441~~ Fix #441 : paren-brace, equals-NA and no-tab lints in strings or comments Apr 6, 2020

This was referenced Nov 29, 2020

Make R/ conform to paren_brace_linter #603

Closed

Fix some regex-based linters getting spurious matches #620

Merged

MichaelChirico closed this Nov 30, 2020
