Skip to content

fixed_regex_linter incorrectly suggest that my regex with "\>" is static #1478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dschlaep opened this issue Jul 27, 2022 · 3 comments
Open
Labels
false-positive code that shouldn't lint, but does

Comments

@dschlaep
Copy link

My code removes some text from the end of some other text using sub().

fixed_regex_linter() (incorrectly) suggests that my regular expression is static (example 1 below) if I use the symbol "\>" to match the empty string at end of a word. The suggestion with fixed = TRUE does not reproduce my desired outcome. However, fixed_regex_linter() correctly recognizes that regular expressions with "$" and "\b" are not static (examples 2 and 3) - obviously, I could use one of those instead.

I am not sure whether this has anything to do with the fact that regular expressions with "$" and "\b" work for both perl = FALSE and perl = TRUE while those with "\>" work only for perl = FALSE (?regex).

Example: trim "_stuff" only from end of "name_stuff_more_stuff" with desired result "name_stuff_more"

# 1) Approach "\>"
print(sub("_stuff\\>", "", "name_stuff_more_stuff"))
#> [1] "name_stuff_more"

# Lint
lintr::lint(
  text = 'sub("_stuff\\\\>", "", "name_stuff_more_stuff")',
  linters = lintr::fixed_regex_linter()
)
#> <text>:1:5: warning: [fixed_regex_linter] This regular expression is static, i.e., its matches can be expressed as a fixed substring expression, which is faster to compute. Here, you can use "_stuff>" with fixed = TRUE.
#> sub("_stuff\\>", "", "name_stuff_more_stuff")
#>     ^~~~~~~~~~~

# Suggestion by `fixed_regex_linter()` is incorrect
print(sub("_stuff>", "", "name_stuff_more_stuff", fixed = TRUE))
#> [1] "name_stuff_more_stuff"

Created on 2022-07-27 by the reprex package (v2.0.1)

# 2) Approach "$"
print(sub("_stuff$", "", "name_stuff_more_stuff"))
#> [1] "name_stuff_more"

# No lint
lintr::lint(
  text = 'sub("_stuff$", "", "name_stuff_more_stuff")',
  linters = lintr::fixed_regex_linter()
)

Created on 2022-07-27 by the reprex package (v2.0.1)

# 3) Approach "\b"
print(sub("_stuff\\b", "", "name_stuff_more_stuff"))
#> [1] "name_stuff_more"

# No lint
lintr::lint(
  text = 'sub("_stuff\\\\b", "", "name_stuff_more_stuff")',
  linters = lintr::fixed_regex_linter()
)

Created on 2022-07-27 by the reprex package (v2.0.1)

print(packageVersion("lintr"))
#> [1] '3.0.0.9000'
@MichaelChirico MichaelChirico added the false-positive code that shouldn't lint, but does label Oct 3, 2022
@MichaelChirico
Copy link
Collaborator

Note some discussion here BTW:

#1032 (comment)

I think at root we want to detect the value of perl= and tweak the logic for what's a fixed regex or not accordingly. There are very subtle (and rare! \< is almost unheard of 😸) differences.

@janlimbeck
Copy link

Any updates on when this issue might be resolved?

@MichaelChirico
Copy link
Collaborator

Hi @janlimbeck, we're all volunteers to if this affects you a lot, feel free to take it up & go for a fix! PRs welcome; happy to help you along if you're stuck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
false-positive code that shouldn't lint, but does
Projects
None yet
Development

No branches or pull requests

3 participants