-
Notifications
You must be signed in to change notification settings - Fork 186
Indentation linters #497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indentation linters #497
Conversation
Either the default for this will have to change, or this new linter will not be able to be in the set of default linters. The default linters adhere to the tidyverse style guide which indents the function arguments in this way. |
A few questions after skimming the style guide. Sorry for the litany of stylistic options, but there are a lot of decisions wrapped into this linter and I want to make sure I'm capturing the right things.
Just to double check before I spend too much time on implementation, in the tidyverse style guide this appears to only apply to the function header, but not to function calls. Is that correct?
I also dug around a bit in the style guide but didn't see how indentation should be handled for indexing operators whose arguments span to new lines. Following the function call style, I believe it should be indented by one indentation level. # Good?
mtcars[
mtcars$carb > 2,
c("carb", "cyl")
]
# Bad?
mtcars[mtcars$carb > 2,
c("carb", "cyl")]
In a few of the examples, functions are written so that the closing parenthesis of a function call is on a new line, returning to the parent indentation. from 2.5 Long Lines do_something_very_complicated(
something = "that",
requires = many,
arguments = "some of which may be long"
) However, as far as I can tell this notation isn't described anywhere as a rule. Right now closing parentheses aren't checked for indentation, but I could add it in and it would follow the same rule as closing curly-braces for parentheses whose parent expressions span multiple lines.
A style that I see every now and then puts the comma between parameters at the start of a new line. do_something_very_complicated(
something = "that"
, requires = many
, arguments = "some of which may be long"
) I don't see this anywhere in the style guide examples, but it doesn't appear to be explicitly disallowed. Currently my indentation linter ignores the commas and will flag the remaining expression as improperly indented even if the leading comma is properly indented. Should this style be accommodated? |
|
Just committing to bring the WIP up to speed. I still have some work to do to get some of the parameters working as intended, but it's coming along. Now implemented
Not yet implemented
|
…simplifying logic
…reallocating new cols
@jimhester This PR is reaching a state that I'm pretty happy with. Overview of available linting behaviorsJust to recap, the linter handles:
Unaddressed edge casesI couldn't find any info in the tidyverse style guide for multiline # good/bad?
if (1 < 2 &
3 < 4) { # indented to opening parenthesis
TRUE
}
# good/bad?
if (1 < 2 &
3 < 4) { # indented by <indent = 2L> characters
TRUE
} Test casesI think I covered most situations with minimal examples and have tested it against larger files producing expected results. Please let me know if you see gaps in the testing - happy to add more as needed, especially if we can nail down an expected behavior for the IF/FOR/WHILE keyword syntax. |
… into dgkf/indentation_linter
Most recent additions address the keyword syntax for |
I wouldn't recommend the second style - this puts the trailing conditions on the same indentation level as the logic which looks very confusing to me. |
Thanks @AshesITR - This behavior is currently just toggled by a parameter to the linter. I can certainly swap the default over the parenthesis indentation. To that point, I've been trying to think of ways to make this more flexible. I think that splitting up the different indentation of different syntax elements into different linters so that they can be more fully parameterized and customizable. My only concern with this is that I derive quite a few new columns for the parsed content data.frame, and I would want to avoid recalculating those new columns for a handful of nearly identical linters. If there's a more appropriate place for these to all be calculated, I'd be happy to lift it out of the linter. I'd hope to make the linters look more like this: keyword_paren_expr_linter(anchor = "parenthesis", indent = 0L)
# # good
# while (i <
# 3L) { ... # indented 0 characters relative to opening parenthesis
keyword_paren_expr_linter(anchor = "parent", indent = 2L)
# # good
# while (i <
# 3L) { ... # indented 2 characters relative to parent block |
How about an interface like indentation_linter(
indent = 2L,
control_flow = c("hanging", "indent"),
function_call = c("hanging", "indent"),
index = c("hanging", "indent")
) Which would each allow only the "hanging" style (anchor = "parenthesis") or the "indented" style, or both for one of the three types of paren expressions. I don't think |
The reason I suggested having the indentation as a parameter to each syntax-specific indentation linter was because someone may want to do a python PEP8-style double-indent to break up indentation blocks: if (x > 1L) {
# typical control-flow indentation is 2 characters
var_one <- x
foo <- long_function_name(
# within function calls, an extra indentation level is used
var_one, var_two, var_three,
var_four)
} If starting down the path of making this more flexible, it might be worth considering how it could support multiple valid indentation styles such that # valid (hanging indent)
foo <- long_function_name(var_one
var_two, var_three, var_four)
# valid (aligned to paren)
foo <- long_function_name(var_one, var_two,
var_three, var_four)
# invalid (only indented 1 level)
foo <- long_function_name(var_one
var_two, var_two, var_four) To accommodate this level of customization, I started playing around with functions as inputs to these arguments, but then felt like I was just writing another interface to add linters. At the time I hadn't considered possibly allowing multiple valid syntax. With that in mind, it makes more sense to handle that within the It might look something like this: indent_scheme <- function(anchor = c("parent", "parenthesis"), indent_levels = 1L) {
function(anchors, n_indent_chars)
anchors[[anchor]] + indent_levels * n_indent_chars
}
indent_double <- indent_scheme(indent_levels = 2L)
indent_paren <- indent_scheme("parenthesis", indent_levels = 0L)
indentation_linter(
indent = 2L,
# single valid indent scheme provided as function
control_flow = indent_paren,
# list of valid indentation schemes provided as list of functions
function_call = list(indent_double, indent_paren)
) |
Hmm I understand that you wish to provide high flexibility, but considering that all other linters have fairly simple (read atomic vector) configuration arguments, I'd favor a customisation interface more in line with the rest. If I understand correctly, the linter needs the following configuration parameters:
With atomic vectors, we could (ab)use their names for the anchor info, maybe defaulting to "parent". tidyverse-style = default args: your example style: This still looks a bit overwhelming to me... @jimhester What's your opinion on a possible configuration interface? |
Resurrecting this PR since I haven't heard back in a while. Were there any thoughts on a interface direction here? I'm happy to continue working on this, but I'd prefer to get some feedback first. |
I generally think less is more in terms of configuration options, at least on the public facing interfaces. Maybe just having two wrapper functions with the most likely defaults would be best? |
Are the wrappers everything that is missing from this PR to be considered for merging or is there more to do here? @jimhester @dgkf |
@kpagacz I'm still chipping away at this. In the process of polishing it up I realized that header indentation was only being handled at the top level of header expressions. I've made some progress toward making it more generalized, but it was a pretty big shift from the current implementation. I got a bit derailed through the holiday season, but plan to get this into shape when I have some extra time to revisit it again. |
Related issue is #63 |
Thanks again for your work here @dgkf !! Please don't hesitate to provide review comments / notes in the successor PR. Closing this now to tidy up the PR queue. |
* implement indentation_linter supersedes #497 * xml2::* * fix tests * fix range start (off by one) * ignore indentation level within string constants * fix lints and false positives, handle nested indentation changes * de-lint package, explicit branch for no lints * adapt indent suppression condition * change indentation end path for $function calls * return non-empty file_lines even if no R source is present * ensure Rmd is detected * .Rmd * add use_hybrid_indent = TRUE * except hanging indent if following expr is a code block * smarter hanging indent detection for use_hybrid_indent * suppress redundant lints * update tests * de-lint * move around NEWS * if condition styling * use TODO for consistency/searchability * address review comments * NEWS wording * add more tests, refactor to file-level because of a systematic problem with expression level * refac use_hybrid_indent -> hanging_indent_style * allow compare_branches.R to use custom parameters if desired * improve docs * de-lint Co-authored-by: Michael Chirico <chiricom@google.com>
Adding a new linter function which can spawn two new
Lint
sLint
, which will get thrown if child expressions beginning on a new line aren't indented byindent
(a parameter to the linter).Lint
specific to closing curly braces, thrown when closing curly braces aren't closed by returning to the parent indentation level.It includes two parameters to slightly tweak the behavior
indent
to specify the number of characters to use for indentationoutermost_only
to only report the outermost parent expression that is improperly indented, defaulting toTRUE
. When this isFALSE
, nested indentations, even if they're correctly indented relative to the document root, will throwLint
s as they are not indented properly relative to the parent expression.Possibly contentiously, this linter will spawn
Lint
s when function calls with many arguments do not wrap to one relative indentation level. Although this is my preferred style, I've seen a large community preference for indenting new lines to the same indentation as the function call's opening parenthesis. e.g. the following snippet will throw aLint
, despite it being a popular style.My personal opinion is that an indentation rule shouldn't be impacted by something as arbitrary as the names we give to variables or functions, but a parameter might be needed to accommodate this popular style.