Skip to content

Indentation linters #497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from
Closed

Indentation linters #497

wants to merge 19 commits into from

Conversation

dgkf
Copy link

@dgkf dgkf commented May 25, 2020

Adding a new linter function which can spawn two new Lints

  1. A general indentation Lint, which will get thrown if child expressions beginning on a new line aren't indented by indent (a parameter to the linter).
  2. A Lint specific to closing curly braces, thrown when closing curly braces aren't closed by returning to the parent indentation level.

It includes two parameters to slightly tweak the behavior

  1. indent to specify the number of characters to use for indentation
  2. outermost_only to only report the outermost parent expression that is improperly indented, defaulting to TRUE. When this is FALSE, nested indentations, even if they're correctly indented relative to the document root, will throw Lints as they are not indented properly relative to the parent expression.

Possibly contentiously, this linter will spawn Lints when function calls with many arguments do not wrap to one relative indentation level. Although this is my preferred style, I've seen a large community preference for indenting new lines to the same indentation as the function call's opening parenthesis. e.g. the following snippet will throw a Lint, despite it being a popular style.

long_function_name(arg1 = "a", arg2 = "b", arg3 = "c",
                   arg4 = "d")

My personal opinion is that an indentation rule shouldn't be impacted by something as arbitrary as the names we give to variables or functions, but a parameter might be needed to accommodate this popular style.

@dgkf dgkf marked this pull request as ready for review May 26, 2020 22:35
@dgkf dgkf requested a review from jimhester as a code owner May 26, 2020 22:35
@jimhester
Copy link
Member

following snippet will throw a Lint, despite it being a popular style

Either the default for this will have to change, or this new linter will not be able to be in the set of default linters.

The default linters adhere to the tidyverse style guide which indents the function arguments in this way.

@dgkf
Copy link
Author

dgkf commented May 28, 2020

A few questions after skimming the style guide. Sorry for the litany of stylistic options, but there are a lot of decisions wrapped into this linter and I want to make sure I'm capturing the right things.

  1. function call indentation

Just to double check before I spend too much time on implementation, in the tidyverse style guide this appears to only apply to the function header, but not to function calls. Is that correct?

  1. multiline bracket args treated the same as function calls?

I also dug around a bit in the style guide but didn't see how indentation should be handled for indexing operators whose arguments span to new lines. Following the function call style, I believe it should be indented by one indentation level.

# Good?
mtcars[
  mtcars$carb > 2,
  c("carb", "cyl")
]

# Bad?
mtcars[mtcars$carb > 2,
       c("carb", "cyl")]
  1. closing parenthesis on a multiline function call

In a few of the examples, functions are written so that the closing parenthesis of a function call is on a new line, returning to the parent indentation.

from 2.5 Long Lines

do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

However, as far as I can tell this notation isn't described anywhere as a rule. Right now closing parentheses aren't checked for indentation, but I could add it in and it would follow the same rule as closing curly-braces for parentheses whose parent expressions span multiple lines.

  1. commas as the start of a new line

A style that I see every now and then puts the comma between parameters at the start of a new line.

do_something_very_complicated(
    something = "that"
  , requires = many
  , arguments = "some of which may be long"
)

I don't see this anywhere in the style guide examples, but it doesn't appear to be explicitly disallowed. Currently my indentation linter ignores the commas and will flag the remaining expression as improperly indented even if the leading comma is properly indented. Should this style be accommodated?

@jimhester
Copy link
Member

  1. Yes
  2. Yes, [ are indented the same as normal function calls
  3. Yes, the closing parenthesis are indented to match the indentation of the call.
  4. We genenrally do not recommend this style.

@dgkf
Copy link
Author

dgkf commented Jun 20, 2020

Just committing to bring the WIP up to speed. I still have some work to do to get some of the parameters working as intended, but it's coming along.

Now implemented

  • tidyverse-style function headers: function parameters starting on a new line that aren't indented to the indentation of the function opening parenthesis will throw lints
  • closing parenthesis of multiline function calls not returning to the parent indentation will now throw lints

Not yet implemented

  • Generic function header indentation option, expecting parameters on new lines to be indented by one indentation level. This was working before, but need to sure it can be toggled via a parameter.
  • Perhaps for a different PR, but I'd like to also have a (non-default, as it doesn't conform to the tidyverse style) matching parenthesis/braces linter so that parenthesis/braces that start on the same line are paired with parenthesis/braces that close on the same line. This is implemented here, but I need to clean it up before adding it in.

@dgkf
Copy link
Author

dgkf commented Jul 23, 2020

@jimhester This PR is reaching a state that I'm pretty happy with.

Overview of available linting behaviors

Just to recap, the linter handles:

  • general indentation (child expressions should be indented by indent characters)
  • closing curly braces should return to the parent indentation level
  • optionally, and by default, multiline function call closing parens should return to the parent indentation level, satisfying the tidyverse style guilde
  • optionally, and by default, newlines of a multiline function parameter header should be indented to the same level as the opening paren, satisfying the tidyverse style guide. Alternatively, will throw lints if newlines are not indented by indent characters.

Unaddressed edge cases

I couldn't find any info in the tidyverse style guide for multiline IF, FOR or WHILE parenthetical expressions. I assume that these should either follow the function header style or the generic indentation style, but it doesn't appear to be specified anywhere. They are currently explicitly ignored.

# good/bad?
if (1 < 2 &
    3 < 4) {  # indented to opening parenthesis
  TRUE
}

# good/bad?
if (1 < 2 &
  3 < 4) {  # indented by <indent = 2L> characters
  TRUE
}

Test cases

I think I covered most situations with minimal examples and have tested it against larger files producing expected results. Please let me know if you see gaps in the testing - happy to add more as needed, especially if we can nail down an expected behavior for the IF/FOR/WHILE keyword syntax.

@dgkf
Copy link
Author

dgkf commented Sep 2, 2020

Most recent additions address the keyword syntax for IF, FOR and WHILE, allowing for either of the above proposed options.

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 3, 2020

I wouldn't recommend the second style - this puts the trailing conditions on the same indentation level as the logic which looks very confusing to me.

@dgkf
Copy link
Author

dgkf commented Sep 3, 2020

Thanks @AshesITR - This behavior is currently just toggled by a parameter to the linter. I can certainly swap the default over the parenthesis indentation.

To that point, I've been trying to think of ways to make this more flexible. I think that splitting up the different indentation of different syntax elements into different linters so that they can be more fully parameterized and customizable. My only concern with this is that I derive quite a few new columns for the parsed content data.frame, and I would want to avoid recalculating those new columns for a handful of nearly identical linters. If there's a more appropriate place for these to all be calculated, I'd be happy to lift it out of the linter.

I'd hope to make the linters look more like this:

keyword_paren_expr_linter(anchor = "parenthesis", indent = 0L)
# # good
# while (i < 
#        3L) { ...   # indented 0 characters relative to opening parenthesis

keyword_paren_expr_linter(anchor = "parent", indent = 2L)
# # good
# while (i <
#   3L) { ...   # indented 2 characters relative to parent block

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 3, 2020

How about an interface like

indentation_linter(
  indent = 2L,
  control_flow = c("hanging", "indent"),
  function_call = c("hanging", "indent"),
  index = c("hanging", "indent")
)

Which would each allow only the "hanging" style (anchor = "parenthesis") or the "indented" style, or both for one of the three types of paren expressions.

I don't think indent > 0L makes much sense for anchor = "parenthesis" since Ctrl + I auto-indent would clear that anyway.
Do you?

@dgkf
Copy link
Author

dgkf commented Sep 3, 2020

The reason I suggested having the indentation as a parameter to each syntax-specific indentation linter was because someone may want to do a python PEP8-style double-indent to break up indentation blocks:

if (x > 1L) { 
  # typical control-flow indentation is 2 characters
  var_one <- x
  foo <- long_function_name(
      # within function calls, an extra indentation level is used
      var_one, var_two, var_three,
      var_four)
}

If starting down the path of making this more flexible, it might be worth considering how it could support multiple valid indentation styles such that

# valid (hanging indent)
foo <- long_function_name(var_one
    var_two, var_three, var_four)

# valid (aligned to paren)
foo <- long_function_name(var_one, var_two,
                          var_three, var_four)

# invalid (only indented 1 level)
foo <- long_function_name(var_one
  var_two, var_two, var_four)

To accommodate this level of customization, I started playing around with functions as inputs to these arguments, but then felt like I was just writing another interface to add linters. At the time I hadn't considered possibly allowing multiple valid syntax. With that in mind, it makes more sense to handle that within the indentation_linter since it would need to handle this line-wise lint comparisons.

It might look something like this:

indent_scheme <- function(anchor = c("parent", "parenthesis"), indent_levels = 1L) {
  function(anchors, n_indent_chars) 
    anchors[[anchor]] + indent_levels * n_indent_chars
}

indent_double <- indent_scheme(indent_levels = 2L)
indent_paren <- indent_scheme("parenthesis", indent_levels = 0L)

indentation_linter(
  indent = 2L, 
  # single valid indent scheme provided as function
  control_flow = indent_paren,  
  # list of valid indentation schemes provided as list of functions
  function_call = list(indent_double, indent_paren)  
)

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 3, 2020

Hmm I understand that you wish to provide high flexibility, but considering that all other linters have fairly simple (read atomic vector) configuration arguments, I'd favor a customisation interface more in line with the rest.

If I understand correctly, the linter needs the following configuration parameters:

  • number of spaces to indent by (default: 2L)
  • for each of the contexts control_flow, function_call and indexing, a list of acceptable indentation styles (default: parenthesis-aligned with 0 added indent levels + parent-aligned with 1 added indent level)

With atomic vectors, we could (ab)use their names for the anchor info, maybe defaulting to "parent".
That would produce:

tidyverse-style = default args:
indentation_linter(indent = 2L, control_flow = c(1L, "hanging" = 0L), function_call = c(1L, "hanging" = 0L), index = c(1L, "hanging" = 0L))

your example style:
indentation_linter(indent = 2L, control_flow = c("hanging" = 0L), function_call = c(2L, "hanging" = 0L), index = ???)

This still looks a bit overwhelming to me...

@jimhester What's your opinion on a possible configuration interface?

@dgkf
Copy link
Author

dgkf commented Nov 9, 2020

@AshesITR @jimhester

Resurrecting this PR since I haven't heard back in a while. Were there any thoughts on a interface direction here? I'm happy to continue working on this, but I'd prefer to get some feedback first.

@jimhester
Copy link
Member

I generally think less is more in terms of configuration options, at least on the public facing interfaces.

Maybe just having two wrapper functions with the most likely defaults would be best?

@kpagacz
Copy link
Contributor

kpagacz commented Dec 31, 2020

Are the wrappers everything that is missing from this PR to be considered for merging or is there more to do here? @jimhester @dgkf

@dgkf
Copy link
Author

dgkf commented Jan 1, 2021

@kpagacz I'm still chipping away at this. In the process of polishing it up I realized that header indentation was only being handled at the top level of header expressions. I've made some progress toward making it more generalized, but it was a pretty big shift from the current implementation.

I got a bit derailed through the holiday season, but plan to get this into shape when I have some extra time to revisit it again.

@MichaelChirico
Copy link
Collaborator

Related issue is #63

@MichaelChirico
Copy link
Collaborator

Thanks again for your work here @dgkf !! Please don't hesitate to provide review comments / notes in the successor PR. Closing this now to tidy up the PR queue.

MichaelChirico added a commit that referenced this pull request Oct 17, 2022
* implement indentation_linter

supersedes #497

* xml2::*

* fix tests

* fix range start (off by one)

* ignore indentation level within string constants

* fix lints and false positives, handle nested indentation changes

* de-lint package, explicit branch for no lints

* adapt indent suppression condition

* change indentation end path for $function calls

* return non-empty file_lines even if no R source is present

* ensure Rmd is detected

* .Rmd

* add use_hybrid_indent = TRUE

* except hanging indent if following expr is a code block

* smarter hanging indent detection for use_hybrid_indent

* suppress redundant lints

* update tests

* de-lint

* move around NEWS

* if condition styling

* use TODO for consistency/searchability

* address review comments

* NEWS wording

* add more tests, refactor to file-level because of a systematic problem with expression level

* refac use_hybrid_indent -> hanging_indent_style

* allow compare_branches.R to use custom parameters if desired

* improve docs

* de-lint

Co-authored-by: Michael Chirico <chiricom@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants