Skip to content

Introduce linter for deeply nested code? #1848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
IndrajeetPatil opened this issue Dec 15, 2022 · 5 comments
Open

Introduce linter for deeply nested code? #1848

IndrajeetPatil opened this issue Dec 15, 2022 · 5 comments

Comments

@IndrajeetPatil
Copy link
Collaborator

Preamble

There is a substantial body of literature showing that deeply nested code increases complexity and is difficult to understand. This code can almost always be redesigned to be simpler (for a recent demo, see this video). It would be good if we can provide a linter that lints functions that nest deeper than the specified threshold.

I realize that this sounds similar to cycloclomp_linter(), but I'd argue that the proposed linter will be slightly different. E.g., a function, especially a small one, can have a McCabe’s complexity lower than the specified threshold (say 15) and yet have nesting deeper than the specified threshold.

Default

The book below reports that the ability of programmers to comprehend a loop deteriorates significantly beyond three levels of nesting, so maybe that can be the default.

Yourdon, Edward (1986). Managing the Structured Techniques: Strategies for Software Development in the 1990s, 3d ed. New York, NY: Yourdon Press.

That said, I am not sure how outdated this literature is or if there are newer studies on this topic that suggest a different default.

Example

Here is an example I came across in a closed-source project. I have removed all the code to remove any way to identify the project and also to lay bare the nested structure of the code. It'd be nice if linter could detect code like this and flag it for redesign/refactor.

foo <- function() {
  # ...
  
  for (grade in grades) {
    # ...
    
    for (subject in subjects) {
      # ...
      
      if (nrow(subject_data) > 0) {
        # ...
        
        for (file in files) {
          # ...
          
          if (file.exists(local_doc)) {
            # ...
          } else {
            # ...
          }
          
          # ...
          
          if (file.exists(local_docx)) {
            # ...
            
            if (isTRUE(delete_doc)) { 
              # ...
            }
            
            # ...
          }
        }
      }
    }
  }
}
@MichaelChirico
Copy link
Collaborator

MichaelChirico commented Dec 15, 2022 via email

@MichaelChirico
Copy link
Collaborator

In this particular case, cyclocomp_linter() will suffice.

@AshesITR
Copy link
Collaborator

AshesITR commented Oct 4, 2023

Re nesting we should be careful with R6Classes.
That is one case where cyclocomp also quickly fails to provide helpful lints simply because the complexity of all class methods seems to be summed.

Here is a trivial class which reaches 4 levels of nesting:

MyClass <- R6::R6Class(
  "MyClass",
  public = list(
    hello = function(x) {
      if (missing(x)) {
        x <- "unknown"
      }
      paste0("Hello, ", x, "!")
    }
  )
)

@MichaelChirico
Copy link
Collaborator

See #2302 for some initial checks on overly-nested code.

@IndrajeetPatil
Copy link
Collaborator Author

I am not sure why I marked this as closed because we still don't have a way to lint deeply nested code.

Here is a good example to showcase how code can be deeply nested and still pass the McCabe complexity threshold:

library(lintr)

lint(linters = cyclocomp_linter(5L), "
if (isTRUE(a))
  if (isTRUE(b))
    if (isFALSE(c))
      if (isFALSE(d))
        i <- 1L
")
#> ℹ No lints found.

Created on 2024-10-28 with reprex v2.1.1

This code should produce lint with this new linter.

It can be refactored to avoid nesting, which would make it more readable:

# option-1
if (isTRUE(a) && isTRUE(b) && isFALSE(c) && isFALSE(d)) {
  i <- 1L
}

# option-2
should_set_i <- function(a, b, c, d) {
  isTRUE(a) && isTRUE(b) && isFALSE(c) && isFALSE(d)
}

if (should_set_i(a, b, c, d)) {
  i <- 1L
}

# option-3
if (all(isTRUE(a), isTRUE(b), isFALSE(c), isFALSE(d))) i <- 1L

# option-4
if (!isTRUE(a)) return(NULL)
if (!isTRUE(b)) return(NULL)
if (!isFALSE(c)) return(NULL)
if (!isFALSE(d)) return(NULL)

i <- 1L

# etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants