Skip to content

Wrap multiline string literals to line length. #792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 19, 2024

Conversation

thunderseethe
Copy link
Contributor

@thunderseethe thunderseethe commented Aug 13, 2024

Modify the pretty printer to insert escaped newlines "\\n" when a line would exceed configured line length. This only affects multi-line strings and not single-line strings. Previously string segements in multiline literals were treated as one large .syntax token. They are now broken up into multiple tokens to allow opportunities to line break.

A string segement "the quick fox" will now be comprised of the tokens:

[.syntax("the"), .syntax(" "), .break(.escaped), .syntax("quick"), .syntax(" "), .break(.escaped), .syntax("fox")]

we now break up the segment based on whitespace. After each series of whitespace, we insert the new .escaped break.
A couple of odd things to note here:

  • A new NewlineBehavior: .escaped
  • Tokens include both the whitespace and the line break
  • We emit whitespace as a .syntax and not a .space

.escaped is very similar to an .elective line break. It will only break when it exceed the line length, otherwise it will be printed as size spaces. Unlike .elective, when it is printed it will print as an escaped newline "\\n". On the implementation side we also have to calculate it's length differently than we do for .elective.

Because all whitespace in a string literal is significant (unlike whitespace in other swift syntax), it is emitted alongside the break. This is because, unlike other breaks, we always want to print the whitespace in a string literal. The choice to print the whitespace prior to the line break is cosmetic.

Whitespace is emitted as a .syntax to avoid special casing between .breaks and .spaces. The pretty printer will ignore spaces (or change them) around line breaks to make code look better. However this doesn't work for string literals where the whitespace is significant. Emitting .syntax token ensures all whitespace in the literal shows up.

For example, the literal:

"""
Lorem ipsum odor amet, consectetuer adipiscing elit. Vehicula aptent per convallis fermentum mus. Rutrum ultrices nisl etiam et volutpat dictum pulvinar elementum felis.
"""

will be formatted as:

"""
Lorem ipsum odor amet, consectetuer adipiscing \
elit. Vehicula aptent per convallis fermentum mus. \
Rutrum ultrices nisl etiam et volutpat dictum \
pulvinar elementum felis.
"""

Wrapping will reflow escaped newlines already present in the string literal. For example, with a line length of 30:

"""
multi-line \
string.
"""

will be formatted as:

"""
multi-line string.
"""

However hard line breaks are always respected, since they appear in the string literal itself not just the syntax. So:

"""
multi-line
string
"""

will not be reformatted.

Breaks are only inserted after whitespace, so words longer than line length will not be wrapped (line length 30):

"""
long word: 01914cc5-a274-7df4-b88b-5e167f3f5662
"""

will be formatted as

"""
long word: \
01914cc5-a274-7df4-b88b-5e167f3f5662
"""

even though the second line exceeds 30 characters.

Fixes #724

Comment on lines +619 to +626
/// We want this to print as:
/// """
/// some text this is exactly the right length
/// """
/// and not:
/// """
/// some text \
/// this is exactly the right length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since backslashes are used to suppress newlines, I think I agree with this choice. I'll admit it's somewhat opinionated though. I don't know if other folks would want to be able to choose where in a string they break it. If we made this configurable, we could allow that, but the problem with doing that is that once a string is broken in this way, it can never be automatically unbroken, so shifting indentation could cause a string to split awkwardly.

@ahoppen @bnbarham Do either of you have opinions here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would not automatically re-flow string literals. Most likely a user already put thought into the re-flow. If you do want to re-flow, I personally think it’s not too complicated to change the string literal to be on a single line again and then let swift-format break it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely a user already put thought into the re-flow

The main case that this isn't true is what @allevato mentioned, ie. format causes a split, now you indent for some other reason (nest in an if/whatever), and now never automatically reflow.

I personally think it’s not too complicated to change the string literal to be on a single line again

Could have a refactoring for this pretty easily too.


I'm somewhat split in my opinion as I can see benefits in both. I definitely think we need to have an option (probably defaulted to off) if we do reflow though. Maybe // swift-format-ignore is enough for cases when it's on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an option to configure multiline string reflow behavior. When false, existing escaped newlines are left in place and string reflow does not occur. When true, existing escaped newlines are removed and string reflows using the inserted .escaped break tokens.

The option defaults to false, so reflow will not occur by default and will be opt-in. // swift-format-ignore works regardless of this setting, so folks can turn on reflow and still keep escaped newlines for specific literals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with @allevato and extended the reflow config behavior to include 3 options: never, onlyLinesOverLength, and always.

  • never behaves the same as today, we never break up multiline strings.
  • onlyLinesOverLength respects existing escaped line breaks but will break up lines over the line length.
  • always does not respect existing escaped line breaks. It will remove existing escaped line breaks and reflow everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting offline, we decided on the above and a couple other changes:

  1. Keep the default mode as never, to preserve existing behavior (don't break or reflow strings)
  2. Even when reflowing is onlyLinesOverLength or always, only do it for regular multiline strings ("""), never for raw multiline strings (#""", etc.)

The rationale for (2) is that it's highly likely that raw multiline strings are used for things like structured data, code, etc., whereas regular multiline strings are more likely to be simple prose.

That's not always the case, of course; swift-syntax and swift-format both have a mixture of code snippets in regular and raw multiline strings. But if we wanted to change the default behavior for this later, we could update those so that they format correctly in each case.

@ahoppen @bnbarham Do you think this is a reasonable landing spot for the time being?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me! Also CC @shawnhyam

Copy link
Member

@allevato allevato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. I'll go ahead and kick off CI, but let's wait for other folks to comment on it as well.

@allevato
Copy link
Member

allevato commented Aug 14, 2024

It looks like this is triggering a bunch of whitespace linter issues when it does the format pass over swift-syntax. Can you look at what it's doing there and see if those are expected? https://ci.swift.org/job/swift-syntax-PR-macOS/4470/console

I suspect most of these are coming from the multiline strings that contain Swift code fragments, like this: https://github.com/swiftlang/swift-syntax/blob/f8b635392690fe59c74064ce222e134335cf22d0/CodeGeneration/Sources/generate-swift-syntax/templates/swiftsyntax/SyntaxRewriterFile.swift#L165

That's an interesting case that we need to consider, because this is definitely a scenario where we may want to control where newlines go via \ but also not want other parts of the code to rewrapped (or we may need the multiline string contents to exceed the overall column limit because the string is indented but we're generating code that we also want to expand out to or past the column limit). This is a tricky situation to deal with...


One idea would be to only reflow strings where every line ends in a \. That would likely exclude cases like the ones above, but it might also exclude a lot of valid cases (like two very long lines of English prose). I'm definitely open to better ideas.

@grynspan
Copy link

Something to take into consideration is that ASCII spaces between words are not universal across scripts. Most (all?) Latin-based scripts use them, but Chinese and Japanese writing rarely do, and some scripts use alternate characters (interpuncts, etc.) instead of ASCII spaces.

@allevato
Copy link
Member

allevato commented Aug 14, 2024

Something to take into consideration is that ASCII spaces between words are not universal across scripts. Most (all?) Latin-based scripts use them, but Chinese and Japanese writing rarely do, and some scripts use alternate characters (interpuncts, etc.) instead of ASCII spaces.

Good point. At a minimum, the logic on this line should preserve the original scalar detected by isWhiteSpace, rather than replacing it with a hardcoded ASCII space.

We could consider other kinds of characters to break on later; unfortunately, we didn't include any of the Unicode breaking properties in Unicode.Scalar.Properties from what I can tell.

@shawnhyam
Copy link
Contributor

One idea would be to only reflow strings where every line ends in a \. That would likely exclude cases like the ones above, but it might also exclude a lot of valid cases (like two very long lines of English prose). I'm definitely open to better ideas.

The case of preformatted code (or any preformatted text) came to mind for me -- this is possibly one solution, although I'm not convinced I would want it to reformat these blocks at all. I guess if I was used to it, it might be fine. But my feeling is this should be controlled by an option so it could at least be turned off if it's creating problems.

@allevato
Copy link
Member

Code in multi-line strings is the use case that stands out as most compelling to address to me.

I'd love to see any examples that someone might have where they have non-code text that is split across multiple lines terminated by \ where reflowing those lines arbitrarily would harm readability.

I guess one example would be other kinds of structured text:

let x = """
  value1: \(value1), \
  value2: \(value2), \
  value3: \(value3), \
  value4: \(value4)
  """

which could reflow like this, depending on line width:

let x = """
  value1: \(value1), value2: \(value2), value3: \
  \(value3), value4: \(value4)
  """

I think the argument could be made that the second form is worse.

It should be fairly straightforward to make something like // swift-format-ignore: MultilineStrings work, but it would be even better if we can do the right thing in most cases without intervention.

@grynspan
Copy link

Well now we're getting dangerously close to including swift-syntax as a dependency just so we can format code-in-block-strings. 🙃

@allevato
Copy link
Member

swift-evolution proposal to add descriptive labels to multiline strings, à la Markdown code fences?

let code = """swift
  func \(name)() -> \(result) {
  }
  """

🤔

Of course, my preference would be to do something good here without modifying the language 😅

@grynspan
Copy link

I mean, why not? Pitch the triple-backtick syntax! Do it!

@thunderseethe
Copy link
Contributor Author

I took a look at a couple alternatives for respecting user inserted "\\n" while reflowing formatter inserted "\\n". I couldn't find a compelling solution. As a stopgap solution I've added support for // swift-format-ignore on multi-line strings. This will disable reflowing within the string entirely, preserving user inserted "\\n" where relevant.

Change how PrettyPrinter emits StringSegmentSyntax's so that they can be broken up over multiple lines within multiline string literals. This does not change the behavior of single line string literals, which cannot contain newlines. This also does not change interpolations which are still emitted as verbatims and are never line broken by the pretty printer. Line breaking is done exclusively through escaped newlines. A literal containing escaped newlines will remove all escaped newlines and then reinsert them based on line length. Hard newlines are respected by the formatter and will not be moved, even if it causes short lines. Escaped newlines will be reformatted by the pretty printer so that lines ending in an escaped newline are at line length.

Wrapping is implemented by introducing a new newlinebreak behavior `.escaped`. `.escaped` acts very similarly to `.elective` but has slightly different length calculation logic and printing behavior. An escaped newline is printed including it's preceeding whitespace followed by "\\\n". So a break of `.break(_, 2, .escaped)` will print as "  \\\n". Because an escaped line break takes up characters when broken (unlike other breaks), length calculation must be handled differently for `.escaped` breaks.
@allevato allevato merged commit 89ccc79 into swiftlang:main Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split string literals into multiple lines
6 participants