[raw-string] indentation stripping

Kevin Bourrillion kevinb at google.com
Tue May 1 20:46:41 UTC 2018


(None of this has any bearing on single-line RSLs, which work great and
need no changes at all. "Raw means raw", all the way.)

I've proposed we adopt an assumption, which I was calling the "rectangle
rule", which really just says: we assume most users will prefer to see
*relative indentation preserved exactly* between the source code and
runtime forms of their strings.

That is, I think we can assume that users will *not* want to read or write
this:

  String one = `+-----------------+
| a nice neat box |
+-----------------+`;

nor this:

  String two = `+-----------------+
      | a nice neat box |
      +-----------------+`;

nor even this:

  String three =
`+-----------------+
| a nice neat box |
+-----------------+`;

but that the universe of styles they will consider might include this:

  String four = `+-----------------+
                 | a nice neat box |
                 +-----------------+`;

or this:

  String five = `
      +-----------------+
      | a nice neat box |
      +-----------------+
  `;

or this:

  String six = `
+-----------------+
| a nice neat box |
+-----------------+`;

... or various others, but common to all, I claim that users will want to
see... well, a nice neat box.

If we agree with this principle, then continuing with "raw means raw"
creates a big problem: of the styles users would actually consider,
everything but style four gives you a leading newline in your actual
constant, which is rarely wanted, and not even trivial to remove. And style
four is itself problematic, because you may overflow your right margin, and
because refactorings will break the alignment you wanted once again.

So *I'm hoping to get agreement* on this question: even if we do nothing
else, should we chop a leading newline (after ignoring other leading
whitespace), if present? For the reasons just explained I think the answer
should be yes.
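
To make the question concrete, here is a minimal sketch of what "chop a
leading newline (after ignoring other leading whitespace)" could mean,
written as an ordinary library method. The class and method names are
hypothetical, and treating only spaces and tabs as "other leading
whitespace" is an assumption of this sketch, not something settled in this
thread.

```java
final class LeadingNewline {
    private LeadingNewline() {}

    /**
     * If the string starts with optional spaces/tabs followed by a line
     * terminator, drops everything up to and including that terminator.
     * Otherwise returns the string unchanged. Trailing newlines are never
     * touched.
     */
    static String chopLeadingNewline(String raw) {
        int i = 0;
        // Skip leading horizontal whitespace (assumption: spaces and tabs only).
        while (i < raw.length() && (raw.charAt(i) == ' ' || raw.charAt(i) == '\t')) {
            i++;
        }
        if (raw.startsWith("\r\n", i)) {
            return raw.substring(i + 2);
        }
        if (raw.startsWith("\n", i)) {
            return raw.substring(i + 1);
        }
        // No leading newline: keep the content exactly as written.
        return raw;
    }

    public static void main(String[] args) {
        // Roughly the content of style six, with a shorter box.
        String six = "\n+-----+\n| box |\n+-----+";
        System.out.println(chopLeadingNewline(six));
    }
}
```

Note that a string with leading spaces but no leading newline is left
alone, so single-line strings are unaffected.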

(My main project here is to get to a fair comparison between the
"raw-means-close-to-raw" and the "automatic indentation stripping"
alternatives. All I'm hoping to do here is to stop this leading newline
issue from muddying that comparison, by establishing that we would strip
that either way. Note that symmetric arguments do not apply for trailing
newlines; in fact, for the more-raw alternative we *must not* strip those.)
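
For comparison purposes, here is one hypothetical shape the "automatic
indentation stripping" alternative could take: remove the smallest
indentation found on any non-blank line from every line, so that relative
indentation survives. This is only a sketch under stated assumptions (each
leading space or tab counts as one column, lines are separated by "\n", and
any leading newline has already been chopped); it is not an algorithm
anyone in this thread has proposed, and the names are made up.

```java
final class RectangleSketch {
    private RectangleSketch() {}

    /** Counts leading spaces and tabs (assumption: each is one column). */
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    /**
     * Strips the indentation shared by all non-blank lines. Whitespace-only
     * lines do not influence the shared amount. Line breaks, including a
     * trailing one, are preserved.
     */
    static String stripCommonIndent(String s) {
        String[] lines = s.split("\n", -1); // -1 keeps trailing empty lines
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            int indent = indentOf(line);
            if (indent < line.length()) { // line has non-whitespace content
                min = Math.min(min, indent);
            }
        }
        if (min == Integer.MAX_VALUE) {
            return s; // nothing but blank lines: leave untouched
        }
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < lines.length; k++) {
            if (k > 0) {
                out.append('\n');
            }
            String line = lines[k];
            // Short whitespace-only lines simply become empty.
            out.append(line.substring(Math.min(min, line.length())));
        }
        return out.toString();
    }
}
```

Applied to the content of style five (after the leading newline is gone),
this yields the box flush left with its trailing newline intact. Style
four is left unchanged by this sketch, because its first line starts
immediately after the delimiter and so the shared indentation is zero; that
style would need separate treatment.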


On Sat, Apr 28, 2018 at 11:12 AM, Brian Goetz <brian.goetz at oracle.com>
wrote:

> This thread accidentally got started on the wrong list, so bringing it
> back here.  The following messages are hereby read into the record (and
> hence can be considered to be under the proper terms of use for a
> specification list.)
>
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003034.html (Jim #1)
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003035.html (John)
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003051.html (Kevin #1)
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003052.html (Jim #2)
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003053.html (Kevin #2)
> http://mail.openjdk.java.net/pipermail/amber-dev/2018-April/003054.html (Jim #3)
>
> A summary follows.
>
> The key point being discussed is that the “raw means raw” interpretation
> for multi-line strings is likely to be at odds with how users actually plan
> to use the feature — that they will pad the code with incidental
> indentation to make it line up nicely with the enclosing Java code, and
> that IDEs may well adjust said incidental indentation as the code is
> maintained — and that this is a reasonable thing to encourage.  Kevin’s
> data from the Google codebase backs up this supposition.  Our design
> already admits this to some degree — for multi-line strings, we don’t
> really believe the source file when it uses platform-specific line
> terminators.  So we’re trying to distill how to distinguish “incidental”
> indentation from intended indentation in multi-line strings.  (More
> generally: the feedback we’ve gotten is that while raw strings are the right
> design center for single-line strings, when it comes to snippets that span
> lines, users care more about multi-line-ness than raw-ness.)
>
> Assumptions:
>  - Most multi-line strings will be code snippets of some sort (JSON, XML,
> SQL, Java, etc);
>  - Most developers will want to use incidental indentation to have code
> snippets indent “sensibly” relative to neighboring Java code, but said
> incidental indentation is not part of the snippet.
>
> Jim’s #1 offers a catalog of ways in which users might craft multi-line
> string literals to fit cleanly into their source code, identifying which
> indentation is incidental and which is essential.
>
> To the goals, I’d add:
>
>  - In addition to it being _possible_ to render the desired result, it
> should be straightforward for users to _predict_ the result of indentation
> stripping.
>
> Kevin adds: it would be useful if we could draw a “rectangle” that
> excludes all incidental indentation and includes all intended indentation.
>
> Tabs are a confounding issue; since there is no standard interpretation
> for how many spaces correspond to a tab, in the general case no trimming
> algorithm will do well with mixed spaces and tabs.  However, in the
> well-behaved case where lines begin with tab* space*, a common prefix can
> be stripped.
>
> There’s some reason to believe that calling .stripIndent() will be so
> common that it should be the default, rather than requiring users to invoke
> it every time.
>
> Now back to discussion.
>
>


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com