String reboot - (1a) incidental whitespace

Brian Goetz brian.goetz at oracle.com
Tue Apr 16 18:21:12 UTC 2019


Let me extract some observations and questions from Jim’s mail here.

I think the alignment algorithm Jim is driving towards is something like this:

 - Strip up to one leading and trailing blank line
 - Align the remaining lines by removing the largest common whitespace prefix from each [1]
 - If the trailing line was blank, add back an ending newline
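
A minimal sketch of these three steps, for illustration only. AlignSketch and its methods are hypothetical names, not JDK API. It assumes the inexact reading of [1] (each whitespace character counts as one) and assumes that interior blank lines do not constrain the common prefix; neither assumption is settled above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical helper class, not part of the JDK.
class AlignSketch {

    // Number of leading whitespace characters; each counts as one column.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // A line is blank if it is empty or consists only of whitespace.
    static boolean isBlank(String line) {
        return leadingWhitespace(line) == line.length();
    }

    static String align(String body) {
        List<String> lines = new ArrayList<>(Arrays.asList(body.split("\n", -1)));

        // Step 1: strip up to one leading and one trailing blank line.
        if (lines.size() > 1 && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        boolean trailingBlank = lines.size() > 1 && isBlank(lines.get(lines.size() - 1));
        if (trailingBlank) {
            lines.remove(lines.size() - 1);
        }

        // Step 2: find the largest common whitespace prefix.
        // Assumption: blank lines are ignored when computing it.
        int prefix = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!isBlank(line)) {
                prefix = Math.min(prefix, leadingWhitespace(line));
            }
        }
        if (prefix == Integer.MAX_VALUE) {
            prefix = 0;
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines.get(i);
            out.append(isBlank(line) ? "" : line.substring(prefix));
        }

        // Step 3: if the trailing line was blank, add back an ending newline.
        if (trailingBlank) {
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shaped like example g: content starts on the line after the open delimiter.
        System.out.println(align("\n    +--+\n    |  |\n    +--+"));
        // Shaped like example h: text on the first line, so the common prefix is empty.
        System.out.println(align("+--+\n    |  |\n    +--+"));
    }
}
```

In this sketch the g-shaped input comes out left-justified without a trailing newline, and the h-shaped input comes back unchanged.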

In Jim’s examples, all of a, b, c, d, e, f, g would produce a left-justified text rectangle (textangle?); g would not have a trailing newline.  

For h, we would get 

+--------+
              |  text  |
              +--------+

because the text on the first line counts as text.  (In this way, any ML string with text in the first column will have effectively opted out of alignment, since there is no leading common whitespace prefix, though I would not intend for this to be the only opt-out mechanism.)

[1] We can require an exact match (tabs for tabs), or an inexact one (each WS character counts as one whitespace)
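
To make the difference in [1] concrete, here is a small sketch comparing the two readings on a pair of lines. PrefixMatch and its method names are hypothetical, chosen only for this illustration.

```java
// Sketch only: hypothetical helper class, not part of the JDK.
class PrefixMatch {

    // Exact match: count leading positions where both lines hold the
    // same whitespace character (tabs for tabs, spaces for spaces).
    static int exactCommonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n
                && a.charAt(i) == b.charAt(i)
                && Character.isWhitespace(a.charAt(i))) {
            i++;
        }
        return i;
    }

    // Inexact match: every whitespace character counts as one, so the
    // common prefix is the smaller of the two leading-whitespace counts.
    static int inexactCommonPrefix(String a, String b) {
        return Math.min(leadingWhitespace(a), leadingWhitespace(b));
    }

    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String tabbed = "\t\tfoo";  // two tabs
        String spaced = "    bar";  // four spaces
        System.out.println(exactCommonPrefix(tabbed, spaced));   // prints 0
        System.out.println(inexactCommonPrefix(tabbed, spaced)); // prints 2
    }
}
```

With mixed tabs and spaces, the exact rule removes nothing here, while the inexact rule removes two characters from each line regardless of how wide they render.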

With respect to using text column position as part of the algorithm, I don’t really think we can justify this; it’s too weird and different.  Secondarily, there is a lot of value in the language interpretation of alignment (if any) and the library interpretation (String::align) being the same; this allows complex cases that are built up via concatenation to opt out on the component strings, build a composite string, and then explicitly align with the library mechanism.  I think this is largely a forced move.

The examples with concatenation are ugly no matter how we slice them (at least without interpolation, which is most definitely not on the menu right now). I don’t think any auto-align mechanism is going to work well on them, but there are two ways to get the desired result:

 - Use concatenation on opted-out strings, and then wrap the whole thing with (…).align()
 - Use String.format (we can provide an instance version) on a single, aligned format string, such as:

    String s = """
         Name: %s
         Age: %d""".formatted(name, age);

The high-order question is: 

 - Do we think that this alignment algorithm produces the desired answer enough of the time, and is easy enough to reason about, to be the default (assuming an opt-out mechanism), or should we prefer explicit alignment all the time?





> On Apr 10, 2019, at 11:22 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> 
> Next plate is (1a) incidental whitespace.
> 
> Having decided that we are content with "fat" delimiters (""") for multi-line strings, we have some more choices to make regarding multi-line strings.  (We're not going to talk about "raw" strings yet; let's finish the multi-line course first.)
> 
> Multi-line strings are different from single-line strings in a number of ways, so let's get clear on what we want "multi-line" to mean.
> 
> Line terminators:  When strings span lines, they do so using the line terminators present in the source file, which may vary depending on the operating system on which the file was authored.  Should this be an aspect of multi-line-ness, or should we normalize these to a standard line terminator?  It seems a little weird to treat string literals quite so literally; the choice of line terminator is surely an incidental one.  I think we're all comfortable saying "these should be normalized", but it's worth bringing this up because it is merely one way in which incidental artifacts of how the string is embedded in the source program force us to interpret what the user meant.  Which brings us to the next incidental aspect...
> 
> Whitespace:  A multi-line string is nestled in the context of a Java source program.  It is likely (though not guaranteed) that the indentation of lines has been distorted by the desire to make the embedded snippet align with the enclosing lines.  Most of the time, there is some combination of incidental whitespace and intended whitespace.  There are a number of algorithms by which we could try to intuit which the user intended.  Which brings us to ask:
> 
>  - Assuming the existence of a reasonable algorithm for re-aligning text, what should the _default_ be for the language? Should it assume the user wants re-alignment, or make the user explicitly opt in?
>  - If the choice is "automatically align", how would we indicate the desire to opt out?
>  - Should we limit what we do automatically to only what can be done by an equivalent library routine?
> 
> (Again, let's focus on the requirements and semantics and defaults first, before we bikeshed the syntax.)
> 
> It's hard to answer the above without a clear understanding of the use cases.  So, here's a partial catalog of examples; let's play "what was the user thinking", and see if we can agree on that.
> 
> Examples;
> 
> String a = """
>            +--------+
>            |  text  |
>            +--------+
>            """; // first characters in first column?
> 
> String b = """
>                +--------+
>                |  text  |
>                +--------+
>            """; // first characters in first column or indented four spaces?
> 
> String c = """
>                +--------+
>                |  text  |
>                +--------+
> """; // first characters in first column or indented several?
> 
> String d = """
>     +--------+
>     |  text  |
>     +--------+
> """; // first characters in first column or indented four?
> 
> String e =
> """
> +--------+
> |  text  |
> +--------+
> """; // heredoc?
> 
> String f = """
> 
> 
>                +--------+
>                |  text  |
>                +--------+
> 
> 
>            """; // one or all leading or trailing blank lines stripped?
> 
> String g = """
>               +--------+
>               |  text  |
>               +--------+"""; // Last \n dropped
> 
> String h = """+--------+
>               |  text  |
>               +--------+"""; // determine indent of first line using scanner knowledge?
> 
> String i = """  "nested"  """; // strip leading/trailing space?
> 
> String j = ("""
>                  public static void """ + name + """(String... args) {
>                      System.out.println(String.join(args));
>                  }
>            """).align(); // how do we handle expressions with multi-line strings?
> 
> String k = """
>                  public static void %s(String... args) {
>                      System.out.println(String.join(args));
>                  }
>            """.format(name); // is this the answer to  multi-line string expressions?
> 
> As we can see, there were a lot of cases where the user _probably_ wanted one thing, but _might have_ wanted another.  What control knobs do we have, that we could assign meaning to, that would let the user choose either way?  Candidates include:
> 
>  - The opening line (is it blanks followed by a newline, or are there non-whitespace characters?)
>  - The position of the close delimiter (is it on its own line, or not?)
> 
> Similarly, we have a number of policy choices:
> 
>  - Do we allow content on the same lines as the delimiters?
>  - Should we always add a final newline?
>  - Should we strip blank lines?  Only on the first and last?  All leading and trailing?
>  - How do we interpret auto-alignment on single-line strings? Strip?
>  - Should we right strip lines?
> 
> And some syntax choices (not to be discussed now):
> 
>  - How do we indicate opt-out?
> 
> Comments?
> 
> 
> Examples narrative.  Don’t peek yet.  Stop and comment first.
> 
> 
> Unlike most other Java constructs, multi-line strings force us to look at coding style "square on".  Keep in mind that we are often guilty of making assumptions about developer coding style.  For instance, we may assume that multi-line strings tend to be large elements.  We may also assume that developers will declare static final String variables to keep multi-line strings from messing up their code.  All very neat and tidy, but...  we know from experience that developers will use multi-line strings everywhere, as they have with array initialization and large lambda bodies.
> 
> From this, we recommend that multi-line string fat delimiters should follow the brace pattern used in array initialization, lambdas and other Java constructs. The open delimiter should end the current line.  Content follows on separate lines, indented one level.  The close delimiter starts a new line, back indented one level, followed by the continuation of the enclosing expression.
> 
> So as in this brace pattern;
> 
> int[] ia = new int[] {
> 	1,
> 	2,
> 	3
> };
> 
> we have the fat delimiter pattern;
> 
> String d = """
>     +--------+
>     |  text  |
>     +--------+
> """;
> 
> and;
> 
> String.format("""
>      public static void %s(String... args) {
>          System.out.println(String.join(args));
>      }
>  """, name);
> 
> The fat delimiter pattern also significantly helps with future editing in and around the multi-line string.  For example, changing the length of the variable name in the above "String d =" example doesn't affect the positioning of the string content or the close delimiter.
> 
> If we adopt this style, some of the answers to the incidentals questions become easier or even moot.  Other styles are still valid, but the result of automatic incidental handling may be surprising.
> 
> Note that fat delimiters can be used on single lines.  What are the semantics for auto-alignment in that case?  The question of stripping whitespace and newlines is not really about alignment.  It's about what are the rules for handling incidental characters in a fat delimiter string.
> 
> 
> Continuing with the examples, let's assume some (negotiable) auto-alignment basic rules;
> 
> 1. All content lines are uniformly right stripped. Whitespace at the end of lines is not something that is consistently managed by IDEs/editors.
> 2. End of lines are always translated to \n.
> 3. If the content after the open delimiter is empty then the first end of line is discarded.
> 4. Content is left justified while preserving relative indentation.
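
As a rough illustration of how rules 1 to 4 combine, here is a sketch that applies them to the characters between the delimiters. BasicRulesSketch and its methods are hypothetical names; since the rules are explicitly negotiable, this is one possible reading and not a proposed implementation.

```java
// Sketch only: hypothetical helper class, not part of the JDK.
class BasicRulesSketch {

    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static String rightStrip(String line) {
        int end = line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    // 'raw' stands for the characters between the open and close delimiters.
    static String apply(String raw) {
        // Rule 2: end of lines are always translated to \n.
        String[] lines = raw.replace("\r\n", "\n").replace('\r', '\n').split("\n", -1);

        // Rule 1: all content lines are uniformly right stripped.
        for (int i = 0; i < lines.length; i++) {
            lines[i] = rightStrip(lines[i]);
        }

        // Rule 3: if nothing follows the open delimiter, discard the first end of line.
        int first = (lines.length > 1 && lines[0].isEmpty()) ? 1 : 0;

        // Rule 4: left justify while preserving relative indentation.
        // Assumption: empty lines do not take part in the margin computation.
        int margin = Integer.MAX_VALUE;
        for (int i = first; i < lines.length; i++) {
            if (!lines[i].isEmpty()) {
                margin = Math.min(margin, leadingWhitespace(lines[i]));
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = first; i < lines.length; i++) {
            if (i > first) {
                out.append('\n');
            }
            if (!lines[i].isEmpty()) {
                out.append(lines[i].substring(margin));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shaped like example a, with Windows line endings and trailing blanks.
        System.out.print(apply("\r\n    +--+  \r\n    |  |\r\n    +--+\r\n    "));
        // Shaped like example i: a single line, left and right stripped.
        System.out.println(apply("  \"nested\"  "));
    }
}
```

Under this reading, the line holding only the close delimiter's indentation is right stripped to nothing, which is what leaves the final \n in place.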
> 
> And as a reminder, in the last round we introduced or attempted to introduce the following String methods;
> 
> - String::indent(n) - used to change indentation, line by line (in JDK 12)
> - String::align() and String::align(n) - used to manage incidental indentation (didn't make it)
> - String::format as an instance method (resolution issues YTBD)
> 
> __________________________________________________________________________________________________
> String a = """
>            +--------+
>            |  text  |
>            +--------+
>            """; // first characters in first column?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+\n
> 
> The problem with this example is that it is not following the fat delimiter pattern.  Let's change the variable name "a" to "something".
> 
> String something = """
>         .......... +--------+
>         .......... |  text  |
>         .......... +--------+
>         .......... """; // first characters in first column?
> 
> The "." indicate all the places where we had to add whitespace to maintain the pattern used.
> __________________________________________________________________________________________________
> String b = """
>                +--------+
>                |  text  |
>                +--------+
>            """; // first characters in first column or indented four?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+\n
> 
> Same maintenance problem as example (a).
> 
> Still works, but the question here is: do we give meaning to indentation relative to the close delimiter? Did we want the following?
> 
>     +--------+\n
>     |  text  |\n
>     +--------+\n
> 
> It's a nice trick but we sabotage the fat delimiter pattern.  We would always get at least one level of indentation, whether we wanted it or not.  Maybe better to code as;
> 
> String b = """
>     +--------+
>     |  text  |
>     +--------+
> """.indent(4);
> 
> So the question here is: should it be possible to specify "extra" indentation through the positioning of quotes, or are we better off saying that any extra indentation should be done through library calls?  Also noting that the library calls might be subject to compile time folding.
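
The two options in this question can be compared side by side. The sketch below uses a hypothetical helper, alignToCloseDelimiter, that takes the margin from the line holding the close delimiter, and checks it against full left-justification followed by String::indent. It assumes the last line contains only the close delimiter's indentation, and it needs a JDK that has String::indent.

```java
// Sketch only: hypothetical helper class, not part of the JDK.
class CloseDelimiterIndent {

    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // The last line (assumed to be whitespace only) sets the margin;
    // anything to the right of it is kept as "extra" indentation.
    static String alignToCloseDelimiter(String body) {
        String[] lines = body.split("\n", -1);
        int margin = lines[lines.length - 1].length();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length - 1; i++) {
            // Never cut into non-whitespace characters.
            int cut = Math.min(margin, leadingWhitespace(lines[i]));
            out.append(lines[i].substring(cut)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Content indented six columns, close delimiter indented two.
        String body = "      +--+\n      |  |\n      +--+\n  ";
        String viaDelimiter = alignToCloseDelimiter(body);

        // The library route: fully left-justified text, then indent(4).
        String viaLibrary = "+--+\n|  |\n+--+\n".indent(4);

        System.out.print(viaDelimiter);
        System.out.println(viaDelimiter.equals(viaLibrary)); // prints true
    }
}
```

Both routes yield the same string in this case; the difference is whether the extra four columns are expressed by the position of the quotes or by an explicit call.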
> __________________________________________________________________________________________________
> String c = """
>                +--------+
>                |  text  |
>                +--------+
> """; // first characters in first column or indented several?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+\n
> 
> The amount of indentation is not a problem, just an aesthetic issue.
> 
> __________________________________________________________________________________________________
> String d = """
>     +--------+
>     |  text  |
>     +--------+
> """; // first characters in first column or indented four?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+\n
> 
> Text book fat delimiter pattern.
> __________________________________________________________________________________________________
> String e =
> """
> +--------+
> |  text  |
> +--------+
> """; // heredoc?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+\n
> 
> Just an aesthetic issue.
> __________________________________________________________________________________________________
> String f = """
> 
> 
>                +--------+
>                |  text  |
>                +--------+
> 
> 
>            """; // one or all leading or trailing blank lines stripped?
> 
> As-is would generate;
> \n
> \n
> +--------+\n
> |  text  |\n
> +--------+\n
> \n
> \n
> \n
> 
> If we stripped away all leading or trailing blank lines, we would then have to code it as;
> 
> String f = "\n".repeat(2) + """
>     +--------+
>     |  text  |
>     +--------+
> """ + "\n".repeat(2);
> __________________________________________________________________________________________________
> String g = """
>               +--------+
>               |  text  |
>               +--------+"""; // Last \n dropped
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+
> 
> This one is likely okay. It's not the fat delimiter pattern, but the oddity makes it clear we mean something different; we want to drop the last \n.
> __________________________________________________________________________________________________
> String h = """+--------+
>               |  text  |
>               +--------+"""; // determine indent of first line using scanner knowledge?
> 
> RESULT:
> +--------+\n
> |  text  |\n
> +--------+
> 
> We can do this because the compiler's scanner can determine the indentation on the open delimiter line.  However, this one is problematic if we require a String method to duplicate the compiler's algorithm (String::align).  Tool vendors may also find this one problematic.
> __________________________________________________________________________________________________
> String i = """  "nested"  """; // strip leading/trailing space?
> 
> RESULT:
> "nested"
> 
> This one still follows the rules; left and right stripped.
> __________________________________________________________________________________________________
> String j = ("""
>                  public static void """ + name + """(String... args) {
>                      System.out.println(String.join(args));
>                  }
>            """).align(); // how do we handle expressions with multi-line strings?
> 
> Mid-string substitution gets messy fast.  Let's break the example down to the following (without align.)
> 
> String j = """
>                  public static void """ + name + """(String... args) {
>                      System.out.println(String.join(args));
>                  }
>            """;
> 
> This is the same as
> 
> String j =
> """
>     public static void """
> + name +
> """(String... args) {
>         System.out.println(String.join(args));
>     }
> """;
> 
> Which works fine if we say no \n when the close delimiter is on the same line. The other requirement is that each multi-line string component ends up with a common indentation.  The odds of that happening are poor.
> 
> Guess we're stuck with parentheses and String::align. Unless...
> __________________________________________________________________________________________________
> String k = """
>                  public static void %s(String... args) {
>                      System.out.println(String.join(args));
>                  }
>            """.format(name); // is this the answer to  multi-line string expressions?
> 
> RESULT:
> public static void methodName(String... args) {
>     System.out.println(String.join(args));
> }
> 
> Maybe a better substitution solution.
> __________________________________________________________________________________________________
> 
