Long line string literals
John Rose
john.r.rose at oracle.com
Fri May 10 04:46:35 UTC 2019
On May 9, 2019, at 7:34 AM, Jim Laskey <james.laskey at oracle.com> wrote:
>
> How does a Java developer express a very long string?
> …
>
> Some of the parameters;
>
> - The solution needs to be an escape sequence(s). This is the only
> mechanism we can introduce (now) and be backward compatible with
> traditional string literals. Other mechanisms, such as literal
> prefixing, are not open for discussion at this point in time. (+1)
+1 from me
>
> - A Multi-line String Literal JEP goal is to make all escape sequences
> equally meaningful for traditional string literals and multi-line
> string literals. (+1)
+1
> - \<LineTerminator>, \<Space> and \<WhiteSpace> (white space includes LF
> and CR) have been proposed with various semantics for each. There is a
> concern about the lack of visibility of what comes after the \. Is it a
> space, tab, Unicode white space, LF or CR? How do you tell? (?1)
Yep. Also note that some source control systems (ours!)
forbid trailing spaces before EOL in code, precisely because
they are invisible. IMO this consideration immediately disqualifies
<\ space> as a candidate for an escape sequence. <\ LT> is
still just fine, and maybe <\ LT space*> is tolerable, but
not if it means something different from <\ LT>.
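To make the visibility problem concrete, here is a minimal sketch of the kind of check such a hook might run. It reports every line where a backslash is followed only by spaces or tabs before the end of the line. On screen, that case looks exactly like <\ LT>. The class and method names are hypothetical, and this is not the actual hook any particular repository uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flag lines where a backslash is followed only by
// spaces or tabs before EOL. Such a "\ space" looks identical to "\ LT".
public class TrailingSpaceCheck {

    /** Returns the 1-based numbers of lines ending in '\' plus invisible blanks. */
    static List<Integer> invisibleEscapes(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\r\n|\r|\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int end = line.length();
            // Walk back over trailing spaces and tabs.
            while (end > 0 && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) {
                end--;
            }
            boolean hasTrailingBlanks = end < line.length();
            if (hasTrailingBlanks && end > 0 && line.charAt(end - 1) == '\\') {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String src = "\"alpha \\\n"   // backslash, then line terminator: visible
                   + "beta \\  \n"    // backslash, two spaces: invisible
                   + "gamma \\\t\n";  // backslash, tab: invisible
        System.out.println(invisibleEscapes(src)); // prints [2, 3]
    }
}
```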
> - When the new escape sequence(s) is in a traditional string literal the
> compiler scanner needs to treat the traditional string literal as
> multi-line. (-1)
Yes: If you use a <\ LT> escape sequence in a thin string,
it becomes a ML string. If you thought that only fat strings
could be ML strings, I've got a nice puzzler for you.
The reality about fat strings is that they are nicely formatted
multi-line strings (with the rectangle extraction feature).
> The escape sequences suggested differ, but they are all variations of
> consuming the escape and zero to N characters after (or before).
I'll say up front that greedily gobbling whitespace characters
either before or after an escape is a powerful idea, IMO,
because it allows the user to designate an ad hoc run of
whitespace as "program format only, but not payload".
If we make the ad hoc run easy to use, to make the program
more readable, we win, as with the rectangle rule.
But there has to be a way to "fence" the whitespace gobbler
so it doesn't gobble nearby whitespace which is intended
as payload. You can do this today as <\ 0 4 0>, and I
would prefer to add a more memorable optional <\ s>.
To protect a tab, today's <\ t> works just fine.
I think either <\ 0 4 0> or <\ s> is adequate to "fence
the gobbler", in either direction.
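The following minimal sketch shows why <\ 0 4 0> (or a <\ s>) fences the gobbler. The gobbling rule runs over the raw characters of the literal, and an escape becomes a space only after it is translated, so the gobbler stops when it reaches one. The sketch makes several assumptions: <\ LT WS*> gobbles forward only, only a handful of escapes are supported, and `\s` models the proposed spelling. The class name `GobblerFence` is hypothetical.

```java
// Hypothetical sketch of the <\ LT WS*> rule with two fences.
// Input is the raw text between the quotes, exactly as written in source.
public class GobblerFence {

    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            if (i + 1 >= raw.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char next = raw.charAt(i + 1);
            if (next == '\n' || next == '\r') {
                i += 2;
                // Treat CR LF as a single line terminator.
                if (next == '\r' && i < raw.length() && raw.charAt(i) == '\n') i++;
                // Gobble the raw spaces and tabs that follow; stops at any escape.
                while (i < raw.length() && (raw.charAt(i) == ' ' || raw.charAt(i) == '\t')) i++;
            } else if (raw.startsWith("040", i + 1)) {
                out.append(' ');  // existing octal escape for a space
                i += 4;
            } else if (next == 's') {
                out.append(' ');  // the proposed, more memorable spelling
                i += 2;
            } else if (next == 't') {
                out.append('\t'); // a tab is already protected by \t
                i += 2;
            } else if (next == '\\') {
                out.append('\\');
                i += 2;
            } else {
                throw new IllegalArgumentException("unsupported escape: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw body: "one", backslash, LF, four blanks, \040, "two"
        String raw = "one\\\n    \\040two";
        System.out.println("[" + translate(raw) + "]"); // prints [one two]
    }
}
```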
>
> A) \<LineTerminator> or \<WhiteSpace> Just consume the (single)
> line terminator/white space.
>
> Sample,
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \
> adipiscing elit. Nunc est libero, vehicula \
> nec molestie in, semper aliquam magna.";
>
> String msl = """
> Lorem ipsum dolor sit amet, consectetur \
> adipiscing elit. Nunc est libero, vehicula \
> nec molestie in, semper aliquam magna.""";
>
> This works if the line terminator follows immediately after the \ . (+1)
>
> Cannot tell if it is a white space or line terminator after the \ . (-1)
>
> This does not work if there is one or more intervening white space
> characters. (-1)
>
> This works for multi-line string literals because of stripTrailing. (+1)
>
> This does not work for traditional string literals because there is no
> notion of auto alignment to strip the leading white space on the next
> line. (-2)
-1 from me. It lets you break the long line, but then you have
to place it flush against the left margin. To me, breaking a long
line inherently involves two decisions: 1. break the line, 2. decide
where to place the second part on the next line, using spaces
and tabs. So I want the same mechanism that gobbles the LT
to also gobble the succeeding whitespace. Thus <\ LT WS*>
expands to the null string.
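To see the difference concretely, here is a regex-based sketch that compares option A with the <\ LT WS*> rule on the raw body of a thin string whose continuation is indented. It is simplified: it handles only LF, CR and CR LF, and it ignores escaped backslashes. The names are hypothetical.

```java
// Hypothetical sketch contrasting option A with the <\ LT WS*> rule.
// Simplification: an escaped backslash before a line break is not handled.
public class OptionA {

    /** Option A: drop the backslash and exactly one line terminator. */
    static String optionA(String raw) {
        return raw.replaceAll("\\\\(\\r\\n|\\r|\\n)", "");
    }

    /** Proposed rule: also drop the spaces and tabs that start the next line. */
    static String ltThenWhitespace(String raw) {
        return raw.replaceAll("\\\\(\\r\\n|\\r|\\n)[ \\t]*", "");
    }

    public static void main(String[] args) {
        String raw = "consectetur \\\n        adipiscing";
        // Option A keeps the eight blanks of indentation as payload.
        System.out.println("[" + optionA(raw) + "]");
        // The LT WS* rule discards them: prints [consectetur adipiscing]
        System.out.println("[" + ltThenWhitespace(raw) + "]");
    }
}
```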
>
> B) \<WhiteSpace> Consume all white space up to and including the line
> terminator.
>
> Same sample as A).
>
> Works in more cases than A). (+2)
>
> Still does not work for traditional string literals because there is no
> notion of auto alignment to strip the leading white space on the next
> line. (-2)
Same objection (and proposal) as for A.
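Option B only widens what may sit between the backslash and the line terminator. In a regex sketch, it adds one `[ \t]*` before the terminator. The proposal above adds the same `[ \t]*` after it. The names are hypothetical, and the simplifications are the same as for A.

```java
// Hypothetical sketch of option B, with and without gobbling the next line's indentation.
public class OptionB {

    /** Option B: backslash, any trailing spaces/tabs, then one line terminator. */
    static String optionB(String raw) {
        return raw.replaceAll("\\\\[ \\t]*(\\r\\n|\\r|\\n)", "");
    }

    /** Option B extended: also gobble the leading blanks of the next line. */
    static String optionBPlusIndent(String raw) {
        return raw.replaceAll("\\\\[ \\t]*(\\r\\n|\\r|\\n)[ \\t]*", "");
    }

    public static void main(String[] args) {
        String raw = "vehicula \\  \n        nec";
        // Plain B tolerates the invisible blanks but keeps the indentation.
        System.out.println("[" + optionB(raw) + "]");
        // Extended B removes both: prints [vehicula nec]
        System.out.println("[" + optionBPlusIndent(raw) + "]");
    }
}
```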
>
> C) \<WhiteSpace> Consume all white space (including LF and CR) up to a
> non-white space or end of string.
>
> Same sample as A).
>
> This works for both traditional and multi-line strings. (+1)
>
> Note that in A), B) and C) the next line may influence multi-line
> indentation. I.e., escapes are translated after auto alignment. (?1)
+1 This is the one I like! I accept that, for a fat string with
rectangle extraction, I am required to indent the second
line fragment *after* the left margin of the extracted rectangle.
It's a fine compromise.
String msl = """
First.
Lorem ipsum dolor sit amet, consectetur \
adipiscing elit. Nunc est libero, vehicula \
nec molestie in, semper aliquam magna.
Last.
""";
=> "First.\n Lorem…magna.\nLast."
In this example, the continuation lines (second and third
after Lorem…) can be exdented to align with First and
Last, but not further. Any extra indentation, after that
of First and Last, is gobbled by <\ LT WS*>.
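The ordering accepted here (margin stripping first, escape translation second) can be sketched as a two-step pipeline. This is a simplification under stated assumptions:

- The common margin comes from non-blank lines only.
- The closing delimiter is ignored.
- Lines are joined with LF.
- Escaped backslashes are not handled.

The names are hypothetical, and the code needs Java 11+ for `isBlank`, `stripLeading` and `stripTrailing`. It runs on a shortened version of the example.

```java
import java.util.List;

// Hypothetical sketch: rectangle extraction first, then option C gobbling.
public class OptionC {

    /** Strip the common left margin and trailing blanks from each line, then join with LF. */
    static String align(List<String> lines) {
        int margin = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) {
                margin = Math.min(margin, line.length() - line.stripLeading().length());
            }
        }
        if (margin == Integer.MAX_VALUE) margin = 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            sb.append(line.isBlank() ? "" : line.substring(margin).stripTrailing());
            if (i < lines.size() - 1) sb.append('\n');
        }
        return sb.toString();
    }

    /** Option C: a backslash followed by white space (LF and CR included) consumes all of it. */
    static String gobble(String s) {
        return s.replaceAll("\\\\\\s+", "");
    }

    /** Escapes are translated after auto alignment. */
    static String textBlock(List<String> lines) {
        return gobble(align(lines));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "    First.",
            "    Lorem ipsum \\",
            "        dolor sit.",   // extra indentation beyond the margin is gobbled
            "    Last.");
        // prints First., Lorem ipsum dolor sit., Last. on three lines
        System.out.println(textBlock(lines));
        // The same rule works for a thin string: prints consectetur adipiscing
        System.out.println(gobble("consectetur \\\n        adipiscing"));
    }
}
```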
> D) \, (something other than white space) but otherwise the same as C)
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \,
> adipiscing elit. Nunc est libero, vehicula \,
> nec molestie in, semper aliquam magna.";
>
> String msl = """
> Lorem ipsum dolor sit amet, consectetur \,
> adipiscing elit. Nunc est libero, vehicula \,
> nec molestie in, semper aliquam magna.""";
>
> Works but trading " + for \, . (?1)
-1 (Not sure what D buys…)
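Option D differs from C only in the introducer. A regex sketch makes the trade explicit: the `\,` form yields the same string as ordinary concatenation. The name `optionD` is hypothetical, and the same simplifications apply.

```java
// Hypothetical sketch of option D: "\," followed by any white space is consumed.
public class OptionD {

    static String optionD(String raw) {
        return raw.replaceAll("\\\\,\\s*", "");
    }

    public static void main(String[] args) {
        String concatenated = "Lorem ipsum dolor sit amet, consectetur " +
                              "adipiscing elit.";
        String raw = "Lorem ipsum dolor sit amet, consectetur \\,\n" +
                     "                              adipiscing elit.";
        System.out.println(concatenated.equals(optionD(raw))); // prints true
    }
}
```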
>
> E) \> (something other than white space)
> Consume all white space up to and including the line terminator.
> \< (something other than white space)
> Consume all white space back to beginning of line.
>
> String tsl = "Lorem ipsum dolor sit amet, consectetur \>
> \<adipiscing elit. Nunc est libero, vehicula \>
> \<nec molestie in, semper aliquam magna.";
>
> String msl = """
> Lorem ipsum dolor sit amet, consectetur \>
> \<adipiscing elit. Nunc est libero, vehicula \>
> \<nec molestie in, semper aliquam magna.""";
>
> A goal of the multi-line JEP was to make the string more readable, less
> error prone and maintainable. (-10)
Yep.
> Note for D) and E), is it an error if a non-white space is encountered
> or just stop? (?1)
D/K.
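The open question for D and E (error, or just stop) comes down to a single flag. Below is a sketch of E's two escapes with a hypothetical `strict` parameter:

- Strict mode rejects any non-white space that `\>` or `\<` would have to skip.
- Lenient mode simply stops at it.

The sketch assumes LF line endings and ignores all other escapes. It also records where each source line begins in the output, so that `\<` never reaches back past a join made by `\>`. All names are hypothetical.

```java
// Hypothetical sketch of option E. Input is the raw literal body (LF only).
public class OptionE {

    static String translate(String raw, boolean strict) {
        StringBuilder out = new StringBuilder();
        int lineStart = 0; // where the current source line begins in the output
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            char next = i + 1 < raw.length() ? raw.charAt(i + 1) : '\0';
            if (c == '\\' && next == '>') {
                i += 2;
                // Skip blanks up to the line terminator.
                while (i < raw.length() && (raw.charAt(i) == ' ' || raw.charAt(i) == '\t')) {
                    i++;
                }
                if (i < raw.length() && raw.charAt(i) == '\n') {
                    i++;                      // consume the terminator itself
                    lineStart = out.length(); // the next source line starts here
                } else if (strict && i < raw.length()) {
                    throw new IllegalArgumentException("non-white space after \\> at index " + i);
                }
                // Lenient mode: stop and continue scanning normally.
            } else if (c == '\\' && next == '<') {
                int end = out.length();
                // Drop blanks back to the start of this source line.
                while (end > lineStart && (out.charAt(end - 1) == ' ' || out.charAt(end - 1) == '\t')) {
                    end--;
                }
                if (strict && end > lineStart) {
                    throw new IllegalArgumentException("non-white space before \\< at index " + i);
                }
                out.setLength(end);
                i += 2;
            } else {
                out.append(c);
                if (c == '\n') {
                    lineStart = out.length();
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "consectetur \\>\n        \\<adipiscing";
        System.out.println(translate(raw, true));        // prints consectetur adipiscing
        System.out.println(translate("a \\> b", false)); // lenient: prints a b
        try {
            translate("a \\> b", true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```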
— John
More information about the amber-spec-experts mailing list