Raw String Literals (RSL) - indent stripping of multi-line strings

John Rose john.r.rose at oracle.com
Mon Apr 23 19:20:02 UTC 2018


My $0.02.

On Apr 23, 2018, at 10:04 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> 
> QUESTIONS:
> 
> - Should the javac compiler remove incidental indentation at compile time?

If so, then there has to be a way to set the incidental indentation
to zero (presumably by adding a stripped empty determining line).
In this way "all-indented" payloads, whose lines all begin with spaces
and/or tabs, can be expressed.

Otherwise, it's an RSL capability which cannot express some raw strings
(specifically, those whose lines all begin with a space).  That would
be kind of smelly; it would at least require a String.prefixLines
function to put back in the spaces.
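Here is a minimal sketch of the kind of `prefixLines` helper this would require, assuming `\n` line separators. The JDK has no `String.prefixLines`; the class and method here are hypothetical.

```java
// Hypothetical helper: the JDK has no String.prefixLines; this is only a sketch.
final class PrefixLines {
    private PrefixLines() {}

    /** Prepends prefix to each line of s; a trailing empty line after a final '\n' is left alone. */
    static String prefixLines(String s, String prefix) {
        String[] lines = s.split("\n", -1);        // limit -1 keeps trailing empty segments
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) out.append('\n');
            boolean trailingEmpty = i == lines.length - 1 && lines[i].isEmpty();
            if (!trailingEmpty) out.append(prefix);
            out.append(lines[i]);
        }
        return out.toString();
    }
}
```

So `prefixLines("a\nb", "    ")` gives `"    a\n    b"`, but only if the caller knows the width that stripping removed, which is exactly the information the RSL failed to carry.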

If you buy the above point, then that suggests answers to
the other questions.

> 
> - What is the rule set used?
> 
>    - Should the last line be a determining line?

To support all-indented payloads it seems necessary to allow
a blank determining line to be stripped from the payload.
In the design space under discussion, that's the last line.
So, yes.
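To make the role of the last line concrete, here is a minimal sketch of one reading of the rules under discussion; it is not the javac algorithm. Assumptions: the RSL body is modeled as an ordinary `String` holding the text between the quotes; an empty first line is dropped, while any other first line is kept verbatim and never determines the indent; the last line always determines the indent and is dropped if blank; interior blank lines do not vote; dropped lines take their line terminators with them; spaces and tabs each count as one column. The class name is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of one reading of the proposed indent-stripping rules, not javac's algorithm.
final class IndentStrip {
    private IndentStrip() {}

    static String stripIndent(String raw) {
        // "\\R" matches any line terminator; limit -1 keeps a trailing empty line
        List<String> lines = new ArrayList<>(Arrays.asList(raw.split("\\R", -1)));
        if (lines.size() < 2) return raw;            // single-line RSL: nothing to strip

        // The first line (right after the open quote) never determines the indent.
        // It is dropped if empty, otherwise kept verbatim.
        String first = lines.remove(0);

        // The last line always determines the indent, even when blank.
        int last = lines.size() - 1;
        int indent = leadingWhitespace(lines.get(last));
        for (int i = 0; i < last; i++) {
            String line = lines.get(i);
            if (!line.isBlank()) {                   // interior blank lines do not vote
                indent = Math.min(indent, leadingWhitespace(line));
            }
        }

        // A blank last line (just before the close quote) is dropped after voting.
        int end = lines.get(last).isBlank() ? last : last + 1;
        List<String> kept = new ArrayList<>();
        if (!first.isEmpty()) kept.add(first);
        for (int i = 0; i < end; i++) {
            String line = lines.get(i);
            // Blank lines shorter than the indent become empty.
            kept.add(line.length() >= indent ? line.substring(indent) : "");
        }
        return String.join("\n", kept);
    }

    /** Counts leading spaces and tabs, each as one column. */
    private static int leadingWhitespace(String s) {
        int n = 0;
        while (n < s.length() && (s.charAt(n) == ' ' || s.charAt(n) == '\t')) n++;
        return n;
    }
}
```

An all-indented payload such as `"\n    a\n    b\n"` (close quote in column zero) keeps its four spaces and comes out as `"    a\n    b"`, because the zero-length last line pins the incidental indent at zero.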

Also, your examples c and d, which are opposed on this
issue, make it pretty clear that c is preferable, since it gives
more stylistic control to the coder.  And stylistic control is
exactly what we are talking about, when we are discussing
the stripping of elective indents and line breaks.

>    - Should trailing whitespace be stripped?

As with the "all-indented" case above, trailing space should be
stripped only if there is a way to opt out of stripping.  I think the
trimMarkers API is the way to cover this use case, since it is
rather specialized.
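A sketch of per-line trailing-whitespace stripping with an opt-out. The boolean parameter is a hypothetical stand-in for whatever opt-out is chosen; it is not a model of the `trimMarkers` API, whose shape isn't given here. It needs Java 11 for `String.stripTrailing`.

```java
import java.util.StringJoiner;

// Sketch: keepTrailing is a hypothetical stand-in for an opt-out mechanism.
final class TrailingStrip {
    private TrailingStrip() {}

    static String stripTrailingPerLine(String payload, boolean keepTrailing) {
        if (keepTrailing) return payload;            // opted out: payload is untouched
        StringJoiner out = new StringJoiner("\n");
        for (String line : payload.split("\n", -1)) {
            out.add(line.stripTrailing());           // drops trailing Character.isWhitespace chars
        }
        return out.toString();
    }
}
```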

>    - Should the first or last line be removed if blank?

Yes.  In essence, the syntax of a quote sequence includes
a line terminator.  This BTW allows non-periodic quote sequences,
which as a corollary allows leading and trailing quote sequences
to be encoded in the RSL:

var hasLeadingAndTrailingTick = ``
   `I went for a walk in the tall brush and picked up some riders.`
   ``;

Also, the removal of leading and trailing blank lines gives users
some degrees of stylistic freedom that seem to be customary,
along with the indent-stripping.

Here's a new point along these lines, if I may be so bold:

If we are putting non-payload stylistic inputs into RSLs,
we should consider opening up a reservation for future use,
in the form of RSL configurations which are declared to be
illegal.  We could declare that some obviously pathological
subset of near-misses to indent-stripped RSLs is illegal,
and reserved for future extension.

On the other hand, we are trying very hard to accept every
RSL the user could randomly type in, which is incompatible
with reserving a set of constructs for future use.  Accepting everything
isn't logically necessary in the style-control use cases; we
can simply declare that some style-control is just illegal,
if we think there's a chance of using that coding space
in the future.

By obviously pathological I mean something like one or
all of these:

   String x = `_
..line one
..line two
..`;

   String y = `
..___line one
..line fifty-two
..___line ninety-nine
..`;

   String z = `
..line one
..line two
..___`;

(Here underbar _ is a non-stripped space.)

In case x, there is whitespace on the non-determining blank first line.
Surprisingly, this space doesn't get stripped (under the proposed rules).

In case y, line 52 determines the indent to strip, and this is true even
if it is buried in the middle of 100 lines.  Luckily, in this case, the determining
last line (just before the close-quote) ratifies this choice, so there is a
unique place to look for the stripped indent, without searching the whole string.

In case z, the stripped last line, while a determining line, has extra
whitespace.  This is easy to miss.

I suggest placing a structural constraint on stripped indents: the
last line, if blank, is stripped, and if stripped, it must have length
exactly zero after the leading indent is trimmed.  That would rule out
z and ameliorate y, since the blank last line would then always show
exactly the indent being stripped.

I also suggest ruling out x by requiring that the first line, which is
non-determining, must not have leading whitespace at all.
This doesn't break any of your examples a-f.
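Both restrictions are cheap to check. A minimal sketch, modeling the text between the quotes as a plain `String` and treating the first line as non-determining; the class name and messages are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch of the two proposed restrictions; class name and messages are hypothetical.
final class RslCheck {
    private RslCheck() {}

    /** Returns a reason the RSL body would be rejected, or empty if it is acceptable. */
    static Optional<String> check(String raw) {
        List<String> lines = Arrays.asList(raw.split("\\R", -1));
        if (lines.size() < 2) return Optional.empty();   // single-line RSLs are not indent-stripped

        // Rules out case x: the non-determining first line must not begin with whitespace.
        if (leading(lines.get(0)) > 0) {
            return Optional.of("first line begins with whitespace (reserved)");
        }

        // Rules out case z: a blank last line must be exactly the stripped indent.
        String last = lines.get(lines.size() - 1);
        if (last.isBlank()) {
            int indent = last.length();
            for (int i = 1; i < lines.size() - 1; i++) {   // interior lines only
                String line = lines.get(i);
                if (!line.isBlank()) indent = Math.min(indent, leading(line));
            }
            if (last.length() != indent) {
                return Optional.of("blank last line extends past the stripped indent");
            }
        }
        return Optional.empty();
    }

    /** Counts leading spaces and tabs. */
    private static int leading(String s) {
        int n = 0;
        while (n < s.length() && (s.charAt(n) == ' ' || s.charAt(n) == '\t')) n++;
        return n;
    }
}
```

Writing each `..` as two spaces and each `_` as one space, x and z are rejected and y is accepted, because its blank last line is exactly the two-column indent.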

Removing cases x and z might remove a class of puzzlers about the
significance of leading whitespace near the ends of RSLs.
(Can anyone see a positive use case for them that can't be easily
adjusted to a less pathological form?)

And (getting back to extensions) ruling out x also gives us a tidy
little subspace of RSLs to reserve for future use.  In other words,
an RSL with multiple lines whose leading line begins with a space
can be defined, in future iterations of this feature, to include
envelope information about the RSL, after that space.
Something like this:

   String q = `_{cool RSL header invented by our successors}
..line one
..line two
..`;

This envelope information would *not* be included in the payload,
but would be stripped as if the leading line were purely blank.
It would somehow control the processing of the RSL payload,
and/or the parsing of the rest of the RSL.

So in this future feature, the first line would still not be a determining
line, and would be stripped completely, and the stuff between
braces would be used in some way we can't define at present.

I suppose it could have to do with processing embedded escapes.

   String r = `_{cool RSL header invented by our successors}
..line one
..line two {cool embedded stuff enabled by RSL header}
..`;

But there's no way to say at this moment what such a future syntax
would look like, and that's my point:  For now we can reserve a corner
of the RSL encoding space for futures.
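Purely to illustrate how little the reservation costs, here is a sketch of how a future compiler might set such an envelope aside before indent stripping. Everything about it is hypothetical: the class, the choice that the envelope is whatever follows the single leading space on the first line, and the assumption of `\n` line terminators.

```java
import java.util.Optional;

// Hypothetical future behavior only; no current proposal defines an envelope.
final class RslEnvelope {
    final Optional<String> header;   // text after the single leading space, if present
    final String body;               // what goes on to indent stripping

    private RslEnvelope(Optional<String> header, String body) {
        this.header = header;
        this.body = body;
    }

    static RslEnvelope split(String raw) {
        int nl = raw.indexOf('\n');
        if (nl > 0 && raw.charAt(0) == ' ') {
            // Reserved form: the whole first line is set aside, as if it were purely blank.
            return new RslEnvelope(Optional.of(raw.substring(1, nl)), raw.substring(nl + 1));
        }
        // No envelope: the text is passed through unchanged.
        return new RslEnvelope(Optional.empty(), raw);
    }
}
```

For case q, `header` would hold `{cool RSL header invented by our successors}` and `body` would hold the remaining lines, which then go through ordinary indent stripping.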

We might never exercise the option, but it seems wise to buy the
option, if it can be bought cheaply as a side effect of restrictions
on pathological indent management.

I didn't raise this earlier, though it was on my mind, but as you see,
the complexity trade-offs change with built-in indent stripping.
And, obviously, there are other ways to extend RSLs in the future
which may seem better, such as by adding prefixes before the string
quote.  If we don't put constraints on cases x and z above, we still
have options for future extension.

Conversely, even if we are sure we want to make other choices
regarding futures, I think it is a safe move to exclude x and z above.

— John


More information about the amber-dev mailing list