To align, or not to align?

John Rose john.r.rose at oracle.com
Thu Apr 18 19:31:13 UTC 2019


On Apr 18, 2019, at 11:32 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> So I think the question really comes down to: what _is_ a multi-line string literal.  

As an aficionado of philosophy, I'll take a stab at this.  You can judge
whether it's useful or not.

A string literal is a *convenient* and *natural* programmatic notation
for a constant string payload.  A multi-line string literal is a
convenient and natural notation for a constant string payload of multiple lines.

*Convenient* means easy to read, maintain (re-write), and originate (write),
with IDE support or without.

*Natural* means most or all of the payload shows up visually in the
program source, as if it had simply been pasted there.  Base64 would
be highly unnatural because it obscures all payload characters.
Also unnatural (but limited in impact) are \n and + notations.
Extra contextual indents are slightly unnatural but tolerable
because they are easy to disregard using a rectangle rule.

(By the way, a *notation* is a way to visually encode programmer
intentions.  It must be unambiguous.  Overly terse or overly
convenient notation designs sometimes smuggle in ambiguities,
so unambiguity can't always be taken for granted.)

For multi-line payloads, a *natural* notation will tend to put each
line of the payload on its own line in the source code of the literal.

And a *convenient* notation will make clear all distinctions between
payload and enclosing program structure, including any extra
indentation (imposed by enclosing context) on the payload lines.

"Clear distinction" cuts two ways:  We need enough delimiters
to visually separate the payload from context, but if we have
too many delimiters the notation becomes hard to read and
makes the payload look confusing (less natural).

A rectangle rule could be part of a sweet spot in the design
space, since it naturally respects both the 2D format of the
program and the 2D format of the payload.
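One possible reading of a rectangle rule is easy to sketch in Java: find the smallest indentation shared by the non-blank lines, then cut exactly that many columns off every line, so that what remains is the payload rectangle. The class and method names below are hypothetical. The sketch assumes spaces only (no tabs), ignores blank lines when computing the margin, and does not describe any specific stripping proposal from the thread.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RectangleRule {
    // Hypothetical helper: removes the widest run of leading spaces common to
    // all non-blank lines, keeping only the "rectangle" to its right.
    // Assumption: indentation uses spaces only; tabs are not handled.
    static String strip(String raw) {
        List<String> lines = raw.lines().collect(Collectors.toList());
        int margin = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.isBlank()) continue;          // blank lines do not constrain the margin
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') indent++;
            margin = Math.min(margin, indent);
        }
        if (margin == Integer.MAX_VALUE) margin = 0; // every line was blank
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            // Blank lines become empty; others lose exactly `margin` columns.
            // Every output line, including the last, ends with '\n'.
            out.append(line.isBlank() ? "" : line.substring(margin)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "        <p>\n          hello\n        </p>\n";
        System.out.print(strip(raw));
    }
}
```

For the input in `main`, the eight shared columns are treated as enclosing context and discarded, while the two extra columns before `hello` stay in the payload, giving `"<p>\n  hello\n</p>\n"`.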

In this framing of the problem, we could turn the design
knob towards more visually explicit 2D framing of the
payload, by somehow adding a delimiter or escape which
marks the boundary between the enclosing indentation
and the payload indentation.  For example (this is just
an example) the white space at the *very beginning*
of a literal could accept an extra escape of some sort
which signals the transition between the enclosing
source and the payload.  Such extra syntax would be
noisy and harder to write, but it would (as extra syntax
tends to do) reduce ambiguity about the
programmer's wishes.

Treading on the very edge of syntax design, but refusing
to jump all the way in, I'll suggest that the northwest corner
of the rectangle could be marked with a "blob" of explicit
syntax:

   String mls = """
         __NWC_BLOB__  xx  xx
                        y  yy
         """;
   assert mls == "  xx  xx\n   y  yy\n";
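Here is one way to read the northwest-corner idea, sketched in Java under stated assumptions. The marker (the hypothetical string `__NWC_BLOB__`, a stand-in for whatever syntax a real design might pick) appears on the first payload line, and the column just past it becomes the western edge for every line. Anything west of that edge is enclosing indentation, and a non-blank character there on a later line is treated as an error. The class and method names are hypothetical; this illustrates the idea and is not a proposed specification.

```java
import java.util.List;

public class NorthwestCorner {
    // Hypothetical marker; a placeholder for whatever syntax a real design might choose.
    static final String BLOB = "__NWC_BLOB__";

    // Interprets the source lines between the delimiters. The column just past the
    // marker on the first line becomes the western edge of the payload rectangle.
    static String interpret(List<String> sourceLines) {
        String first = sourceLines.get(0);
        int at = first.indexOf(BLOB);
        if (at < 0) {
            throw new IllegalArgumentException("first line lacks the corner marker");
        }
        int edge = at + BLOB.length();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i);
            // Columns west of the edge hold the marker (line 0) or enclosing indentation.
            String west = line.substring(0, Math.min(edge, line.length()));
            if (i > 0 && !west.isBlank()) {
                throw new IllegalArgumentException("payload crosses the western edge on line " + i);
            }
            // Columns east of the edge are payload; a short line yields an empty payload line.
            out.append(line.length() > edge ? line.substring(edge) : "").append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "         __NWC_BLOB__  xx  xx",
            "                        y  yy");
        System.out.print(interpret(src));
    }
}
```

In `main`, the marker starts at column 9, so the edge falls at column 21; the second line, indented 24 columns, therefore contributes exactly three leading spaces of payload.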

Or, a left margin blob could mark the whole western edge:

   assert mls == """
         __WWE_BLOB__  xx  xx
         __WWE_BLOB__   y  yy
         """;
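The western-edge variant is even simpler to interpret, because every line carries its own boundary. In this hypothetical Java sketch, each source line must consist of leading whitespace, then the marker (again the placeholder `__WWE_BLOB__`, not real Java syntax), then payload; everything up to and including the marker is dropped. The names are illustrative assumptions only.

```java
import java.util.List;

public class WesternEdge {
    // Hypothetical marker; a placeholder, not real Java syntax.
    static final String BLOB = "__WWE_BLOB__";

    // Each source line is: enclosing whitespace, the marker, then payload.
    // Everything up to and including the marker is discarded.
    static String interpret(List<String> sourceLines) {
        StringBuilder out = new StringBuilder();
        for (String line : sourceLines) {
            String rest = line.stripLeading();
            if (!rest.startsWith(BLOB)) {
                throw new IllegalArgumentException("line lacks the margin marker: " + line);
            }
            out.append(rest.substring(BLOB.length())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The markers need not share a column; each line states its own boundary.
        List<String> src = List.of(
            "         __WWE_BLOB__  xx  xx",
            "   __WWE_BLOB__   y  yy");
        System.out.print(interpret(src));
    }
}
```

One consequence of this reading is that the markers themselves need not line up, since the boundary is stated on each line rather than inferred from a shared column. That is the trade in miniature: more to type, less to guess.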

I think it's hard to make these more convenient (readable,
editable) than Jim and Brian's rules for stripping.  They definitely
have an "opt-in" feel to them, because of their extra overheads.

But maybe allowing a single whitespace character to be escaped
would somehow assist the user in distinguishing payload from
non-payload.  I can think of several different ways to
formulate such a rule, but that's going down into syntax
again.

HTH

— John


More information about the amber-spec-experts mailing list