Wrapping up the first two courses
John Rose
john.r.rose at oracle.com
Fri May 3 23:40:16 UTC 2019
On Apr 26, 2019, at 8:59 AM, Kevin Bourrillion <kevinb at google.com> wrote:
>
> On Fri, Apr 26, 2019 at 8:56 AM Kevin Bourrillion <kevinb at google.com> wrote:
>
> Apparently bash's behavior is to replace <any amount of whitespace, backslash, newline, any amount of whitespace> with a single space character, and that at least seems like a useful behavior for us too if we're open to it.
No, it replaces <\ NL> with nothing at all. Any spaces before
or after that two-character sequence are bystanders.
In a separate step, if not inside quotes, all sequences of
whitespace are treated as if they were single spaces, as
the shell breaks a line up into words. The net effect is that
the sequence you mentioned behaves like whitespace.
But also:
```
$ x=a\
b
$ echo $x
ab
```
However, I'm proposing that horizontal whitespace *after*
the newline is "gobbled up" and thrown away with the leading
<\ LT>, so the escape sequence is more like <\ LT (SP|TAB)*>.
This gives the programmer more control over program layout.
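To make the proposed escape concrete, here is a minimal sketch of how <\ LT (SP|TAB)*> could be processed. It is only an illustration under stated assumptions, not the actual compiler logic: the class name `LineContinuation` and the helper `spliceLines` are hypothetical, it treats CR LF as a single LT, and it copies every other escape through untouched so that an escaped backslash is not mistaken for the start of a continuation.

```java
public class LineContinuation {
    // Hypothetical helper (not part of any JDK API): removes each
    // <\ LT (SP|TAB)*> sequence from the raw content of a string.
    static String spliceLines(String raw) {
        StringBuilder out = new StringBuilder();
        int n = raw.length();
        int i = 0;
        while (i < n) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < n) {
                char next = raw.charAt(i + 1);
                if (next == '\n' || next == '\r') {
                    i += 2;
                    // Assumption: CR LF counts as one line terminator.
                    if (next == '\r' && i < n && raw.charAt(i) == '\n') {
                        i++;
                    }
                    // Gobble the horizontal whitespace that follows the LT.
                    while (i < n && (raw.charAt(i) == ' ' || raw.charAt(i) == '\t')) {
                        i++;
                    }
                    continue;
                }
                // Any other escape is copied unchanged, so "\\" followed
                // by a newline is not treated as a continuation.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A long URL split across lines with no space wanted at the join.
        String url = "https://example.com/a/very/long/\\\n        path?with=query";
        System.out.println(spliceLines(url));

        // A space placed just before the backslash is kept.
        String prose = "one two \\\n    three";
        System.out.println(spliceLines(prose));
    }
}
```

Under these assumptions the first call prints `https://example.com/a/very/long/path?with=query` and the second prints `one two three`. Whitespace before the backslash is a bystander and survives, while the indentation after the line terminator is discarded.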
> I was forgetting, when I said this, that another substantial minority use case (I want to say at least 15%? These were rough estimates though) for multi-line strings is really long URLs, checksums, etc., that aren't meant to have any spaces in them at all. So the bash behavior is not necessarily what we'd want, although of course consistency with it has some amount of value in itself.
The actual bash behavior, described above, *is* what we want.
If the programmer *wants* a space, one can be placed just
before the <\ LT> sequence. Luckily, that's reasonably readable.
> Which raises another question: do we allow \<terminator> in SL strings? (I presume so, and we just eat the \ and the terminator.)
If we eat the (SP|TAB)* after LT, then we have given the programmer
control over indentation, in a way that is consistent with the rectangle
rule, but applies only to the one escaped (partial) line.
> Hmm, I can see how that could be harmless but it seems to blur the boundary between the features to me.
It seems that way. I think what's happening is another iteration
of "Let's do raw strings! Wait, that's not what they really are"
and now we are at "Let's do multi-line strings!"
Brian's comment is that the tri-quote makes a better container
for payloads with single quotes. Those payloads often have
multiple lines too. So it's really "fatter strings", in some sense.
We might say we are making strings with *unescaped LTs*.
The rectangle rule shows up as soon as we realize that programmers
have strong opinions about spacing, and want to indent their
code so it is readable. (Pretty too; beauty is a proxy for
readability I suppose.) So if we let the programmer start
putting paragraphs into string bodies, we also have to
let the programmer manage indentation. And it's a short
and natural step from exdenting to line-breaking, IMO.
We might say we are making *more readable syntax for
large strings*. Minimizing escape sequences makes them
readable, and so does giving the programmer control
over program layout.
Such "readable strings" make some sense for one-liners also,
especially if we extend the 2D rectangle rule to the 1D case
and strip leading and trailing whitespace, near the triquotes.
In the end, we might just dub them "fat strings".
— John