String reboot (plain text)

Fri Mar 15 19:42:00 UTC 2019

On Mar 13, 2019, at 11:56 AM, Kevin Bourrillion <kevinb at google.com> wrote:
> 
>    - Multi-line-ness and raw-ness are orthogonal concepts.
> 
> Is that true, as stated? I would have said that any support for rawness automatically gives you support for multi-line-ness by nature, because a newline character becomes literal. That doesn't seem like orthogonality.

True orthogonality means that the two vectors have a cosine of zero.
That's a stronger condition than independence, which only requires a
cosine whose magnitude is less than one.

There are at least two interesting factors that pull the cosine between
"raw" and "multiline" away from zero.  One is that raw implies multiline,
as you just pointed out, Kevin.  A second goes the other way:  Multiline
asks for raw, because of its scaling properties.  Classic escapes are *less
appropriate* to multiline strings than classic single-line strings.  This
point is touched on here:

>>  - For multi-line strings, a stronger delimiter (e.g., """) seems to be preferred on readability grounds, because people don't want to have to squint to see where the embedded code ends and the Java code resumes.  
>> 
> Valid point. Today, every line or group of lines in a .java source file is Java code, but now there will be sections where that's not at all clearly the case. Making the boundaries clear between the two types of code seems like a good practice. The old proposal allowed a single backtick to offset these sections in 99% of cases, but it occurred to me that developers would often be better off using more of them just to delineate better…

But I think the point is a little stronger.  We can expect that normal code has
visually limited line lengths, but visually unlimited line counts.  Even if we
believe that well-behaved multi-line strings will fit in a single screenful,
it is the case that the scale of a single-line string is the scale of a single
screen line, while the scale of a multi-line string is a *whole screen*.
It is a *questionable assumption* that escape sequence notations will
work just as well at the larger scale as the known-good smaller scale.
And we question that assumption when we speak of "squinting" as above.
Let's be clear about this:  Squinting through a page of code for escapes
is at least N times harder than squinting through a line of code, where
N is the number of lines on the page.
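
To make the scaling concrete, here is a minimal sketch that counts how many
characters of a payload need a classic escape.  The class and method names
are hypothetical, and the set of escaped characters (backslash, double quote,
newline, tab) is a simplifying assumption, not the full list from the Java
language specification.

```java
class EscapeScale {
    // Counts the payload characters that a classic Java string literal
    // would have to write as an escape sequence (simplified set).
    static int escapesNeeded(String payload) {
        int count = 0;
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '\\' || c == '"' || c == '\n' || c == '\t') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One screen line of embedded SQL: two quotes to escape.
        String oneLine = "SELECT \"name\" FROM t";

        // A 40-line "page" of the same payload, one newline per line.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            page.append(oneLine).append('\n');
        }

        System.out.println(escapesNeeded(oneLine));          // 2
        System.out.println(escapesNeeded(page.toString()));  // 120
    }
}
```

The count grows with the bulk of the payload, and those escapes are spread
over the whole screen rather than over one line.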

Raw strings give a clear and plausible answer to this problem posed by
multi-line strings, hence my conclusion that they are (for this reason among
others) not fully orthogonal features.  The answer is, "we won't put any
escape sequences into the bulk, we will only put them at the boundary".
Boundaries are *always* (barring fractals) smaller than bulks.  Another
part of the answer, which has been derived again in a previous message,
is "we'll put a big-enough escape sequence at the boundary so you'll
have a fighting chance to see it in the bulk".  I think that's the real reason
why, after inspecting single-" as a multi-line delimiter, we always discard
it in favor of something more distinctive, with multiple characters.

The clever discoveries of payloads which introduce the short closing
quote are interesting puzzles, but they are just special cases of the
general rule that, if you are going to spray a large bulk of string payload
on the screen, you are going to need a larger unit of visual information
to make a clearly evident ending fence for it.  That more general rule
does not appeal to dubious assertions like "this will only be for SQL
and five more notations, we promise".  Especially if we (later?) allow
the ending fence to grow as large and robust as each use case requires.
(That's my argument for "strong quotes" in all sizes, of course.)
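
As a rough illustration of a fence that grows with the use case, the sketch
below picks the shortest run of a delimiter character that cannot occur
inside the payload.  This is only one possible rule, not something any
proposal has specified; the names FenceChooser, longestRun, and chooseFence
are hypothetical, as is the minLength parameter.

```java
class FenceChooser {
    // Length of the longest run of consecutive delimiter characters in payload.
    static int longestRun(String payload, char delimiter) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < payload.length(); i++) {
            if (payload.charAt(i) == delimiter) {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    // Returns a fence at least minLength long and strictly longer than any
    // run of the delimiter in the payload, so the payload cannot close it.
    static String chooseFence(String payload, char delimiter, int minLength) {
        int length = Math.max(minLength, longestRun(payload, delimiter) + 1);
        StringBuilder fence = new StringBuilder();
        for (int i = 0; i < length; i++) {
            fence.append(delimiter);
        }
        return fence.toString();
    }

    public static void main(String[] args) {
        // Ordinary payload: the minimum three-character fence is enough.
        System.out.println(chooseFence("SELECT \"name\" FROM t", '"', 3));
        // Payload that already contains three quotes: the fence grows to four.
        System.out.println(chooseFence("doc with \"\"\" inside", '"', 3));
    }
}
```

The puzzle payloads that end in a short closing quote are handled by the same
rule as everything else: the fence is simply chosen larger than the payload
requires.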

I guess where this ends for me is that, not buying the orthogonality
argument, I more easily see raw as a better first course, because it picks
up most of the multi-line use cases, and also the case of single-line
regular expressions.
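
The single-line regular expression case can be seen in today's Java.  In this
small sketch (the class name and the pattern are chosen only for
illustration), the regex a reader wants to see is shown in a comment, next to
the classic literal that has to be written instead.

```java
import java.util.regex.Pattern;

class RegexEscapes {
    // The regex as a reader wants to see it:   \d+\.\d+
    // The same regex as a classic Java literal, every backslash doubled:
    static final String DECIMAL = "\\d+\\.\\d+";

    // True if the whole input is digits, a literal dot, then digits.
    static boolean isDecimal(String s) {
        return Pattern.matches(DECIMAL, s);
    }

    public static void main(String[] args) {
        // 8 payload characters, written with 11 source characters.
        System.out.println(DECIMAL.length());   // 8
        System.out.println(isDecimal("3.14"));  // true
        System.out.println(isDecimal("3x14"));  // false
    }
}
```

A raw literal would let the source text and the payload be the same eight
characters.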

— John