[raw-strings] Indentation problem

Guy Steele guy.steele at oracle.com
Mon Feb 5 18:39:04 UTC 2018


> On Feb 5, 2018, at 1:39 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> 
>> However, I also note that the broad problem may have two or three distinct symptoms, and:
>> (1) A solution that addresses one symptom may not address the others, and
>> (2) On the other hand, it may (or may not) be perfectly reasonable to address the most painful symptoms in different ways, rather than insisting that a single solution cover them all.
> 
> Indeed so.  This is one reason why we resisted the call to do string interpolation (which many developers conflate with multi-line strings, as many languages with one also have the other) at the same time.  Another way to ask this question is: are we yet sufficiently minimal?  We boiled it down quite a lot already, but are we at "minimal" yet?  Or, did we take a wrong turn in boiling it down, and find ourselves at only a local minimum?
> 
>> In particular, I happen to think that the problem of distinguishing snippet indentation from encoding-program indentation may require a rather different kind of solution from the problem of escape characters in embedded snippets.  The reason is that in both these cases the painful symptom is visual in nature rather than logical.  That’s why I can understand what drove Tagir to pursue the pipe-character approach (even though I think it may not be the best solution to the problem).  We may want to use ```…``` to enclose regexes but also want to use some other approach to solve the multi-line / indentation problems.
> 
> OK, so what you're saying here is that it might be a clever self-deception to count newline handling as "just another aspect of raw-ness"?

Bingo.

Back in the day (I’m talking 1960s) it was ugly and wasteful but predictable: if there were line breaks at all (as opposed to record-oriented I/O), they were represented by two characters, CR and then LF, held over from the mechanical abilities/requirements of Teletype machines.

Then, in the mid-1960s, an ISO standard allowed plain LF (eventually semi-renamed Newline) as an alternative, and Multics and then Unix spread this idea (which eventually reached Apple as well).

But another branch of the world, notably the CP/M to MS-DOS to Windows line, continued to use CR/LF.  Worse yet, some software came to use CR alone (perhaps a natural enough choice when you consider that the “Return” key on keyboards usually generates the CR character rather than the LF character).

It is simply impossible to be compatible with everyone on this issue, and we are fooling ourselves if we think that raw string representations can solve this problem in all contexts.  Much better, I think, in the absence of consensus, to have explicit software gatekeepers at the points where data transitions among these disparate worlds.
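(By way of illustration only: such a gatekeeper might look roughly like the Java sketch below.  The class and method names are mine and purely hypothetical, not anything being proposed for the language or libraries in this thread.)

public class LineEndings {

    // Normalize CR LF pairs and lone CR characters to LF.
    static String toUnix(String s) {
        return s.replaceAll("\r\n|\r", "\n");
    }

    // Convert LF-only text to the current platform's line separator.
    static String toPlatform(String s) {
        return toUnix(s).replace("\n", System.lineSeparator());
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\rthree\n";
        System.out.println(toUnix(mixed).equals("one\ntwo\nthree\n"));  // prints true
    }
}

The point is only that the normalization is explicit and happens at the boundary where the data crosses worlds, rather than being something we expect the raw string literal syntax itself to guarantee.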


