String literals: some principles

Guy Steele guy.steele at oracle.com
Fri May 3 20:37:36 UTC 2019


I completely agree with what you said here, John.  We both took a good look, but you squinted with your right eye, and I with my left. :-)  Either point of view is correct; the two together yield depth perception.  Yay!

> On May 3, 2019, at 4:21 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Apr 29, 2019, at 8:48 AM, Guy Steele <guy.steele at oracle.com> wrote:
>> 
>>> On Apr 28, 2019, at 4:32 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>> 
>>> . . .
>>> Looking ahead to the next round, we can build on this.  In the first round, we mistakenly thought that there was something that could reasonably be called a “raw” string, but this notion is a fantasy; no string literal is so raw that it can’t recognize its closing delimiter.  So “rawness” is really only a matter of degree.  
>> 
>> This is _almost_ true.  If a string is truly raw (that is, it can contain _anything_), then one absolutely cannot depend on recognizing the closing delimiter by examining what might be the raw content.
>> 
>> Put another way: one cannot determine how long the raw content is by examining it.  That’s a solid principle.
> 
> I'm going to be nit-picky here and refer to my earlier
> mentions of the paradigm of strong quoting, which
> at its heart simply means you have an infinite set of
> delimiters to choose from, when wrapping a payload
> into a literal syntax.
> 
> Adding a numeral to the open quote means that there
> is now an unbounded set of open quotes, so it is an
> instance of strong quoting.  Another instance of strong
> quoting adds nonces, and yet another just lengthens
> the quote pattern until it doesn't occur (anywhere) in
> the raw string payload.
> 
> The numeric prefix convention is different from other
> kinds of strong quoting conventions, in that the end-quote
> can be a substring of the payload.  Actually, the end-quote
> is most naturally the empty string, which is a substring
> of every string.
> 
> The numeric prefix convention and other strong-quote
> conventions all share a common property:  The convention
> as a whole is universal for arbitrary payloads, but for
> any given payload there are quotes which work and others
> that don't work.  In the case of the numeric prefix
> convention, once you choose an open-quote (with
> numeral) you are limited to payloads of that length.
> That's not quite a "raw string" any more, since it's
> suitable only for a fixed-size character field.
> Likewise, once you choose a particular nonce-based
> or patterned quote (e.g., seven double-quotes),
> payloads containing the corresponding end-quote
> as a substring are no longer suitable.
> 
> Once you pick a particular payload string, the next
> question is whether you can embed that particular
> string into your program without inserting escape
> sequences.  Only with a strong quote scheme of
> some sort is this possible.  But, with any of several
> strong quote schemes, it is possible to dispense
> with escapes for any given string; it is not a fantasy.
> 
> — John
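
To make the two strong-quoting schemes discussed above concrete, here is a minimal sketch in Python. It is an illustration only, not a syntax proposal for Java. The function names and the `N:` length-prefix notation are invented for this example. The first pair of functions models the numeric-prefix convention, where the end-quote is the empty string. The second pair models lengthening a quote pattern until it no longer occurs in the payload.

```python
def quote_with_length(payload: str) -> str:
    """Numeric-prefix scheme: the open-quote carries the payload length.

    The hypothetical notation is '<length>:<payload>'; the end-quote is
    the empty string, so nothing follows the payload.
    """
    return f"{len(payload)}:{payload}"


def unquote_with_length(literal: str) -> str:
    """Recover the payload using only the length in the open-quote."""
    head, sep, rest = literal.partition(":")
    if not sep or not (head.isascii() and head.isdigit()):
        raise ValueError("missing numeric length prefix")
    length = int(head)
    # The payload is never scanned for a terminator; only its length is checked.
    if len(rest) != length:
        raise ValueError(f"expected {length} characters, found {len(rest)}")
    return rest


def end_quote_is_safe(payload: str, end_quote: str) -> bool:
    """A patterned quote works for a payload only if the end-quote
    does not occur anywhere inside that payload."""
    return end_quote not in payload


def choose_delimiter(payload: str, quote_char: str = '"') -> str:
    """Lengthen a run of quote_char until it is safe for this payload."""
    count = 1
    while not end_quote_is_safe(payload, quote_char * count):
        count += 1
    return quote_char * count
```

For example, `quote_with_length('say "hi"')` gives `8:say "hi"`, and `choose_delimiter('a "" b')` gives a run of three double-quotes, because runs of one and two both occur in the payload. In both schemes the chosen open-quote depends on the payload: no single quote works for every string, but some quote works for any given string. The sketch ignores the case where a payload begins or ends with the quote character and so sits directly against the delimiter; a real design would have to address that.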


