Raw strings starting/ending with backtick

Cay Horstmann cay.horstmann at sjsu.edu
Mon Nov 26 09:05:35 UTC 2018


On 26/11/2018 at 08:28, John Rose wrote:
> On Nov 24, 2018, at 11:39 PM, Cay Horstmann <cay.horstmann at sjsu.edu> wrote:
>>
>> I agree that it is inelegant that there is no good syntax for raw 
>> strings starting with a backtick. Some time ago 
>> (http://horstmann.com/unblog/2018-06-01), I suggested that an initial 
>> newline after the backticks could count as part of the raw string 
>> delimiter:
>>
>>      String myNameInABox = ```
>> +-----+
>> | Cay |
>> +-----+```; // This string starts with +
>>
>> Ok, maybe it's not brilliant, but it solves two problems: (1) how to 
>> format multiline strings that should be aligned, without having to 
>> strip out the initial newline (2) how to declare strings that start 
>> with a backtick.
> 
> The basic reality here is that we are trying to keep the quotes as 
> simple as possible,
> while allowing them to quote anything at all, including their shorter 
> siblings.
> A close-quote can't both appear in a string and end that string for 
> obvious reasons.
> 
> Result:  There must be an infinite set of close-quotes available, so 
> that even if a
> string has the first N-1 close-quotes inside it, it can be terminated 
> with the Nth one.
> This also implies there must be a corresponding infinite set of open-quotes.
> (Opens and closes can be pairwise identical, as in the proposed feature.)
> 
> Next, we have the problem of designing a set of open quotes which can
> be differentiated from each other before the string body proper is scanned.
> (You have to determine the close-quote before scanning the string body.)
> 
> This really means that open-quotes must be self-delimiting, or else that
> there are some substrings that are forbidden to follow at least some
> open-quotes.  If an open-quote syntax is not fully self-delimiting, there
> are two open-quotes Q, QR for which Q is a proper prefix of QR.  In this
> case, a quoted string body cannot begin with R and be quoted with Q.
> 
> In the present case, we allow the open-quotes to be composed of an
> alphabet of only one letter, the backtick, but allow any positive number
> of them.  That's pretty good (and really, really simple) but it does have
> the observed defect, that none of the open-quotes are self-delimiting,
> because for any N>0, "`".times(N) is a proper prefix of the next open-quote,
> "`".times(N+1).  Thus, for no open-quote (in the present scheme) can
> the string body begin with backtick.
> 
> (There is a mirror-image problem with the end-quotes, if they are
> not self-delimiting.  It must be possible, for any given string, to
> choose an end-quote which (a) isn't in the string, and (b) when
> appended to the string does not create an earlier instance of itself.
> Again, having an alphabet of one character Q for the end-quotes
> means that the string cannot end in Q.)
> 
> Can an infinite set of strings which are repetitions of a single character
> be made self-delimiting?  Never, since any given member of that set
> is the proper prefix of some longer member.
> 
> Making such a set self-delimiting is simple:  Add another character,
> and allow it to be a terminating character for the open-quote.
> Or, allow the open-quote to include an optional numeric count
> that determines the length of the rest of the quote.  Or, allow
> the open-quote to have arbitrary (quoted) substructure.
> 
> (And for each open-quote define a corresponding close-quote.
> Then given a string, choose the shortest close-quote that does not
> occur in the string, and which when appended to the string will
> not create an earlier instance of itself.  Begin the quoted string
> with that close-quote's open-quote.)
> 
> Suppose that Q is the main quote character and R is a helper
> (or two or more) that helps size the end-quote.  Examples of
> these three approaches would be:
> 
>    OQ1 = { Q.times(N) + R | N > 0 }
>    OQ2 = { R + String.valueOf(N) + Q.times(N) | N > 0 }
>    OQ3 = { Q + S + R | S in (Universe - R).star() }
> 
> Such schemes are more powerful, but much harder to describe than
> what we have now:
> 
>    OQ0 = { Q.times(N) | N > 0, Q = \" }
> 
> Coming up with these schemes is simple.  Coming up with a scheme
> that feels simple to use seems to be impossible.  Tuning and tweaking
> these schemes is *NOT* a hill-climbing activity that ascends to better
> and better solutions.  Creating self-delimiting string syntaxes is a
> frustrating exercise in pushing the complexities and corner cases
> into darker and darker holes.
> 
> We settled on OQ0 (alphabet of one character) because it is simple
> and easy to understand.  We looked carefully at other OQ schemes
> and did not find that their specification and learning complexity
> was paid for by removing the practical complexity of encoding a
> few odd-looking strings.  OQ1, etc., have their own sharp edges
> which we think users will run into more often than they will run
> into the sharp edges of OQ0.  Trying to "fix" OQ0 just makes it
> messier, like rubbing your finger over that single speck of lint on the
> lens of your new binoculars.
> 
> — John
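A minimal Java sketch (Java 11+) of the one-character scheme OQ0 from the quoted message may make its defect concrete. It assumes, as a simplification, that the lexer ends the body at the first run of exactly N backticks, so runs of any other length are ordinary content; `Oq0Sketch`, `quote`, and `containsRunOfExactly` are hypothetical names. The encoder picks the shortest close-quote that does not occur in the string, and it rejects strings that begin or end with a backtick (and the empty string), because those backticks would merge into the delimiter runs:

```java
import java.util.Optional;

// Hypothetical sketch of OQ0: open-quote and close-quote are the same run of
// N backticks (Java 11+ for String.repeat). Assumption: the lexer ends the
// body at the first maximal run of exactly N backticks, so runs of any other
// length are ordinary string content.
class Oq0Sketch {

    // Returns the quoted form of s, or empty if no N works.
    static Optional<String> quote(String s) {
        // A leading backtick would lengthen the open-quote run and a trailing
        // backtick would lengthen the close-quote run; for the empty string
        // the open and close runs merge into one. No N can fix any of these.
        if (s.isEmpty() || s.startsWith("`") || s.endsWith("`")) {
            return Optional.empty();
        }
        // Choose the shortest close-quote that does not occur in s.
        int n = 1;
        while (containsRunOfExactly(s, n)) {
            n++;
        }
        String q = "`".repeat(n);
        return Optional.of(q + s + q);
    }

    // True if s contains a maximal run of exactly n backticks.
    static boolean containsRunOfExactly(String s, int n) {
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '`') {
                int start = i;
                while (i < s.length() && s.charAt(i) == '`') {
                    i++;
                }
                if (i - start == n) {
                    return true;
                }
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(quote("a`b"));  // prints Optional[``a`b``]
        System.out.println(quote("`ab"));  // prints Optional.empty
    }
}
```

The one-backtick run inside "a`b" forces N = 2, while "`ab" has no encoding at all, which is the "string body cannot begin with backtick" case described above.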

Hi John,

the scheme that I suggest would, in your notation, be an OQ1 scheme with
Q = '`' and R = '\n', quoting any string s as n * Q + R + s + n * Q
for sufficiently large n. Do you recall what sharp edges one would run
into? They aren't obvious to me, and it would be good to learn from your
experience.
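For comparison, here is a similar hypothetical Java 11+ sketch of the newline-terminated scheme, under the same assumption that the close-quote is the first run of exactly n backticks. `NewlineDelimiterSketch`, `quote`, and `unquote` are illustrative names, not part of any proposal. Because the '\n' ends the open-quote, a body that begins with a backtick now round-trips; the close-quote is still a bare run of backticks, so under this assumption the mirror-image case from the quoted message (a body ending in a backtick) remains unquotable:

```java
import java.util.Optional;

// Hypothetical sketch of the newline-terminated scheme (Java 11+):
// open-quote = n backticks followed by '\n', close-quote = n backticks.
// Assumption: the body ends at the first maximal run of exactly n
// backticks; runs of any other length are ordinary content.
class NewlineDelimiterSketch {

    // Returns n * Q + R + s + n * Q, or empty if no n works.
    static Optional<String> quote(String s) {
        // The '\n' ends the open-quote, so a leading backtick is harmless,
        // but a trailing backtick would still lengthen the close-quote run.
        if (s.endsWith("`")) {
            return Optional.empty();
        }
        int n = 1;
        while (containsRunOfExactly(s, n)) {
            n++;
        }
        String q = "`".repeat(n);
        return Optional.of(q + "\n" + s + q);
    }

    // Reads one literal in this scheme and returns its body.
    static String unquote(String literal) {
        int n = 0;
        while (n < literal.length() && literal.charAt(n) == '`') {
            n++;
        }
        if (n == 0 || n >= literal.length() || literal.charAt(n) != '\n') {
            throw new IllegalArgumentException("malformed open-quote");
        }
        int i = n + 1;
        while (i < literal.length()) {
            if (literal.charAt(i) == '`') {
                int start = i;
                while (i < literal.length() && literal.charAt(i) == '`') {
                    i++;
                }
                if (i - start == n) {
                    return literal.substring(n + 1, start);
                }
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }

    // True if s contains a maximal run of exactly n backticks.
    static boolean containsRunOfExactly(String s, int n) {
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '`') {
                int start = i;
                while (i < s.length() && s.charAt(i) == '`') {
                    i++;
                }
                if (i - start == n) {
                    return true;
                }
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String literal = quote("`abc").orElseThrow();
        System.out.println(literal.replace("\n", "\\n")); // prints ``\n`abc``
        System.out.println(unquote(literal));              // prints `abc
        System.out.println(quote("abc`"));                 // prints Optional.empty
    }
}
```

Unlike OQ0 in this model, the empty string also encodes here (as a backtick, a newline, and a backtick), since the newline keeps the open and close runs apart.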

Cheers,

Cay
-- 

Cay S. Horstmann | http://horstmann.com | mailto:cay at horstmann.com


More information about the compiler-dev mailing list