Raw string literals and Unicode escapes
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Feb 26 22:57:37 UTC 2018
Of course - delimiters are not part of the string length - I see now why
you can have (in theory) unbounded prefix/suffix.
Personally, I find the argument - "because you can have unlimited-length
identifiers" - not a great fit. From a lexer writer's perspective, I can see
why it is used as a precedent - after all it is a token whose size is
unbounded. But I find it hard to ignore that the roles played by
identifiers and delimiters in the grammar are quite different.
At least there were other cases where we found a different trade-off
between expressiveness and practicality - see Project Coin's use of
repeated underscores in binary literals (subsequently banned):
private static final int BOND =
0000_____________0000________0000000000000000__000000000000000000+
00000000_________00000000______000000000000000__0000000000000000000+
000____000_______000____000_____000_______0000__00______0+
000______000_____000______000_____________0000___00______0+
0000______0000___0000______0000___________0000_____0_____0+
0000______0000___0000______0000__________0000___________0+
0000______0000___0000______0000_________0000__0000000000+
0000______0000___0000______0000________0000+
000______000_____000______000________0000+
000____000_______000____000_______00000+
00000000_________00000000_______0000000+
0000_____________0000________000000007;
(Example courtesy of Joshua Bloch)
Maurizio
On 26/02/18 21:54, Jim Laskey wrote:
> Why introduce an artificial limit? Identifiers don’t have a
> limit.
>
> 3.8. Identifiers
> An identifier is an *unlimited-length sequence* of Java letters and
> Java digits, the first of which must be a Java letter.
>
> — Jim
>
>> On Feb 26, 2018, at 5:29 PM, Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> On 26/02/18 20:17, John Rose wrote:
>>> Any *finite choice* of end-quotes has the same problem, with
>>> a non-zero probability that decreases (but does not vanish)
>>> with the number of available end-quotes. The only way to
>>> break out of the box is to allow the user an unlimited range
>>> of successively "stronger" end-quotes (i.e., less likely ones).
>> In reality there is a 'finite' upper bound for this length, which is
>> given by 2^16 / 2 = 2^15. That's the maximum delimiter size you
>> could encode in a Java String which you can also symmetrically close
>> - and it's an edge case, as it will contain the empty string.
>>
>> So, yes, on paper, I agree with the argument; in practice, I guess
>> I'd be more in favor of limiting the number of repetitions - I
>> wouldn't like to open the door to puzzlers:
>>
>> `````````````````````````````````````````````````````````````````````````hello`````````````````````````````````````````````````````````````````````````
>>
>> (which might leave some ASCII art lovers a bit unhappy :-))
>>
>> I think limiting to 8 or some other reasonably small number will
>> probably reduce the clash probability enough? And, even if it's not
>> enough, I guess we'd still be left with the question of whether a long
>> (possibly unbounded?) escaping sequence is something we'd like to see
>> in Java.
>>
>> Maurizio
>
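
The delimiter rule debated in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not the proposed javac behavior: it assumes a literal opens with a run of n backticks and closes at the next run of exactly n backticks, and it rejects content that is empty or that starts or ends with a backtick, since such content would merge with the delimiter. The class and method names (RawStringSketch, quote, unquote, longestBacktickRun) are hypothetical.

```java
import java.util.Arrays;

class RawStringSketch {
    /** Length of the longest run of consecutive backticks in the content. */
    static int longestBacktickRun(String content) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < content.length(); i++) {
            if (content.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    /** Wraps content in the shortest backtick delimiter that cannot clash with it. */
    static String quote(String content) {
        // Assumption: these cases are simply rejected in this sketch.
        if (content.isEmpty() || content.charAt(0) == '`'
                || content.charAt(content.length() - 1) == '`') {
            throw new IllegalArgumentException("content would merge with the delimiter");
        }
        char[] ticks = new char[longestBacktickRun(content) + 1];
        Arrays.fill(ticks, '`');
        String delimiter = new String(ticks);
        return delimiter + content + delimiter;
    }

    /** Counts the opening backticks, then looks for a run of exactly that length. */
    static String unquote(String literal) {
        int n = 0;
        while (n < literal.length() && literal.charAt(n) == '`') {
            n++;
        }
        if (n == 0) {
            throw new IllegalArgumentException("missing opening delimiter");
        }
        int i = n;
        while (i < literal.length()) {
            if (literal.charAt(i) != '`') {
                i++;
                continue;
            }
            int start = i;
            while (i < literal.length() && literal.charAt(i) == '`') {
                i++;
            }
            if (i - start == n) {
                // Only the text between the delimiters becomes the string value.
                return literal.substring(n, start);
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }
}
```

With this sketch, quote("a`b") picks a two-backtick delimiter, and unquote returns the original three-character content whatever the delimiter length, which is the sense in which the delimiters are not part of the string length. A cap such as 8 would amount to rejecting any content whose longest backtick run is 8 or more.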
More information about the amber-spec-experts
mailing list