RFR 8230365 : Pattern for a control-char matches non-control characters
Ivan Gerasimov
ivan.gerasimov at oracle.com
Thu Sep 5 20:29:16 UTC 2019
Thank you again, Martin!
Here's the updated webrev without the lower-case control char ids:
http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/
I've also filed a CSR to record the changes in behavior:
https://bugs.openjdk.java.net/browse/JDK-8230675
Could you please help review it?
On 9/4/19 9:00 PM, Martin Buchholz wrote:
> Thanks, Ivan. We're mostly in agreement.
>
> + * If {@code true} then lower-case control-character ids are mapped to the
> + * their upper-case counterparts.
> Extra "the".
>
> After all these decades I only now realize that c ^= 0x40 moves '?' to
> the end of the ASCII range and all the other controls to the start!
>
> Should we support lower-case controls? Compatibility with perl regex
> still matters, but a lot less than in 2003. But the key is that we
> got the WRONG ANSWER previously, so when we restrict the control ids
> let's just make lower case controls syntax errors. Silently changing
> behavior is bad for users. ... so let's abandon
> ALLOW_LOWERCASE_CONTROL_CHAR_IDS.
> An alternative:
> int ch = read() ^ 0x40;
> if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch;
>
>
This code will probably be most efficient for the common case.
However, I'd prefer to use the auxiliary method isCntrlId() in this
case, as it is self-documenting and still efficient enough.
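The sketch below compares the two approaches discussed above. It assumes a hypothetical standalone helper named after the isCntrlId() in the webrev; the real method may differ in signature and placement. The class name ControlCharIds and the methods controlCharViaHelper, controlCharInline and tryMap are also hypothetical. The helper accepts exactly the ids that `id ^ 0x40` maps onto a C0 control (0x00..0x1F) or DEL (0x7F): '@' through '_' (0x40..0x5F), plus '?' (0x3F). A plain XOR with no check would map a lower-case 'a' (0x61) to 0x21 ('!'), which is not a control character. The program confirms that the helper-based check and Martin's inline check accept and reject the same ids across the BMP.

```java
public class ControlCharIds {

    // Hypothetical helper modeled on the isCntrlId() mentioned in the thread.
    // An id is valid if XOR-ing it with 0x40 yields a C0 control (0x00..0x1F)
    // or DEL (0x7F): that is, '@', 'A'..'Z', '[', '\\', ']', '^', '_' or '?'.
    static boolean isCntrlId(int id) {
        return (id >= 0x40 && id <= 0x5F) || id == 0x3F;
    }

    // Helper-based variant: validate the id first, then map it.
    static int controlCharViaHelper(int id) {
        if (!isCntrlId(id)) {
            throw new IllegalArgumentException("Illegal control escape sequence");
        }
        return id ^ 0x40;
    }

    // Inline variant following Martin's suggestion, with the restriction
    // always enabled: map first, then check whether the result is a control.
    static int controlCharInline(int id) {
        int ch = id ^ 0x40;
        if (ch < 0x20 || ch == 0x7F) {
            return ch;
        }
        throw new IllegalArgumentException("Illegal control escape sequence");
    }

    // Returns -1 instead of throwing, so the two variants can be compared.
    static int tryMap(boolean viaHelper, int id) {
        try {
            return viaHelper ? controlCharViaHelper(id) : controlCharInline(id);
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Check that both variants agree on every BMP code point.
        for (int id = 0; id <= 0xFFFF; id++) {
            if (tryMap(true, id) != tryMap(false, id)) {
                throw new AssertionError("variants disagree at " + id);
            }
        }
        System.out.println(controlCharViaHelper('A')); // 1   (Ctrl-A)
        System.out.println(controlCharViaHelper('[')); // 27  (ESC)
        System.out.println(controlCharViaHelper('?')); // 127 (DEL)
        System.out.println(tryMap(true, 'a'));         // -1: lower case rejected
    }
}
```

Both variants reject lower-case ids, which matches the decision to make them syntax errors rather than silently remapping them. The helper states the set of legal ids directly. The inline form folds the check into the already-computed result, which is the efficiency-versus-readability trade-off weighed in the reply.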
With kind regards,
Ivan