RFR 8230365 : Pattern for a control-char matches non-control characters

Ivan Gerasimov ivan.gerasimov at oracle.com
Thu Sep 5 20:29:16 UTC 2019


Thank you Martin again!

Here's the updated webrev without the lower-case control char ids:

http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/

I've also filed a CSR to record the changes in behavior:

https://bugs.openjdk.java.net/browse/JDK-8230675

Could you please help review it?


On 9/4/19 9:00 PM, Martin Buchholz wrote:
> Thanks, Ivan.  We're mostly in agreement.
>
> +     * If {@code true} then lower-case control-character ids are mapped to the
> +     * their upper-case counterparts.
> Extra "the".
>
> After all these decades I only now realize that c ^= 0x40 moves '?' to 
> the end of the ASCII range and all the other controls to the start!
>
> Should we support lower-case controls?  Compatibility with perl regex 
> still matters, but a lot less than in 2003.  But the key is that we 
> got the WRONG ANSWER previously, so when we restrict the control ids 
> let's just make lower case controls syntax errors.  Silently changing 
> behavior is bad for users. ... so let's abandon 
> ALLOW_LOWERCASE_CONTROL_CHAR_IDS.
> An alternative:
> int ch = read() ^ 0x40;
> if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch;
>
>

This code will probably be most efficient for the common case.

However, I'd prefer to use the auxiliary method isCntrlId() in this 
case, as it is self-documenting and still efficient enough.
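
For illustration only, here is a minimal sketch of how such a helper could 
look. The body of isCntrlId() is my assumption here (it is not copied from 
the webrev), and the class name ControlCharDemo and the method toControl() 
are hypothetical names used just for this example. It relies on the 
observation above: XOR with 0x40 maps '@'..'_' to 0x00..0x1F and '?' to 0x7F, 
while a lower-case id such as 'a' lands on a non-control character.

```java
public class ControlCharDemo {

    // Assumed body: accept only ids that XOR with 0x40 turns into a real
    // control character, i.e. '?' (0x3F) through '_' (0x5F).
    static boolean isCntrlId(int ch) {
        return ch >= '?' && ch <= '_';
    }

    // Hypothetical wrapper: map a control-char id to the control character,
    // rejecting everything else instead of silently returning a wrong value.
    static int toControl(int id) {
        if (!isCntrlId(id)) {
            throw new IllegalArgumentException(
                "Illegal control escape id: " + (char) id);
        }
        return id ^ 0x40;
    }

    public static void main(String[] args) {
        // 'A' ^ 0x40 is 0x01, '?' ^ 0x40 is 0x7F (DEL).
        System.out.println(String.format("0x%02X", toControl('A')));
        System.out.println(String.format("0x%02X", toControl('?')));
        // Unchecked, 'a' ^ 0x40 is 0x21, which is '!' and not a control char.
        System.out.println(String.format("0x%02X", 'a' ^ 0x40));
    }
}
```

The range check costs two comparisons, and the name states the intent at 
the call site.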

With kind regards,

Ivan



