RFR 8230365 : Pattern for a control-char matches non-control characters
Ivan Gerasimov
ivan.gerasimov at oracle.com
Thu Sep 5 01:49:41 UTC 2019
Thank you, Martin!
On 8/30/19 6:19 PM, Martin Buchholz wrote:
> There's a strong expectation that ctrl-A and ctrl-a both map to
> \u0001, so I support Ivan's initiative.
> I'm surprised Java regex gets this wrong.
> Might need a transitional system property.
>
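To make the mismatch concrete, here is a small sketch that models `\cX` as a plain XOR of X with 0x40. This model of the current, unrestricted behavior is an assumption for illustration, and `NaiveControlEscape`/`naiveControl` are hypothetical names. Under this model, ctrl-A gives 0x01 as expected, while ctrl-a gives 0x21 ('!') rather than 0x01.

```java
final class NaiveControlEscape {
    // Hypothetical model of \cX: flip bit 0x40 of X, with no validation
    // of X and no special handling of lower-case letters.
    static int naiveControl(char x) {
        return x ^ 0x40;
    }

    public static void main(String[] args) {
        System.out.printf("ctrl-A -> 0x%02X%n", naiveControl('A')); // prints 0x01
        System.out.printf("ctrl-a -> 0x%02X%n", naiveControl('a')); // prints 0x21 ('!'), not 0x01
    }
}
```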
Right. I think it would be best to introduce two system properties:
The first would turn the restrictions on control-char names on or off.
It would be enabled by default and would permit only names from a limited
list: capital letters and a few special characters.
The second would enable mapping of lower-case control-char names to
their upper-case counterparts. This option should be off by default in
the current JDK release and then turned on by default in some
subsequent release (when, presumably, most applications that use this
kind of regexp have been fixed).
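A rough sketch of how the two switches could combine when resolving `\cX`. The property names below are hypothetical placeholders, not real JDK properties. The sketch also assumes one literal reading of the proposal: with the restriction on and the lower-case mapping off, a lower-case name is simply rejected (returned here as -1).

```java
final class ControlEscapeConfig {
    // Hypothetical property names, for illustration only.
    static final String RESTRICT_PROP = "example.regex.restrictControlNames";
    static final String LOWERCASE_PROP = "example.regex.mapLowerCaseControlNames";

    // Returns the control character for \cX, or -1 if X is not accepted.
    static int controlChar(char x, boolean restrict, boolean mapLowerCase) {
        if (mapLowerCase && x >= 'a' && x <= 'z') {
            return x - 'a' + 1;            // ctrl-a .. ctrl-z -> 0x01 .. 0x1A
        }
        if (!restrict || (x >= '?' && x <= '_')) {
            return x ^ 0x40;               // '@'..'_' -> 0x00..0x1F, '?' -> 0x7F
        }
        return -1;                         // rejected by the restriction
    }

    // Reads the (hypothetical) properties: restriction defaults to on,
    // lower-case mapping defaults to off.
    static int controlChar(char x) {
        boolean restrict = Boolean.parseBoolean(
                System.getProperty(RESTRICT_PROP, "true"));
        boolean mapLowerCase = Boolean.getBoolean(LOWERCASE_PROP);
        return controlChar(x, restrict, mapLowerCase);
    }

    public static void main(String[] args) {
        for (char x : new char[] {'A', 'a', '?', '1'}) {
            System.out.println(x + " -> " + controlChar(x));
        }
    }
}
```

Keeping the pure three-argument overload separate from the property lookup makes it easy to test every combination of the two flags without touching system properties.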
This all feels like a bit too much for such a rarely used feature,
but it is probably the proper thing to do anyway :-)
If we agree on these system properties, I can create a separate test
to verify all possible combinations.
> What's the best bit-twiddle? Untested:
> if ((c ^= 0x40) < 0x20) return c;
> if ((c ^= 0x20) <= 26 && c > 0) return c;
>
> 0x40 is more readable than 64.
>
`((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case)
names: it holds exactly when ch is in the range 0x3f ('?') to 0x5f ('_').
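Both bit-twiddles from this exchange can be checked side by side. The sketch below wraps Martin's untested snippet in a method, adding a `return -1` fallback that is an assumption of this sketch, and wraps the range check in a predicate. Method names are hypothetical. Note that the XOR snippet on its own covers '@'..'_' and 'a'..'z' but not '?'.

```java
final class ControlCharTwiddles {
    // Adapted from the snippet in the thread; the -1 fallback is an addition.
    static int xorControl(int c) {
        if ((c ^= 0x40) < 0x20) return c;              // '@'..'_' -> 0x00..0x1F
        if ((c ^= 0x20) <= 26 && c > 0) return c;      // 'a'..'z' -> 0x01..0x1A
        return -1;                                     // anything else, including '?'
    }

    // True iff 0x3f <= ch <= 0x5f: the OR is negative exactly when
    // either difference is negative, i.e. ch is outside the range.
    static boolean isUpperControlName(int ch) {
        return ((ch - 0x3f) | (0x5f - ch)) >= 0;
    }

    public static void main(String[] args) {
        for (char ch : new char[] {'?', '@', 'A', '_', '`', 'a', 'z'}) {
            System.out.printf("%c: xor=%d range=%b%n",
                    ch, xorControl(ch), isUpperControlName(ch));
        }
    }
}
```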
> Does ctrl-? get mapped to 0x7f?
>
Yes. I've got it in the test, at the end of line 4997.
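For reference, the arithmetic behind ctrl-? is the same bit flip: '?' is 0x3F, and flipping bit 0x40 gives 0x7F (DEL). The short check below also asks `java.util.regex.Pattern` whether `\c?` matches DEL; the class name is hypothetical, and this is a quick sanity sketch rather than the test from the webrev.

```java
import java.util.regex.Pattern;

final class CtrlQuestionMark {
    public static void main(String[] args) {
        // '?' is 0x3F; flipping bit 0x40 yields 0x7F (DEL).
        int del = '?' ^ 0x40;
        System.out.printf("ctrl-? -> 0x%02X%n", del); // prints 0x7F

        // The same mapping as seen through java.util.regex.
        boolean matches = Pattern.compile("\\c?")
                .matcher(String.valueOf((char) del))
                .matches();
        System.out.println("\\c? matches DEL: " + matches);
    }
}
```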
Would you please help review the updated webrev:
http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/
With kind regards,
Ivan
More information about the core-libs-dev mailing list