RFR 8230365 : Pattern for a control-char matches non-control characters
Ivan Gerasimov
ivan.gerasimov at oracle.com
Thu Sep 5 20:16:46 UTC 2019
Hello Bernd!
Thank you for your comments!
I'm going to proceed with only the restriction part of the change for
now, so lower-case control-char names will not be blindly converted.
A system property will allow users to restore the previous, less
restrictive behavior, should they decide to keep malformed patterns
unchanged.
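A minimal sketch of what the restriction amounts to. It assumes that the legacy behavior is a plain `ch ^ 0x40` on whatever follows `\c`, which is consistent with the subject line: `\ca` then yields `!` (0x21), a non-control character. It also assumes that the accepted names are `?` (0x3F) through `_` (0x5F), per the range check later in the thread. The property name `jdk.regex.legacyControlChars` is hypothetical; the real name is whatever the CSR request specifies.

```java
// Hypothetical sketch of the restriction; not the actual java.util.regex.Pattern code.
final class ControlEscape {
    // Hypothetical property name; the real one is defined by the CSR request.
    static final String LEGACY_PROPERTY = "jdk.regex.legacyControlChars";

    // True when the legacy, unrestricted behavior has been requested.
    static boolean legacyEnabled() {
        return Boolean.getBoolean(LEGACY_PROPERTY);
    }

    // Returns the control character denoted by "\c" followed by ch.
    // Strict mode accepts only '?' (0x3F) through '_' (0x5F); legacy mode
    // flips bit 0x40 of any character, so "\ca" yields '!' (0x21).
    static int controlChar(char ch, boolean legacy) {
        if (legacy || ((ch - 0x3f) | (0x5f - ch)) >= 0) {
            return ch ^ 0x40;   // '@' -> 0x00, 'A' -> 0x01, '_' -> 0x1F, '?' -> 0x7F
        }
        // Pattern itself would report a PatternSyntaxException, a subclass of this one.
        throw new IllegalArgumentException("Illegal control escape sequence: \\c" + ch);
    }
}
```

A caller would pass `legacyEnabled()` as the second argument, so the strict check is the default and the old mapping needs an explicit opt-in.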
I'll post the updated webrev and CSR request shortly.
With kind regards,
Ivan
On 9/4/19 10:54 PM, Bernd Eckenfels wrote:
> Hello,
>
> Since not all combinations make sense (e.g. EXCEPTION + convert), a single multi-valued property might be better:
>
> jdk.regex.control=WARN|EXCEPTION|STANDARD|LEGACY
>
> Here EXCEPTION generates an error; STANDARD is the planned new default (treating upper and lower case the same, and raising an error on all undefined chars); LEGACY is the manual fallback to the current behavior; and WARN is the same fallback but with logging.
>
> I guess some form of early feedback like EXCEPTION or WARN is needed, even if it puts us between a rock and a hard place. Maybe have at least one iteration where it defaults to LEGACY (plus a Release Notes announcement), then WARN, and finally STANDARD?
>
> Regards
> Bernd
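A hypothetical sketch of how the single multi-valued property proposed above could be interpreted. The name `jdk.regex.control` comes from the proposal and is not a real JDK property. EXCEPTION is assumed to reject every name outside `?`..`_` (lower case included). STANDARD maps `a`..`z` to upper case and rejects everything else. WARN prints to `System.err`, where a real implementation would use a logger. Because a single enum value is chosen, the nonsensical combinations cannot be expressed at all.

```java
import java.util.Locale;

// Hypothetical sketch of the proposed modes; neither the property nor these
// semantics are part of the JDK.
final class ControlMode {
    enum Mode { WARN, EXCEPTION, STANDARD, LEGACY }

    // Reads the mode case-insensitively; defaults to LEGACY, as in the
    // first step of the proposed migration.
    static Mode fromProperty() {
        String v = System.getProperty("jdk.regex.control", "LEGACY");
        return Mode.valueOf(v.trim().toUpperCase(Locale.ROOT));
    }

    // True for '?' (0x3F) through '_' (0x5F), the names that already map correctly.
    static boolean isUpperName(char ch) {
        return ch >= 0x3f && ch <= 0x5f;
    }

    static int controlChar(char ch, Mode mode) {
        switch (mode) {
            case LEGACY:
                return ch ^ 0x40;                 // current behavior, no checks
            case WARN:
                if (!isUpperName(ch)) {
                    System.err.println("warning: malformed control escape \\c" + ch);
                }
                return ch ^ 0x40;                 // same result as LEGACY
            case STANDARD:
                if (ch >= 'a' && ch <= 'z') {
                    ch = (char) (ch - 0x20);      // treat lower case like upper case
                }
                // fall through to the strict check
            case EXCEPTION:
                if (isUpperName(ch)) {
                    return ch ^ 0x40;
                }
                throw new IllegalArgumentException("Illegal control escape: \\c" + ch);
            default:
                throw new AssertionError(mode);
        }
    }
}
```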
>
>
> --
> http://bernd.eckenfels.net
>
> ________________________________
> From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> on behalf of Ivan Gerasimov <ivan.gerasimov at oracle.com>
> Sent: Thursday, September 5, 2019 4:00 AM
> To: Martin Buchholz; Stuart Marks
> Cc: core-libs-dev
> Subject: Re: RFR 8230365 : Pattern for a control-char matches non-control characters
>
> Thank you Martin!
>
> On 8/30/19 6:19 PM, Martin Buchholz wrote:
>> There's a strong expectation that ctrl-A and ctrl-a both map to
>> \u0001, so I support Ivan's initiative.
>> I'm surprised Java regex gets this wrong.
>> Might need a transitional system property.
>>
> Right. I think it would be best to introduce two system properties:
>
> The first would turn the restrictions on control-char names on or off.
> It would be enabled by default, and would permit only names from a
> limited set: upper-case letters and a few special characters
> (`?`, `@`, `[`, `\`, `]`, `^` and `_`).
>
> The second would enable mapping of lower-case control-char names to
> their upper-case counterparts. This option should be turned off by
> default in the current JDK release, and then turned on by default
> in some subsequent release (when, presumably, most applications that
> use this kind of regexp have been fixed).
>
> This all feels like a bit much for such a rarely used feature,
> but it is probably the proper thing to do anyway :-)
>
> If we agree on these system properties, I can create a separate
> test to verify all possible combinations.
>
>
>> What's the best bit-twiddle? Untested:
>> if ((c ^= 0x40) < 0x20) return c;
>> if ((c ^= 0x20) <= 26 && c > 0) return c;
>>
>> 0x40 is more readable than 64.
>>
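Martin's untested snippet can be wrapped into a runnable method. Returning -1 for an invalid name is an assumption added here, since the snippet does not say what happens otherwise. As written, it maps `@`..`_` and `a`..`z` but rejects `?`.

```java
// Runnable wrapper around the untested bit-twiddle above.
// The -1 "not a control-char name" result is an added assumption.
final class CtrlTwiddle {
    static int ctrl(int c) {
        if ((c ^= 0x40) < 0x20) return c;          // '@'..'_' -> 0x00..0x1F
        if ((c ^= 0x20) <= 26 && c > 0) return c;  // c is now original ^ 0x60: 'a'..'z' -> 1..26
        return -1;                                 // everything else, including '?'
    }
}
```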
> `((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for the regular (non-lower-case)
> names: it holds exactly when ch is in the range 0x3F..0x5F.
>
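A sketch that checks the branch-free range test against the plain two-comparison form over every `char` value. It works because the bitwise OR of two `int`s is negative exactly when at least one operand is negative. Within the accepted range, `ch ^ 0x40` yields the control character, so `?` (0x3F) maps to 0x7F.

```java
// Sketch: branch-free range check vs. the plain comparison.
final class RangeCheck {
    // Negative OR means ch < 0x3F or ch > 0x5F.
    static boolean isRegularName(char ch) {
        return ((ch - 0x3f) | (0x5f - ch)) >= 0;
    }

    // '?', '@', 'A'..'Z', '[', '\\', ']', '^', '_'
    static boolean isRegularNamePlain(char ch) {
        return ch >= 0x3f && ch <= 0x5f;
    }

    // Exhaustively checks that both forms agree on all 65536 char values.
    static boolean agreeOnAllChars() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            if (isRegularName(ch) != isRegularNamePlain(ch)) return false;
        }
        return true;
    }
}
```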
>> Does ctrl-? get mapped to 0x7f?
>>
> Yes, that case is covered in the test, at the end of line 4997.
>
> Would you please help review the updated webrev:
>
> http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/
>
> With kind regards,
>
> Ivan
>
>
>
--
With kind regards,
Ivan Gerasimov