RFR 8214245 : Case insensitive matching doesn't work correctly for some character classes
Ivan Gerasimov
ivan.gerasimov at oracle.com
Mon Apr 22 02:50:19 UTC 2019
Hello!
It turns out that a case-insensitive j.u.regex.Pattern still pays
attention to character case when certain char classes are used.
For example, \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase}
continue to recognize only lower, upper and title case characters, even
in a case-insensitive context.
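A minimal sketch for observing this, assuming nothing beyond the public
java.util.regex API. The class name CaseInsensitiveClasses and the
helper matches() are made up for illustration and are not part of the
patch. The case-insensitive results depend on whether the running JDK
includes the fix, so no expected output is given.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveClasses {
    // Hypothetical helper: true if the whole input matches the regex
    // when compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // UNICODE_CASE is added so that case folding is not limited to ASCII.
        int ci = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        String[] classes = {
            "\\p{IsLowerCase}", "\\p{IsUpperCase}", "\\p{IsTitleCase}"
        };
        // 'a', 'A', and U+01C5, a title-case digraph.
        String[] inputs = { "a", "A", "\u01C5" };

        for (String cls : classes) {
            for (String in : inputs) {
                System.out.printf("%s vs U+%04X: sensitive=%b insensitive=%b%n",
                        cls, in.codePointAt(0),
                        matches(cls, 0, in),
                        matches(cls, ci, in));
            }
        }
    }
}
```

If the "insensitive" column differs between the three classes for the
same input, the classes are still being matched case-sensitively.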
For POSIX char classes, this behavior contradicts the following
paragraph of the POSIX specification:
"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions
specifies that pattern matching shall be performed without regard to the
case (uppercase or lowercase) of either data or patterns, then when each
character in the string is matched against the pattern, not only the
character, but also its case counterpart (if any), shall be matched.
This definition of case-insensitive processing is intended to allow
matching of multi-character collating elements as well as characters, as
each character in the string is matched using both its cases.
...
"""
I also checked how Perl deals with this situation, and yes, it
ignores case for the various \p{} classes when they are used in a
case-insensitive context, so all these tests pass:
'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;
For reference, here's a lengthy document describing the precise rules
Perl uses to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d
So, for any Lower, Upper or Title case chars in a case-insensitive
context, Perl uses the set of "Cased Letters", which is just the
combination of these three categories (aka the "LC" general category).
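To make the "Cased Letters" set concrete, here is a rough sketch of the
corresponding predicate in Java, built on Character.getType. The class
CasedLetters and the method isCasedLetter() are hypothetical names used
only for illustration; this is not necessarily how the patch implements
it.

```java
public class CasedLetters {
    // Hypothetical predicate: true if the code point belongs to the
    // general category Lu, Ll or Lt, i.e. the combined "LC" category.
    static boolean isCasedLetter(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:   // Lu
            case Character.LOWERCASE_LETTER:   // Ll
            case Character.TITLECASE_LETTER:   // Lt
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // 'A' (Lu), 'a' (Ll), U+01C5 (Lt) and '1' (Nd, not a cased letter).
        int[] samples = { 'A', 'a', 0x01C5, '1' };
        for (int cp : samples) {
            System.out.printf("U+%04X cased=%b%n", cp, isCasedLetter(cp));
        }
    }
}
```

With such a predicate, a case-insensitive \p{Lu}, \p{Ll} or \p{Lt}
would accept exactly the same set of characters.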
Would you please help review the patch?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/
--
With kind regards,
Ivan Gerasimov