<i18n dev> RL1.5 Simple Loose Matches
Tom Christiansen
tchrist at perl.com
Sun Jan 23 12:00:13 PST 2011
Java meets this requirement:
RL1.5 Simple Loose Matches
To meet this requirement, if an implementation provides for
case-insensitive matching, then it shall provide at least the
simple, default Unicode case-insensitive matching.
To meet this requirement, if an implementation provides for
case conversions, then it shall provide at least the simple,
default Unicode case conversion.
To effect the required behaviour in Java, one must compile the
pattern with not only the Pattern.CASE_INSENSITIVE flag but also
with the Pattern.UNICODE_CASE flag.
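
As a minimal sketch of the flag combination described above: the class
and method names here (LooseMatchDemo, asciiOnlyMatch, unicodeMatch) are
hypothetical, chosen only for illustration. The test pair is U+00E9 and
U+00C9, a lowercase/uppercase pair outside US-ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the Pattern flags come from the JDK.
class LooseMatchDemo {
    // CASE_INSENSITIVE alone folds case only for US-ASCII characters.
    static boolean asciiOnlyMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    // Adding UNICODE_CASE requests Unicode-aware case-insensitive matching.
    static boolean unicodeMatch(String regex, String input) {
        return Pattern.compile(regex,
                               Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        String upper = "\u00c9"; // LATIN CAPITAL LETTER E WITH ACUTE

        System.out.println(asciiOnlyMatch("a", "A"));       // true
        System.out.println(asciiOnlyMatch(lower, upper));   // false
        System.out.println(unicodeMatch(lower, upper));     // true

        // The embedded flags (?i) and (?u) are the inline equivalents.
        System.out.println(upper.matches("(?iu)" + lower)); // true
    }
}
```

The same effect is available inline by prefixing the pattern with
(?iu), which is convenient when the pattern is passed to an API that
does not accept a flags argument.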
As previously mentioned in the RL1.1 discussion, there are
standing bugs associated with the Pattern.CANON_EQ flag and
symbolic literals, and these problems also apply here when all
those flags are put together. But I consider that a CANON_EQ
bug more so than a UNICODE_CASE bug, and canonical equivalence
is an RL2.1 feature, not a Level 1 requirement.
It turns out that there is substantial discussion on the Unicode mailing
list about just what case-insensitive matching means, as it is not as
well defined as one might think it is. The multi-character folds are
just one such problem, but there are other issues as well.
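
To illustrate the multi-character fold problem, here is a small sketch
using U+00DF (LATIN SMALL LETTER SHARP S), whose full uppercase mapping
is the two-character string "SS". The class and method names
(MultiCharFoldDemo, looseMatch) are hypothetical. Simple case-insensitive
matching compares one character at a time, so it cannot equate one
character with two.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical demo class showing a fold that simple matching cannot cover.
class MultiCharFoldDemo {
    // Unicode-aware, simple (one-to-one) case-insensitive match.
    static boolean looseMatch(String regex, String input) {
        return Pattern.compile(regex,
                               Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String sharpS = "\u00df"; // LATIN SMALL LETTER SHARP S

        // String.toUpperCase applies the full mapping: one char becomes two.
        System.out.println(sharpS.toUpperCase(Locale.ROOT));      // SS

        // The regex engine folds character by character, so these fail.
        System.out.println(looseMatch(sharpS, "SS"));             // false
        System.out.println(looseMatch("stra\u00dfe", "STRASSE")); // false
    }
}
```

This is consistent with RL1.5 as quoted, which asks only for the simple
matching; whether and how the multi-character cases should match is the
part that remains open.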
For now I'll wave Java through on this one, because I believe
that a future revision of the Unicode Standard will someday
clarify just what it is they mean by this.
--tom