<i18n dev> RL1.5 Simple Loose Matches

Tom Christiansen tchrist at perl.com
Sun Jan 23 12:00:13 PST 2011


Java meets this requirement:

    RL1.5	Simple Loose Matches

    To meet this requirement, if an implementation provides for
    case-insensitive matching, then it shall provide at least the
    simple, default Unicode case-insensitive matching.

    To meet this requirement, if an implementation provides for
    case conversions, then it shall provide at least the simple,
    default Unicode case conversion.

To effect the required behaviour in Java, one must compile the
pattern with not only the Pattern.CASE_INSENSITIVE flag but also
with the Pattern.UNICODE_CASE flag.
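Here is a minimal sketch of the difference between the two flag settings. The class and method names are hypothetical. According to the Pattern documentation, CASE_INSENSITIVE on its own folds only US-ASCII letters. Adding UNICODE_CASE turns on simple Unicode case-insensitive matching. With both flags, all three forms of Greek sigma (U+03A3, U+03C3, and final U+03C2) match a lowercase sigma pattern. Non-ASCII characters are written as \u escapes so the source does not depend on javac's input encoding.

```java
import java.util.regex.Pattern;

final class UnicodeCaseDemo {
    // CASE_INSENSITIVE alone: only US-ASCII letters are folded.
    static boolean asciiFold(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input).matches();
    }

    // CASE_INSENSITIVE | UNICODE_CASE: simple Unicode case-insensitive matching.
    static boolean unicodeFold(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+00E9 vs U+00C9 (e/E with acute) are not folded without UNICODE_CASE
        System.out.println(asciiFold("caf\u00e9", "CAF\u00c9"));   // false
        System.out.println(unicodeFold("caf\u00e9", "CAF\u00c9")); // true

        // Capital sigma, small sigma, and final sigma all match small sigma
        for (String s : new String[] {"\u03a3", "\u03c3", "\u03c2"}) {
            System.out.println(unicodeFold("\u03c3", s));          // true
        }
    }
}
```

You can also turn on the same behaviour inside the pattern itself with the embedded flags `(?iu)`. That helps when the flags cannot be passed to Pattern.compile.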

As previously mentioned in the RL1.1 discussion, there are
standing bugs associated with the Pattern.CANON_EQ flag and
symbolic literals, and these problems also apply here when all
three flags are put together.  But I consider that more a CANON_EQ
bug than a UNICODE_CASE bug, and canonical equivalence is an
RL2.1 feature, not a Level 1 requirement.

It turns out that there is substantial discussion on the Unicode
mailing list about just what case-insensitive matching means, as it
is not as well defined as one might think.  Multi-character folds,
such as German ß folding to "ss", are just one such problem, but
there are other issues as well.
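Java's own APIs show the multi-character fold problem with U+00DF LATIN SMALL LETTER SHARP S. Here is a minimal sketch; the class and helper names are hypothetical. The simple 1:1 mapping in Character.toUpperCase(char) leaves ß unchanged. The full mapping in String.toUpperCase expands it to "SS". Simple loose matching with CASE_INSENSITIVE | UNICODE_CASE therefore does not treat ß as equal to "ss" or "SS".

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class MultiCharFoldDemo {
    static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    // Whole-input match under simple Unicode case-insensitive matching.
    static boolean looseMatch(String regex, String input) {
        return Pattern.compile(regex, FLAGS).matcher(input).matches();
    }

    public static void main(String[] args) {
        String sharpS = "\u00df"; // U+00DF LATIN SMALL LETTER SHARP S

        // Simple (char-to-char) case mapping has no uppercase for sharp s.
        System.out.println(Character.toUpperCase('\u00df') == '\u00df'); // true
        // The full (String) case mapping expands it to two letters.
        System.out.println(sharpS.toUpperCase(Locale.ROOT));             // SS

        // So simple loose matching does not equate sharp s with "ss"/"SS".
        System.out.println(looseMatch(sharpS, "ss")); // false
        System.out.println(looseMatch(sharpS, "SS")); // false
    }
}
```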

For now I'll wave Java through on this one, because I believe
that a future revision of the Unicode Standard will clarify
just what is meant by this.

--tom

