<i18n dev> java.lang.Character lacuna #1 of 2

Tom Christiansen tchrist at perl.com
Thu Apr 14 19:49:50 PDT 2011


Sherman,

While I was fixing your docs for j.l.Character, I kept the Unicode
6.0 specs close at hand to make sure everything was up to date.  That
is how I confirmed that the comment saying 1:M uppercasings happen
only in the BMP still holds and could safely be updated:

     -        // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
     +        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.

I was very careful not to touch any code whatsoever--much though I
wanted to. :)  

You see, you've got an obvious bug: you have only a
toUpperCaseCharArray method to handle the full case mappings from
Unicode's SpecialCasing.txt file.  Clearly absent are the corresponding
methods for the other two cases, lower and title:

    toLowerCaseCharArray
    toTitleCaseCharArray
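
To make the simple-versus-full distinction concrete, here is a minimal
sketch using only public API.  The class and method names are made up for
illustration.  It compares the 1:1 mapping from Character.toUpperCase(int)
with the full mapping that String#toUpperCase applies to U+00DF LATIN
SMALL LETTER SHARP S:

```java
import java.util.Locale;

public class UpperCaseGap {
    // Simple (1:1) uppercase mapping, as exposed by java.lang.Character.
    static int simpleUpper(int codePoint) {
        return Character.toUpperCase(codePoint);
    }

    // Full (possibly 1:M) uppercase mapping, reached through String.
    static String fullUpper(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int sharpS = 0x00DF;
        System.out.printf("simple: U+%04X%n", simpleUpper(sharpS));
        System.out.println("full:   " + fullUpper(sharpS));
    }
}
```

The simple mapping has nowhere to put a second character, so it leaves
U+00DF alone, while the String route yields "SS".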

This is a serious problem, because it means you grant access to full
Unicode casing for only one of the three mappings.  And it is not as though
there is anything in String that will take care of this, either!  I was
shocked that there is no String#toTitleCase method.  And I'm mistrustful 
of the String#toLowerCase method, since there is no toLowerCaseCharArray
method in j.l.Character for it to access.  So what does it do about 
lowercasing this code point:

     İ   U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE

As you know, the lowercase for that code point is the two-character
string, "i\x{307}" (that is, "i\N{COMBINING DOT ABOVE}"), and that is
the unconditional mapping, not a locale-specific one; see
SpecialCasing.txt.  (Only the tr and az tailorings there map it to a
plain "i" instead.)
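
One way to settle the question is simply to ask the JDK.  The probe below
is a sketch, with hypothetical class and method names, that prints the code
points String#toLowerCase produces for U+0130 under Locale.ROOT, next to
the single code point Character.toLowerCase(int) returns:

```java
import java.util.Locale;

public class DottedIProbe {
    // Render a string as a space-separated list of U+XXXX code points.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", s.codePointAt(i)));
        }
        return sb.toString();
    }

    // Lowercase through String, using the locale-neutral root locale.
    static String lowerViaString(String s) {
        return hex(s.toLowerCase(Locale.ROOT));
    }

    // Lowercase through Character, which can only return one code point.
    static String lowerViaCharacter(int codePoint) {
        return String.format("U+%04X", Character.toLowerCase(codePoint));
    }

    public static void main(String[] args) {
        System.out.println("String:    " + lowerViaString("\u0130"));
        System.out.println("Character: " + lowerViaCharacter(0x0130));
    }
}
```

On the JDKs I know of, the String route reports U+0069 U+0307, so the
full mapping is applied somewhere inside String, while the Character route
can only report U+0069.  Either way, nothing in j.l.Character exposes the
two-character result.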

Here are the numbers of code points in Unicode that have
multichar mappings, which is what Unicode calls "full" case
mapping, for each of the three cases:

      1  lowercase
     42  titlecase
    102  uppercase

It's not really the *number* of code points affected that is the trouble.

Rather, it is Java's complete inability to access them.  It's like having a
"small" race condition.  It's a hole in the spec.  There really is no
reason to support full case mapping only for uppercase but refuse it for
the other two casings.  This is a very non-parallel situation, and I do not
understand why it exists; it should not.
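
For illustration only, here is roughly what a titlecase counterpart could
look like.  This is a sketch under stated assumptions, not a proposal for
the real implementation: the class is hypothetical, the method name is
borrowed from the list above, and the table holds just three of the
unconditional full titlecase mappings from SpecialCasing.txt.  A real
version would need all of them.

```java
import java.util.HashMap;
import java.util.Map;

public class TitleCaseSketch {
    // Hypothetical, deliberately incomplete table of full titlecase mappings.
    private static final Map<Integer, char[]> FULL_TITLE = new HashMap<>();
    static {
        // U+00DF LATIN SMALL LETTER SHARP S -> "Ss"
        FULL_TITLE.put(0x00DF, new char[] {'S', 's'});
        // U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE -> U+02BC "N"
        FULL_TITLE.put(0x0149, new char[] {'\u02BC', 'N'});
        // U+FB01 LATIN SMALL LIGATURE FI -> "Fi"
        FULL_TITLE.put(0xFB01, new char[] {'F', 'i'});
    }

    // Return the full titlecase mapping when the table has one,
    // otherwise fall back to the simple 1:1 mapping from Character.
    static char[] toTitleCaseCharArray(int codePoint) {
        char[] full = FULL_TITLE.get(codePoint);
        if (full != null) {
            return full.clone();
        }
        return Character.toChars(Character.toTitleCase(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(new String(toTitleCaseCharArray(0x00DF)));
        System.out.println(new String(toTitleCaseCharArray(0xFB01)));
    }
}
```

The fallback keeps ordinary code points on the existing simple path, so
only the entries with multichar mappings need the extra table.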

Sherman, could you please file this bug report so that it gets to the right
queue, and then tell me its bug number?  Maybe this is something that could
be fixed in a future JDK8 project sometime.

--tom

