<i18n dev> java.lang.Character lacuna #1 of 2

Xueming Shen xueming.shen at oracle.com
Fri Apr 15 00:14:54 PDT 2011


  Tom

I have filed CR/RFE 7036910: 
j.l.Character.toLowerCaseCharArray/toTitleCaseCharArray for this request.

The j.l.Character.toLowerCase/toUpperCase() documentation suggests using
String.toLowerCase/toUpperCase() for case mapping if you want 1:M mappings
taken care of. And if you trust the API :-), which you should in this case,
you will find that String.toLowerCase/toUpperCase() do handle 1:M correctly.
Yes, we don't have a toLowerCaseCharArray() in j.l.Character; however, as you
noticed, there is ONLY one 1:M case mapping for toLowerCase, at least for now,
and our String.toLowerCase() implementation hardcodes U+0130 as the special
case. That said, I have yet to dig out the history of toUpperCaseCharArray...
and I agree, from an API design point of view, it would be more natural to
have the pair.
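
To make the difference concrete, here is a minimal sketch that compares the simple (1:1) mapping in j.l.Character with the full mapping done by String for U+0130. The class and method names (CaseMappingDemo, simpleLower, fullLower) are made up for illustration; only Character.toLowerCase(int), Character.toChars(int) and String.toLowerCase(Locale) are real API. Locale.ROOT is used so the result does not depend on the default locale (Turkish, Azeri and Lithuanian have their own rules).

```java
import java.util.Locale;

public class CaseMappingDemo {
    // Simple (1:1) lowercase mapping of a single code point.
    public static int simpleLower(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Full lowercase mapping, obtained by going through String.
    // Locale.ROOT keeps the result independent of the default locale.
    public static String fullLower(int codePoint) {
        return new String(Character.toChars(codePoint)).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int dottedCapitalI = 0x0130; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Prints: simple: U+0069
        System.out.printf("simple: U+%04X%n", simpleLower(dottedCapitalI));

        // Prints: full: U+0069 U+0307
        StringBuilder sb = new StringBuilder("full:");
        for (char c : fullLower(dottedCapitalI).toCharArray()) {
            sb.append(String.format(" U+%04X", (int) c));
        }
        System.out.println(sb);
    }
}
```

So the 1:M lowercase mapping is reachable today, but only through String, not through j.l.Character.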

Yes, we do have RFE 6423415: (str) Add String.toTitleCase()

But given the nature of "title case", String#toTitleCase() might not be what
you would like it to be. It would be strange if String#toTitleCase() did what
String.toLowerCase/toUpperCase() do, that is, title-case-map every character
in the String; most people would probably expect it to title-case-map only
the first character of the "title string". RFE 6423415 has very low priority
for now.

It might be more reasonable to have j.l.Character.toTitleCaseCharArray() 
instead of j.l.String.toTitleCase().
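
As a rough sketch of the "first character only" behavior described above, using only what exists today: the helper titleCaseFirst below is hypothetical, not a JDK method. It relies on Character.toTitleCase(int), which applies the simple (1:1) title-case mapping, so a code point whose full title-case mapping is multi-character would still need something like the proposed toTitleCaseCharArray().

```java
public class TitleCaseFirst {
    // Hypothetical helper: title-case only the first code point of s,
    // leaving the rest of the string untouched.
    // Uses the simple (1:1) mapping from Character.toTitleCase(int).
    public static String titleCaseFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int titled = Character.toTitleCase(first);
        return new StringBuilder(s.length())
                .appendCodePoint(titled)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    public static void main(String[] args) {
        // U+01C6 (small dz with caron) title-cases to U+01C5, not to the
        // uppercase form U+01C4, which is why title case is its own mapping.
        String word = "\u01C6ep";
        String titled = titleCaseFirst(word);
        System.out.printf("U+%04X%n", titled.codePointAt(0)); // prints U+01C5
    }
}
```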

-Sherman


On 4/14/2011 7:49 PM, Tom Christiansen wrote:
> Sherman,
>
> While I was fixing your docs for j.l.Character, I kept the Unicode
> 6.0 specs close at hand to make sure everything was up to date.  That's
> how I was able to discover that one could safely update this comment
> that noted that 1:M uppercasings happen only in the BMP:
>
>       -        // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
>       +        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
>
> I was very careful not to touch any code whatsoever--much though I
> wanted to. :)
>
> You see, you've got an obvious bug in that you have only a
> toUpperCaseCharArray method to handle the full case mappings from
> Unicode SpecialCasing.txt file.  Clearly absent are corresponding
> methods for the other two cases, lower and title:
>
>      toLowerCaseCharArray
>      toTitleCaseCharArray
>
> This is a serious problem, because it means you grant access to full
> Unicode casing for only one of the three mappings.  And it is not as though
> there is anything in String that will take care of this, either!  I was
> shocked that there is no String#toTitleCase method.  And I'm mistrustful
> of the String#toLowerCase method, since there is no toLowerCaseCharArray
> method in j.l.Character for it to access.  So what does it do about
> lowercasing this code point:
>
>       İ   U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
>
> As you know, the lowercase for that code point is the two-
> character string, "i\x{307}" (that is, "i\N{COMBINING DOT
> ABOVE}"), and this is true no matter the locale; see
> SpecialCasing.txt.
>
> Here are the respective number of code points in Unicode that
> have multichar mappings, the thing that is called "full" case
> mapping in Unicode:
>
>        1  lowercase
>       42  titlecase
>      102  uppercase
>
> It's not really the *number* of code points affected that is the trouble.
>
> Rather, it is Java's complete inability to access them.  It's like having a
> "small" race condition.  It's a hole in the spec.  There really is no
> reason to support full casing mapping only for uppercase but refuse it for
> the other two casings.  This is a very non-parallel situation, and I do not
> understand why it exists; it should not.
>
> Sherman, could you please file this bug report so that it gets to the right
> queue, and then tell me its bug number?  Maybe this is something that could
> be fixed in a future JDK8 project sometime.
>
> --tom


