<i18n dev> java.lang.Character lacuna #1 of 2
Xueming Shen
xueming.shen at oracle.com
Fri Apr 15 00:14:54 PDT 2011
Tom
I have filed CR/RFE 7036910:
j.l.Character.toLowerCaseCharArray/toTitleCaseCharArray for this request.
The j.l.Character.toLowerCase/toUpperCase() documentation suggests using
String.toLowerCase/toUpperCase() for case mapping
if you want 1:M mappings taken care of. And if you trust the API :-), which
you should in this case, you will find
that String.toLowerCase/toUpperCase() do handle 1:M correctly. Yes, we
don't have a toLowerCaseCharArray()
in j.l.Character; however, as you noticed, there is ONLY one 1:M case
mapping for toLowerCase, at least for now,
and our String.toLowerCase() implementation hard-codes U+0130 as
the special case. That said, I
have yet to dig out the history of toUpperCaseCharArray... and I agree, from
an API design point of view, it would be
more natural to have the pair.
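To make the difference between the String-level and Character-level mappings concrete, here is a minimal sketch. The class and method names (CaseMappingDemo, lowerFull, lowerSimple, upperFull) are made up for illustration and are not part of the JDK; only the String and Character calls inside them are real API.

```java
import java.util.Locale;

public class CaseMappingDemo {
    // Full (possibly 1:M) lowercase mapping, done at the String level.
    // Locale.ROOT avoids the Turkish/Azeri/Lithuanian locale-specific rules.
    static String lowerFull(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Full (possibly 1:M) uppercase mapping, done at the String level.
    static String upperFull(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Simple (1:1) lowercase mapping: always yields exactly one code point.
    static int lowerSimple(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // String-level mapping expands to two chars: 'i' followed by U+0307.
        String lower = lowerFull(dottedI);
        System.out.println(lower.length()); // 2
        System.out.println(Integer.toHexString(lower.charAt(1))); // 307

        // Character-level mapping can only return a single code point: plain 'i'.
        System.out.println(Integer.toHexString(lowerSimple(0x0130))); // 69

        // The uppercase direction shows the same split: sharp s expands to "SS"
        // in String, but Character.toUpperCase leaves it unchanged.
        System.out.println(upperFull("\u00DF")); // SS
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
    }
}
```

So the 1:M results are reachable today, but only by going through String.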
Yes, we do have RFE 6423415: (str) Add String.toTitleCase().
But given the nature of "title case", String#toTitleCase() might not
be what you would like it to be. It
would be strange if String#toTitleCase() did the same thing that
String.toLowerCase/toUpperCase() do and
title-case-mapped every character in the String; most people
would probably expect it to title-case-map only
the first character of the "title string". RFE 6423415 has very low
priority for now.
It might be more reasonable to have j.l.Character.toTitleCaseCharArray()
instead of j.l.String.toTitleCase().
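As a rough illustration of the "first character only" expectation, here is a sketch of a hypothetical helper. TitleCaseSketch and titleCaseFirst are invented names, not an existing or proposed JDK API. It relies on the existing Character.toTitleCase(int), which performs only the simple 1:1 mapping, so it does not cover the 1:M title-case mappings a toTitleCaseCharArray() would provide.

```java
public class TitleCaseSketch {
    // Hypothetical helper: title-case the first code point only and leave
    // the remainder of the string untouched. Uses the simple (1:1) mapping.
    static String titleCaseFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int restStart = Character.charCount(first); // 1 or 2 chars consumed
        return new StringBuilder(s.length())
                .appendCodePoint(Character.toTitleCase(first))
                .append(s, restStart, s.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCaseFirst("hello")); // Hello

        // U+01C6 (dz with caron digraph) has a distinct title-case form, U+01C5,
        // which differs from its uppercase form, U+01C4.
        String t = titleCaseFirst("\u01C6ep");
        System.out.println(Integer.toHexString(t.charAt(0))); // 1c5
    }
}
```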
-Sherman
On 4/14/2011 7:49 PM, Tom Christiansen wrote:
> Sherman,
>
> While I was fixing your docs for j.l.Character, I kept the Unicode
> 6.0 specs close at hand to make sure everything was up to date. That's
> how I was able to discover that one could safely update this comment
> that noted that 1:M uppercasings happen only in the BMP:
>
> - // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
> + // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
>
> I was very careful not to touch any code whatsoever--much though I
> wanted to. :)
>
> You see, you've got an obvious bug in that you have only a
> toUpperCaseCharArray method to handle the full case mappings from
> Unicode SpecialCasing.txt file. Clearly absent are corresponding
> methods for the other two cases, lower and title:
>
> toLowerCaseCharArray
> toTitleCaseCharArray
>
> This is a serious problem, because it means you grant access to full
> Unicode casing for only one of the three mappings. And it is not as though
> there is anything in String that will take care of this, either! I was
> shocked that there is no String#toTitleCase method. And I'm mistrustful
> of the String#toLowerCase method, since there is no toLowerCaseCharArray
> method in j.l.Character for it to access. So what does it do about
> lowercasing this code point:
>
> İ U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
>
> As you know, the lowercase for that code point is the two-
> character string, "i\x{307}" (that is, "i\N{COMBINING DOT
> ABOVE}"), and this is true no matter the locale; see
> SpecialCasing.txt.
>
> Here are the respective number of code points in Unicode that
> have multichar mappings, the thing that is called "full" case
> mapping in Unicode:
>
>       1 lowercase
>      42 titlecase
>     102 uppercase
>
> It's not really the *number* of code points affected that is the trouble.
>
> Rather, it is Java's complete inability to access them. It's like having a
> "small" race condition. It's a hole in the spec. There really is no
> reason to support full case mapping only for uppercase but refuse it for
> the other two casings. This is a very non-parallel situation, and I do not
> understand why it exists; it should not.
>
> Sherman, could you please file this bug report so that it gets to the right
> queue, and then tell me its bug number? Maybe this is something that could
> be fixed in a future JDK8 project sometime.
>
> --tom