<i18n dev> java.lang.Character lacuna #2 of 2

Xueming Shen xueming.shen at oracle.com
Thu Apr 14 20:29:32 PDT 2011


Tom,

Welcome back :-) Have you seen that cool \x{h...h}? Oh, you saw it :-)

Yes, it might be desirable to have a corresponding
getCodePointFromName(String name); at the least, I will need that when I
do \N{unicode_name} in regex. But I'm not sure whether it is worth making
it a method in j.l.Character or whether to just keep it as an
implementation detail in j.u.regex. I believe it's cool/fun, and I went
ahead and put j.l.Character.getName() into jdk7, but I'm sure some people
might not be convinced that this method is really useful enough for a
"normal" developer. I'm also a little worried that the Unicode Standard
keeps adding new characters with every release, so the data file will
get bigger and bigger. I managed to keep the name data file around 108k
for 6.0; I hope 7.0 is not going to be too big.
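
For illustration only, here is a minimal sketch of what such a reverse
lookup could look like if it were layered on top of
Character.getName(int) instead of on the name data file itself. The class
name UnicodeNames is hypothetical, the method name is just the one
floated above, and the one-time scan over all code points is an
assumption made for simplicity, not a description of any JDK
implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the JDK: inverts Character.getName(int)
// by scanning every code point once and caching name -> code point.
final class UnicodeNames {
    private static final Map<String, Integer> BY_NAME = new HashMap<>();

    static {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // getName returns null for unassigned code points; skip those.
            String name = Character.getName(cp);
            if (name != null) {
                // If two code points ever report the same name, the higher one wins.
                BY_NAME.put(name, cp);
            }
        }
    }

    private UnicodeNames() {}

    // Returns the code point whose name matches, ignoring case and
    // surrounding whitespace; throws if no code point has that name.
    static int getCodePointFromName(String name) {
        Integer cp = BY_NAME.get(name.trim().toUpperCase(Locale.ROOT));
        if (cp == null) {
            throw new IllegalArgumentException("Unknown character name: " + name);
        }
        return cp;
    }
}
```

With this in place,
UnicodeNames.getCodePointFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK")
gives 0x2039. The in-memory map trades heap space for lookup speed, which
is the same size concern as with the data file.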

I will go through your other emails and file the corresponding CRs later,
and see if we can get in several easy doc re-word/typo changes. It might
be too late for JDK7 for those that might be categorized as "API change".
I'm still struggling with a nasty race condition bug in my zip area,
which I hope and need to close before next Monday, so forgive me if I
suddenly go silent for a couple of days.

-Sherman


On 04-14-2011 7:51 PM, Tom Christiansen wrote:
> Sherman,
>
> The other code thing that I saw, but also of course did not fix given
> where you are in the release cycle, was another of these mysterious
> non-parallel things.  You have a
>
>     String getName(int codePoint)
>
> function (well, static method) which takes a code point (like U+0130) and
> produces a string ("LATIN CAPITAL LETTER I WITH DOT ABOVE").  But you have
> no corresponding inverse function!  You have a get name from char but no
> get char from name.
>
> Have I maybe missed something?  Is it somewhere I didn't notice?
>
> This is important so that people stop having to put ugly magic
> numbers in their source code.  Which do you prefer, eh? :)
>
>      int leftQ  = 0x2039;
>      int rightQ = 0x203A;
>
> vs:
>
>      int leftQ  = getCharFromName("SINGLE LEFT-POINTING ANGLE QUOTATION MARK");
>      int rightQ = getCharFromName("SINGLE RIGHT-POINTING ANGLE QUOTATION MARK");
>
> See
>
>      http://icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getCharFromName(java.lang.String)
>
> which has these nifty paired, parallel functions so you can go both ways:
>
>      static int getCharFromName(java.lang.String name)
>            Finds a Unicode code point by its most current Unicode
>            name and return its code point value.
>      static int getCharFromName1_0(java.lang.String name)
>            Find a Unicode character by its version 1.0 Unicode name and
>            return its code point value.
>      static int getCharFromNameAlias(java.lang.String name)
>            Find a Unicode character by its corrected name alias and
>            return its code point value.
>      static java.lang.String getName(int ch)
>            Returns the most current Unicode name of the argument code
>            point, or null if the character is unassigned or outside the
>            range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not
>            have a name.
>      static java.lang.String getName(java.lang.String s, java.lang.String separator)
>            Returns the names for each of the characters in a string
>      static java.lang.String getNameAlias(int ch)
>            Returns the corrected name from NameAliases.txt if there is one.
>
> Maybe this is something you might be able to consider for JDK8.
> It's not really a bug like the other things, but it sure would
> make sense to have, and be a great convenience.
>
> Thanks a lot!
>
> --tom


