Unicode script support in Regex and Character class

Xueming Shen xueming.shen at oracle.com
Mon Apr 26 22:01:54 UTC 2010


Ulf Zibis wrote:
> On 26.04.2010 07:28, Xueming Shen wrote:
>>
>> Can I assume we are all OK with at least the API part of the latest 
>> webrev/blenderrev of
>> the script support in j.l.Character and j.u.r.Pattern, including 
>> j.l.Character.getName()?
>
> I guess you mean:
>     public static enum UnicodeScript {
>         COMMON,
>         ...;
>         public static UnicodeScript of(int codePoint);
>         public static final UnicodeScript forName(String scriptName);
>     }
>     public static String getName(int codePoint);
>
> I'm ok with this api on enum base.
>
> I would like to see the full names redundantly in the aliases map. 
> Needs only ~100 * (4 + 4) bytes in HashMap<String, Character.UnicodeScript>.

This is an implementation detail; we can defer that difference for now.

> I think there should be some more words in the javadoc about 
> correlation/usecase/advantage of UnicodeScript against UnicodeBlock.

Martin raised the same comment. But I still believe j.l.C.UnicodeScript 
simply defines the syntax of the Unicode script names
in the Java libraries; it does not try to interpret or implement anything 
further at the semantic level. It just serves as an ID for the
Unicode script name, so it would be better to leave the semantic 
definition/explanation to TR#24.
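
To make the "ID only" point concrete, here is a minimal sketch, assuming the
API lands with the signatures quoted above. The class name ScriptVsBlock, the
helper sameScriptDifferentBlock and the sample code points are chosen for
illustration only; they are not part of the webrev.

```java
public class ScriptVsBlock {

    // Hypothetical helper: true when two code points share a script ID
    // but are allocated in different Unicode blocks.
    static boolean sameScriptDifferentBlock(int a, int b) {
        return Character.UnicodeScript.of(a) == Character.UnicodeScript.of(b)
            && Character.UnicodeBlock.of(a) != Character.UnicodeBlock.of(b);
    }

    public static void main(String[] args) {
        int alpha = 0x03B1;       // GREEK SMALL LETTER ALPHA
        int alphaPsili = 0x1F00;  // GREEK SMALL LETTER ALPHA WITH PSILI

        // The script is only an identifier for the Scripts.txt value.
        System.out.println(Character.UnicodeScript.of(alpha));
        System.out.println(Character.UnicodeScript.of(alphaPsili));

        // The block describes where the code point is allocated.
        System.out.println(Character.UnicodeBlock.of(alpha));
        System.out.println(Character.UnicodeBlock.of(alphaPsili));

        // Lookup by script name, and the character name lookup.
        System.out.println(Character.UnicodeScript.forName("Greek"));
        System.out.println(Character.getName(alpha));

        System.out.println(sameScriptDifferentBlock(alpha, alphaPsili));
    }
}
```

Both code points report the same script while sitting in different blocks,
which is about as much as the enum itself says; what a "script" means is left
to TR#24.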


> I would like to have the 3 special cases INHERITED, COMMON and UNKNOWN 
> together at the beginning or end of the enum list.

Why? The current list is generated by the script from 
Scripts.txt, so the constants are in the order in which
they appear in Scripts.txt. Is there any particular reason they should be listed 
differently? We already have the
links at the beginning. I don't see any advantage in putting 
them physically together.
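
If a caller does want to treat the three special values as a group, that can
be done without relying on declaration order. A minimal sketch, assuming the
enum constants are named as in this thread; SpecialScripts, SPECIAL and
isSpecial are hypothetical names, not part of the proposed API:

```java
import java.util.EnumSet;
import java.util.Set;

public class SpecialScripts {

    // Hypothetical grouping of the three special values; it works
    // wherever the constants happen to be declared in the enum.
    static final Set<Character.UnicodeScript> SPECIAL =
        EnumSet.of(Character.UnicodeScript.COMMON,
                   Character.UnicodeScript.INHERITED,
                   Character.UnicodeScript.UNKNOWN);

    // True when the code point's script is one of the special values.
    static boolean isSpecial(int codePoint) {
        return SPECIAL.contains(Character.UnicodeScript.of(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(isSpecial(' '));     // space is COMMON
        System.out.println(isSpecial(0x0301));  // combining acute accent is INHERITED
        System.out.println(isSpecial('A'));     // LATIN, not a special value
    }
}
```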

-Sherman


