Unicode script support in Regex and Character class
Xueming Shen
xueming.shen at oracle.com
Mon Apr 26 22:01:54 UTC 2010
Ulf Zibis wrote:
> On 26.04.2010 07:28, Xueming Shen wrote:
>>
>> Can I assume we are all OK with at least the API part of the latest
>> webrev/blenderrev of
>> the script support in j.l.Character and j.u.r.Pattern, including the
>> j.l.Character.getName()?
>
> I guess you mean:
> public static enum UnicodeScript {
> COMMON,
> ...;
> public static UnicodeScript of(int codePoint);
> public static final UnicodeScript forName(String scriptName);
> }
> public static String getName(int codePoint);
>
> I'm ok with this api on enum base.
>
> I would like to see the full names redundantly in the aliases map.
> Needs only ~100 * (4 + 4) bytes in HashMap<String, Character.UnicodeScript>.
This is an implementation detail; we can set that difference aside for now.
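
For illustration only, here is a minimal sketch of what "full names redundantly in the aliases map" could look like. The class ScriptNameLookup and its three hand-picked aliases are hypothetical and are not the webrev's implementation; the sketch assumes the Character.UnicodeScript enum is available as proposed above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not the JDK implementation.
class ScriptNameLookup {
    private static final Map<String, Character.UnicodeScript> NAMES = new HashMap<>();

    static {
        // Four-letter aliases (a small, hand-picked subset).
        NAMES.put("LATN", Character.UnicodeScript.LATIN);
        NAMES.put("GREK", Character.UnicodeScript.GREEK);
        NAMES.put("ZYYY", Character.UnicodeScript.COMMON);
        // Full names stored redundantly, so one map lookup covers both forms.
        for (Character.UnicodeScript s : Character.UnicodeScript.values()) {
            NAMES.put(s.name(), s);
        }
    }

    // Case-insensitive lookup by alias or full name.
    static Character.UnicodeScript forName(String scriptName) {
        Character.UnicodeScript s = NAMES.get(scriptName.toUpperCase(Locale.ENGLISH));
        if (s == null) {
            throw new IllegalArgumentException("Unknown script name: " + scriptName);
        }
        return s;
    }
}
```

The cost is one extra map entry per script; the alternative is a second lookup (for example via valueOf) when the alias map misses.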
> I think there should be some more words in the javadoc about the
> correlation/use case/advantage of UnicodeScript versus
> UnicodeBlock.
Martin raised the same point. But I still believe j.l.C.UnicodeScript
simply defines the syntax of the Unicode script names
in the Java libraries; it does not try to interpret/implement anything
further at the semantic level. It just serves as an ID for the
Unicode script name, so it'd be better to leave the semantic
definition/explanation to TR#24.
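
To make the script/block distinction concrete, a small sketch follows. It assumes Character.UnicodeScript.of(int) is available as proposed above; the class name ScriptVersusBlock and the sample code points are chosen only for illustration.

```java
// Illustration only: prints the block and the script for a few code points.
class ScriptVersusBlock {
    static String describe(int codePoint) {
        return String.format("U+%04X block=%s script=%s",
                codePoint,
                Character.UnicodeBlock.of(codePoint),
                Character.UnicodeScript.of(codePoint));
    }

    public static void main(String[] args) {
        // 'A', e-acute, space, combining acute accent
        int[] samples = {0x0041, 0x00E9, 0x0020, 0x0301};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

'A' and e-acute sit in different blocks but share the LATIN script, while the space is COMMON and the combining accent is INHERITED. A block is a contiguous code point range; a script is a property of the individual character, as defined in TR#24.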
> I would like to have the 3 special cases INHERITED, COMMON and UNKNOWN
> together at the beginning or end of the enum list.
Why? The current list is generated by a script from
Scripts.txt, so it follows the order in which the scripts
appear in Scripts.txt. Is there any particular reason they should be listed
differently? We already have the
links at the beginning. I don't see any advantage in putting
them physically together.
-Sherman