Unicode script support in Regex and Character class
Xueming Shen
xueming.shen at oracle.com
Thu Apr 22 08:01:20 UTC 2010
Hi,
Here is the webrev of the proposal to add Unicode script support in
regex and j.l.Character.
http://cr.openjdk.java.net/~sherman/script/webrev
and the corresponding blenderrev
http://cr.openjdk.java.net/~sherman/script/blenderrev.html
Please comment on the APIs before I submit the CCC, especially
(1) the use of an enum for j.l.Character.UnicodeScript (compared to the
traditional j.l.c.Subset)
(2) the piggyback method j.l.Character.getName() :-)
(3) the syntax for script constructs. In addition to the "normal"
\p{InScriptName} and \P{InScriptName} for script support,
I'm also adding
    \p{script=ScriptName}  \P{script=ScriptName}
        for the new script support
    \p{block=BlockName}  \P{block=BlockName}
        for the "existing" block support
    \p{general_category=CategoryName}  \P{general_category=CategoryName}
        for the "existing" gc (general category)
Perl recently also started to accept this \p{propName=propValue}
Unicode style. It opens the door for future expansion, for example
\p{name=XYZ} :-)
(4) and of course, the wording.
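
For illustration only, here is a minimal sketch of how the constructs in
(1)-(3) might be used, assuming the API and syntax go in as proposed
above. The class name ScriptSketch and the helper matches() are made up
for this example, and Character.UnicodeScript.of(int) is assumed to be
the lookup method on the new enum.

```java
import java.util.regex.Pattern;

public class ScriptSketch {
    // The three keyword forms from item (3); names are hypothetical.
    static final Pattern LATIN_SCRIPT =
        Pattern.compile("\\p{script=Latin}+");
    static final Pattern BASIC_LATIN_BLOCK =
        Pattern.compile("\\p{block=BasicLatin}+");
    static final Pattern UPPERCASE =
        Pattern.compile("\\p{general_category=Lu}+");

    // True only if the whole input matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Script and block are different properties:
        // U+00E9 is Latin script but lies outside the Basic Latin block,
        // while ASCII digits are in Basic Latin but have script COMMON.
        System.out.println(matches(LATIN_SCRIPT, "caf\u00e9"));      // true
        System.out.println(matches(BASIC_LATIN_BLOCK, "caf\u00e9")); // false
        System.out.println(matches(BASIC_LATIN_BLOCK, "abc123"));    // true
        System.out.println(matches(LATIN_SCRIPT, "abc123"));         // false

        // General category, keyword form.
        System.out.println(matches(UPPERCASE, "ABC"));               // true

        // Item (1): the enum, assuming an of(int codePoint) lookup.
        System.out.println(Character.UnicodeScript.of('a'));         // LATIN

        // Item (2): the code point name.
        System.out.println(Character.getName('a')); // LATIN SMALL LETTER A
    }
}
```

The first four lines of output are the point of the sketch: a script
test and a block test can disagree on the same input, which is why both
keyword forms are useful.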
Thanks,
Sherman