<i18n dev> Java Regexes vs Unicode Regexes
Xueming Shen
xueming.shen at oracle.com
Thu Jan 20 13:53:01 PST 2011
On 01/20/2011 12:55 PM, Tom Christiansen wrote:
> Sherman wrote:
>
>> At the end, Java RegEx is NOT a Unicode RegEx, while it
>> supports Unicode RegEx at certain level, sometimes via different
>> syntax. I don't feel this is a big problem for most Java
>> developers, and it should not be a stopper for most programs.
> I do not understand what you mean when you say that Java regexes
> aren't Unicode regexes. Are you referring to the various
> syntactic features of UTS 18, Unicode Regular Expressions?
> If so, it's my understanding that many of those are examples
> only, especially when it comes to how something actually looks.
>
> I fully agree with you that Java indeed offers some of the
> functionality described there in other ways than given by those
> particular examples, and that quite often this doesn't make
> enough practical difference to be a show-stopper. I discuss
> this further later in this message.
>
> Another possible interpretation of:
>
>> Java RegEx is NOT a Unicode RegEx, while it
>> supports Unicode RegEx at certain level,
> is that you are saying the standard Java regex class
> does not provide the baseline Level 1 Unicode support
> spelled out in UTS#18. If so, I'm afraid you are
> again correct.
>
> However, I would very much like to see this fixed. That's
> because Level 1 support is the absolute minimum level required
> for useful Unicode support. To quote from UTS#18:
Hi Tom,
That is NOT what I'm saying.

The Java RegEx is supposed to be "in conformance with level 1 of UTS#18
plus RL2.1 Canonical Equivalents", so everything defined at Level 1 of
UTS#18 should be supported by Java RegEx, though perhaps not with the
exact syntax defined or recommended by UTS#18, and not always out of the
box. For example, for Unicode case-insensitive matching you have to
specify a particular flag (UNICODE_CASE, together with CASE_INSENSITIVE)
to turn it on, mainly for performance reasons.
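A minimal sketch of the opt-in behavior described above, using the java.util.regex.Pattern flags CASE_INSENSITIVE, UNICODE_CASE, and CANON_EQ. The class and helper names are illustrative only. By default CASE_INSENSITIVE folds only US-ASCII letters; Unicode-aware folding additionally needs UNICODE_CASE, and canonical equivalence (RL2.1) needs CANON_EQ. The CANON_EQ case reuses the example given in the Pattern Javadoc:

```java
import java.util.regex.Pattern;

// Illustrative demo class; the name is hypothetical.
public class UnicodeFlagsDemo {

    // Returns true if the whole input matches the regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e4pfel"; // a-umlaut followed by "pfel"
        String upper = "\u00c4PFEL"; // A-umlaut followed by "PFEL"

        // CASE_INSENSITIVE alone folds only US-ASCII letters, so the umlauts differ
        System.out.println(fullMatch(lower, Pattern.CASE_INSENSITIVE, upper)); // false
        // Adding UNICODE_CASE turns on Unicode-aware case folding
        System.out.println(fullMatch(lower,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper));   // true

        // Without CANON_EQ, "a" + COMBINING RING ABOVE is not the precomposed a-ring
        System.out.println(fullMatch("a\u030A", 0, "\u00E5"));                // false
        // With CANON_EQ, canonically equivalent sequences match
        System.out.println(fullMatch("a\u030A", Pattern.CANON_EQ, "\u00E5")); // true
    }
}
```

The Pattern Javadoc notes that specifying UNICODE_CASE or CANON_EQ may impose a performance penalty, which is consistent with the performance rationale given above for making them opt-in.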

I would really appreciate it if you could provide details of what is
missing from the Level 1 support. Since that would be a violation of
the specification, I can definitely put it on the high-priority list to
work on. Script support is one Level 1 requirement that we don't have
in our latest release, but I have added it in the upcoming JDK 7. I'm
sure there are bugs and corner cases here and there, even though we
have lots of tests that are supposed to cover everything :-)
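A minimal sketch of the script support, assuming JDK 7 or later, where Pattern accepts script properties such as \p{script=Greek} and Character.UnicodeScript exposes the same data per code point. The class and field names are illustrative. U+03E2 COPTIC CAPITAL LETTER SHEI is a useful test case because it lies in the Greek and Coptic block but belongs to the Coptic script, so block and script properties disagree on it:

```java
import java.util.regex.Pattern;

// Requires JDK 7 or later. Illustrative demo class; the name is hypothetical.
public class ScriptDemo {

    // Script property: every character must have Script=Greek
    static final Pattern GREEK_SCRIPT = Pattern.compile("\\p{script=Greek}+");

    // Block property: every character must lie in the Greek and Coptic block (U+0370..U+03FF)
    static final Pattern GREEK_BLOCK = Pattern.compile("\\p{InGreek}+");

    public static void main(String[] args) {
        String alphaBetaGamma = "\u03B1\u03B2\u03B3"; // Greek small alpha, beta, gamma
        String shei = "\u03E2";                        // COPTIC CAPITAL LETTER SHEI

        System.out.println(GREEK_SCRIPT.matcher(alphaBetaGamma).matches()); // true
        System.out.println(GREEK_SCRIPT.matcher("abc").matches());          // false

        // Same block, different script: only the block property matches
        System.out.println(GREEK_BLOCK.matcher(shei).matches());            // true
        System.out.println(GREEK_SCRIPT.matcher(shei).matches());           // false

        // The script of a single code point, via Character.UnicodeScript
        System.out.println(Character.UnicodeScript.of(shei.codePointAt(0))); // COPTIC
    }
}
```

The Pattern Javadoc also documents the Is prefix (as in IsHiragana) and the short keyword sc for scripts; the explicit script= form is used here because it cannot be confused with a block or category name.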

I have been dedicated to Java I18n work for years, so I fully
understand how important Unicode is, especially for Java as a platform.
It is our goal for Java to provide the most useful Unicode support; the
last thing I would say is "go pick another language/platform". No, I
don't take any offense at all. In fact, we really appreciate these
useful comments, suggestions, and expertise, which will definitely help
the platform evolve.
-Sherman