<i18n dev> RL1.1 Hex Notation

Wed Jan 26 13:36:51 PST 2011

Ok, now I understand. With that change, the situation is much better. It
doesn't fully satisfy RL1.1, because you can't use hex codepoint numbers --
you have to use the fairly ugly workaround of

      String hexPattern = codePoint <= 0xFFFF
          ? String.format("\\u%04x", codePoint)
          : String.format("\\u%04x\\u%04x",
                (int) Character.toChars(codePoint)[0],
                (int) Character.toChars(codePoint)[1]);
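
For reference, here is the same workaround as a small self-contained sketch. The class name HexPatternDemo, the method name hexPattern, and the sample code point U+1D11E are illustrative assumptions, not anything from the JDK or from the test case under discussion. It assumes a JDK whose regex engine treats two adjacent \uXXXX surrogate escapes as one supplementary code point (the behavior described as fixed in 7).

```java
import java.util.regex.Pattern;

public class HexPatternDemo {

    // Builds a regex fragment that matches exactly one code point,
    // using only \uXXXX escapes (hypothetical helper name).
    static String hexPattern(int codePoint) {
        if (codePoint <= 0xFFFF) {
            return String.format("\\u%04x", codePoint);
        }
        // Supplementary code point: emit the two surrogate code units.
        char[] units = Character.toChars(codePoint);
        return String.format("\\u%04x\\u%04x", (int) units[0], (int) units[1]);
    }

    public static void main(String[] args) {
        int codePoint = 0x1D11E; // sample supplementary code point
        String pattern = hexPattern(codePoint);
        String text = new String(Character.toChars(codePoint));
        System.out.println(pattern + " matches: " + Pattern.matches(pattern, text));
    }
}
```

Calling Character.toChars once and reusing the array avoids the repeated conversion in the one-line form, but the result is the same string.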

BTW, in plain Java I really miss a few of the ICU4J routines, like:

   - char c1 = UTF16.getLeadSurrogate(codePoint);
   - char c2 = UTF16.getTrailSurrogate(codePoint);
   - String s = UTF16.valueOf(codePoint);

You can do them in plain Java, as in the above expression, but they're
awkward and not as clear to read. And instead of the third one, the best I
see in plain Java is the following, which is really pretty ugly (is there
any better way?).

   String s = new StringBuilder().appendCodePoint(codePoint).toString();
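
One possible set of plain-Java stand-ins for the three routines is sketched below. The class and method names (PlainJavaUtf16, lead, trail, valueOf) are made up for illustration. Character.highSurrogate and Character.lowSurrogate are assumed to be available, which means JDK 7 or later; on 6 the Character.toChars indexing shown earlier is the fallback. This is a sketch only and does not claim to reproduce ICU4J's behavior for BMP input.

```java
public class PlainJavaUtf16 {

    // Lead (high) surrogate of a supplementary code point; JDK 7+.
    static char lead(int codePoint) {
        return Character.highSurrogate(codePoint);
    }

    // Trail (low) surrogate of a supplementary code point; JDK 7+.
    static char trail(int codePoint) {
        return Character.lowSurrogate(codePoint);
    }

    // String holding exactly one code point, without a StringBuilder.
    static String valueOf(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 0x1D11E; // sample supplementary code point
        System.out.printf("%04x %04x %d%n",
                (int) lead(codePoint), (int) trail(codePoint),
                valueOf(codePoint).length());
    }
}
```

Whether new String(Character.toChars(codePoint)) reads better than the StringBuilder form is a matter of taste; both allocate a temporary.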

Mark

*— Il meglio è l’inimico del bene —*

On Wed, Jan 26, 2011 at 12:47, Xueming Shen <xueming.shen at oracle.com> wrote:

> Oh, I see the problem. Obviously I have been working on jdk7 too long and
> forgot the latest release is still 6:-( There is indeed a bug in the
> previous implementation which I fixed in 7 long time ago (I mentioned this
> in one of the early emails but was not specific, my apology), probably
> should backport to 6 update release asap. The test case runs well (the
> "failures" in literals are expected) on 7 with the following output. I
> modified your test case "slightly" since it appears the UnicodeSet class in
> our normalizer package does not have the size(), replace it with a normal
> hashset.
>