review request for 6798511/6860431: Include functionality of Surrogate in Character

Martin Buchholz martinrb at google.com
Thu Aug 27 01:24:27 UTC 2009


On Wed, Aug 26, 2009 at 02:59, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
> Hi all,
>
> I have put all important things of sun.nio.cs.Surrogate to java.* packages,
> I guess. We likely could get rid of it.

I am in principle of adding the method isBMP to Character.java
(I did write it!) but I was hoping we can find a better name for it...
Hmmmmm..... The Unicode Glossary uses BMP consistently,
so maybe isBMP is the best we can come up with?

How about isBMPCodePoint?

http://unicode.org/glossary/#BMP_code_point

Yeah, I think that's not too bad.

Martin


> See: https://bugs.openjdk.java.net/show_bug.cgi?id=100104
>
>
> Some comments:
> ================
>    public static int toChars(int codePoint, char[] dst, int dstIndex) {
> -        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
> -            throw new IllegalArgumentException();
> -        }
> -        if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> -            dst[dstIndex] = (char) codePoint;
> -            return 1;
> -        }
> -        toSurrogates(codePoint, dst, dstIndex);
> +        if (codePoint >= MIN_CODE_POINT) {
> +            if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> +                dst[dstIndex] = (char)codePoint;
> +                return 1;
> +            }
> +            if (codePoint <= MAX_CODE_POINT)
> +                return toSurrogates(codePoint, dst, dstIndex);
> +        }
> +        throw new IllegalArgumentException();
> +    }
> ----------------
> saves 1 compare in case of code point is in BMP

Yes.  A good change.

> ================
> +    public static char[] toChars(int codePoint) {
> +        if (codePoint >= MIN_CODE_POINT) {
> +            char[] result;
> +            if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> +                result = C1.clone();
> ----------------
> I guess cloning is faster than new instantiation, as fresh one has to be
> initialized by 0s, but not sure

Cloning has to initialize as well, with the original contents.
Often there is special support for zeroing newly allocated objects.
So I think this is a bad idea.

> ================
> +    static int toSurrogates(int codePoint, char[] dst, int index) {
> +        // We write elements "backwards" to guarantee all-or-nothing //
> TODO: isn't forward faster ?
> +        dst[index+1] = lowSurrogate(codePoint);
> +        dst[index] = highSurrogate(codePoint);
> ----------------
> IMO this guarantee is nowhere needed, so why potentially waste performance ?

All-or-nothing is a fundamental software reliability principle.
Why have your data structure in a corrupted state
if it is not too onerous to avoid it?
The difference in behavior is certainly user-visible.

> ================
> -        // Pass 2: Allocate and fill in char[]
> -        char[] v = new char[n];
> -        for (int i = offset, j = 0; i < offset + count; i++) {
> -            int c = codePoints[i];
> -            if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
> -                v[j++] = (char) c;
> -            } else {
> -                Character.toSurrogates(c, v, j);
> -                j += 2;
> -            }
> -        }
> -
> -        this.value  = v;
> +        // Pass 2: Allocate and fill in char[]
> +        for (int i = offset, j = 0; i < offset + count; i++) {
> +            int c = codePoints[i];
> +            if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT)
> +                value[j++] = (char)c;
> +            else
> +                j += Character.toSurrogates(c, value, j);
>        }
> ----------------
> I guess HotSpot is aware of immutability of String, so no need to locally
> buffer v expletive to save indirection and against concurrent modifications.

I like the simplification of this method a lot, but it actually helps
hotspot a little
to hold the new array in a local and assign it to a field at the end
of the constructor.
This sort of thing (copying a field into a local) is done in
java.util.concurrent a lot.

---

It would help to separate cosmetic and "real" changes to get them
reviewed and accepted.


Martin



More information about the core-libs-dev mailing list