review request for 6798511/6860431: Include functionality of Surrogate in Character

Martin Buchholz martinrb at google.com
Tue Mar 16 23:41:08 UTC 2010


On Tue, Mar 16, 2010 at 16:14, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> On 16.03.2010 22:36, Martin Buchholz wrote:
>
> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>
>
>
> Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[],
> int, int) would benefit from consecutive use of isBMPCodePoint() +
> isSupplementaryCodePoint(), or isHighSurrogate() + isLowSurrogate().
>
>
> For codePointCountImpl(), I do not agree.
>
>
> 1-byte comparisons have a smaller footprint, if anything load faster from
> memory, need less L1 CPU cache, would be faster on small/RISC/etc. CPUs,
> and should therefore improve overall performance.
> In addition, the shift could be omitted on CPUs that can benefit from
> 6933327.

I am not convinced.  Using byte for local variables is unlikely to
give any performance benefit.  The only way use of byte can be
a win is if you read/write a bunch of them at once from memory.
I think of byte as a compression scheme for int.
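
To illustrate the point, here is a minimal sketch (the class and method
names are hypothetical, not from the JDK). Java promotes byte operands to
int before doing arithmetic, so a byte local buys nothing in the
computation itself; the saving only shows up in array storage, where a
byte[] element takes 1 byte and an int[] element takes 4.

```java
final class ByteVsInt {
    // Hypothetical helper: each byte element is widened to int when it is loaded.
    static int sum(byte[] data) {
        int total = 0;
        for (byte b : data) {
            total += b;   // int arithmetic, even though b is declared as byte
        }
        return total;
    }

    public static void main(String[] args) {
        byte a = 100, b = 100;
        int wide = a + b;               // operands are promoted to int
        byte narrow = (byte) (a + b);   // storing back into a byte needs an explicit cast
        System.out.println(wide + " " + narrow);

        // The "compression" only pays off in bulk storage:
        byte[] packed = new byte[1024];   // 1 byte per element
        int[] unpacked = new int[1024];   // 4 bytes per element
        System.out.println(sum(packed) + " " + unpacked.length);
    }
}
```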

> For String(int[], int, int), I do agree.
>
> Here is my latest more readable and more performant implementation:
>
>         int end = offset + count;
>
>         // Pass 1: Compute precise size of char[]
>         int n = 0;
>         for (int i = offset; i < end; i++) {
>             int c = codePoints[i];
>             if (Character.isBMPCodePoint(c))
>                 n += 1;
>             else if (Character.isSupplementaryCodePoint(c))
>                 n += 2;
>             else throw new IllegalArgumentException(Integer.toString(c));
>         }
>
>         // Pass 2: Allocate and fill in char[]
>         char[] v = new char[n];
>         for (int i = offset, j = 0; i < end; i++) {
>             int c = codePoints[i];
>             if (Character.isBMPCodePoint(c)) {
>                 v[j++] = (char) c;
>             } else {
>                 Character.toSurrogates(c, v, j);
>                 j += 2;
>             }
>         }
>
>
> I suggest:
>
>         // Pass 2: Allocate and fill in char[]
>         char[] v = new char[n];
>         for (int i = end; n > 0; ) {
>             int c = codePoints[--i];
>             if (Character.isBMPCodePoint(c))
>                 v[--n] = (char)c;
>             else
>                 Character.toSurrogates(c, v, n -= 2);
>         }
>
> - saves 1 variable (= reduces register pressure)
> - testing the loop end against 0 is faster than testing against "end", see:
> 6932855

Perhaps, but this exceeds my micro-optimization threshold.
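
For comparison, both fill loops can be tried outside java.lang. The sketch
below is an assumption-laden stand-in, not the JDK code: the class and
method names are hypothetical, Character.toSurrogates is not part of the
public API, so the public Character.toChars(int, char[], int) is used
instead, and the BMP test uses the public spelling
Character.isBmpCodePoint (available as of Java 7). It makes no claim
about which loop is faster; it only shows that the two produce the same
char[].

```java
import java.util.Arrays;

final class CodePointsToChars {
    // Pass 1: compute the precise size of the char[].
    static int size(int[] codePoints, int offset, int end) {
        int n = 0;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                n += 1;
            else if (Character.isSupplementaryCodePoint(c))
                n += 2;
            else throw new IllegalArgumentException(Integer.toString(c));
        }
        return n;
    }

    // Pass 2, forward variant: fill from index 0 upwards.
    static char[] forward(int[] codePoints, int offset, int count) {
        int end = offset + count;
        char[] v = new char[size(codePoints, offset, end)];
        for (int i = offset, j = 0; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) {
                v[j++] = (char) c;
            } else {
                j += Character.toChars(c, v, j);   // writes 2 chars, returns 2
            }
        }
        return v;
    }

    // Pass 2, backward variant: fill from the end, counting n down to 0.
    static char[] backward(int[] codePoints, int offset, int count) {
        int end = offset + count;
        int n = size(codePoints, offset, end);
        char[] v = new char[n];
        for (int i = end; n > 0; ) {
            int c = codePoints[--i];
            if (Character.isBmpCodePoint(c))
                v[--n] = (char) c;
            else
                Character.toChars(c, v, n -= 2);   // surrogate pair lands at n, n + 1
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'a', 0x1F600, 'b', 0x10000 };
        char[] f = forward(cps, 0, cps.length);
        char[] b = backward(cps, 0, cps.length);
        System.out.println(Arrays.equals(f, b));
        System.out.println(new String(f).equals(new String(cps, 0, cps.length)));
    }
}
```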

> BTW:
>     int end = offset + count;
> could be dropped, as the VM would do that itself, certainly in the HotSpot C2 compiler.
>
> -Ulf
>
>

Martin


