Please review: surrogate fiddle

Ulf Zibis Ulf.Zibis at CoSoCo.de
Thu Mar 21 12:29:36 UTC 2013


Am 21.03.2013 00:22, schrieb Martin Buchholz:
> On Wed, Mar 20, 2013 at 7:09 AM, Ulf Zibis <Ulf.Zibis at cosoco.de <mailto:Ulf.Zibis at cosoco.de>> wrote:
>
>     AbstractStringBuilder:
>     Instead
>      270     public int codePointBefore(int index) {
>      271         int i = index - 1;
>      272         if ((i < 0) || (i >= count)) {
>      273             throw new StringIndexOutOfBoundsException(index);
>      274         }
>     I suggest
>      270     public int codePointBefore(int index) {
>      271         if ((--index < 0) || (index >= count)) {
>      272             throw new StringIndexOutOfBoundsException(index);
>      273         }
>     , because if e.g. the initial value of index is 0, then -1 reflects the out-of-bound
>     condition, but not the initial 0 to report in the StringIndexOutOfBoundsException.
>     (Hopefully the following redundant i < 0 bounds check from value[i] becomes elimited by JIT.)
>
>
> I was impressed that hotspot could indeed remove the redundant bounds checks.
> Only with -Xint was I able to demonstrate a performance win.

Thanks!
But what's about the "wrong" exception index reporting, e.g. 0 as "out of bounds"?


>     If there is some register pressure, you could avoid potential register swapping for temp,
>     temp2, hasSurrogate, j and n - j if you would reorder following lines to:
>     1390             char temp = value[j];
>     1391             char temp2 = value[j] = value[n - j];
>     1397             value[n - j] = temp;
>     1392             if (!hasSurrogate) {
>     1393                 hasSurrogate = (Character.isSurrogate(temp) ||
>     1394 Character.isSurrogate(temp2));
>     1395             }
>     (Nomination for "performance expert II" ?)
>
>
> Thanks for making me beat my head against the hotspot optimization wall!
> The latest version of reverse in my webrev, after way too much fiddling, is 20% faster,
> even though I'm not quite sure why.

No surprise to me. In ordinary strings (no surrogate pairs), always 3 checks should be processed 
each loop:
- if (!hasSurrogate)
- Character.isSurrogate(temp)
- Character.isSurrogate(temp2)

The "if (!hasSurrogate)" only helps a little, if a surrogate pair was found in 1st half of the 
string, but the gain should be small in relation to the extra work, scanning the string again for 
reversing them.

Did you try the more smart syntax? :
     hasSurrogate =
         (Character.isSurrogate(cj) || Character.isSurrogate(ck));


> If you write tests, I will incorporate into this changeset!

I have no java development environment here and small time, so depends on your patience to commit 
the changeset.

-Ulf





More information about the core-libs-dev mailing list