review request for 6798511/6860431: Include functionality of Surrogate in Character

Ulf Zibis Ulf.Zibis at gmx.de
Tue Mar 16 23:14:08 UTC 2010


On 16.03.2010 22:36, Martin Buchholz wrote:
> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>
>    
>> Additionally, toUpperCaseCharArray(), codePointCountImpl() and String(int[],
>> int, int) would profit from consecutive use of isBMPCodePoint() +
>> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate().
>>      
> For codePointCountImpl(), I do not agree.
>    

Comparisons against 1-byte constants have a smaller code footprint, likely
load faster from memory and occupy less L1 CPU cache; on small or RISC
CPUs they would also execute faster, and should therefore improve overall
performance.
Additionally, the shift could be omitted on CPUs that can benefit from
6933327.
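To make the argument concrete, here is a minimal sketch, assuming the "1-byte comparisons" are checks of the form (ch >>> 10) == 0x36: after shifting right by 10, every high surrogate (U+D800..U+DBFF) yields 0x36 and every low surrogate (U+DC00..U+DFFF) yields 0x37, so each test is one shift plus one comparison against a byte-sized constant instead of two comparisons against 16-bit bounds. The class and method names are hypothetical, and the counting loop only mimics what codePointCountImpl() does; it is not the JDK source.

```java
/** Hypothetical sketch: surrogate tests using a shift and a 1-byte constant. */
final class ShiftSurrogateSketch {

    // U+D800 >>> 10 == 0x36 and U+DBFF >>> 10 == 0x36
    static boolean isHighSurrogate(char ch) {
        return (ch >>> 10) == (0xD800 >>> 10);
    }

    // U+DC00 >>> 10 == 0x37 and U+DFFF >>> 10 == 0x37
    static boolean isLowSurrogate(char ch) {
        return (ch >>> 10) == (0xDC00 >>> 10);
    }

    /**
     * Counts code points in a[offset .. offset+count). A high surrogate
     * immediately followed by a low surrogate counts as one code point;
     * unpaired surrogates count as one each (like Character.codePointCount).
     */
    static int codePointCount(char[] a, int offset, int count) {
        int end = offset + count;
        int n = count;
        for (int i = offset; i < end - 1; i++) {
            if (isHighSurrogate(a[i]) && isLowSurrogate(a[i + 1])) {
                n--;   // the pair was counted as two chars; count it once
                i++;   // skip the low surrogate
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (as a surrogate pair), 'b', lone high surrogate
        char[] text = "a\uD83D\uDE00b\uD800".toCharArray();
        System.out.println(codePointCount(text, 0, text.length)); // 4
    }
}
```

The shift-based tests agree with Character.isHighSurrogate()/isLowSurrogate() for every char value; whether they are actually faster depends on the CPU and the JIT.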

> For String(int[], int, int), I do agree.
>
> Here is my latest more readable and more performant implementation:
>
>          int end = offset + count;
>
>          // Pass 1: Compute precise size of char[]
>          int n = 0;
>          for (int i = offset; i < end; i++) {
>              int c = codePoints[i];
>              if (Character.isBMPCodePoint(c))
>                  n += 1;
>              else if (Character.isSupplementaryCodePoint(c))
>                  n += 2;
>              else throw new IllegalArgumentException(Integer.toString(c));
>          }
>
>          // Pass 2: Allocate and fill in char[]
>          char[] v = new char[n];
>          for (int i = offset, j = 0; i < end; i++) {
>              int c = codePoints[i];
>              if (Character.isBMPCodePoint(c)) {
>                  v[j++] = (char) c;
>              } else {
>                  Character.toSurrogates(c, v, j);
>                  j += 2;
>              }
>          }
>    
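Outside java.lang, Character.toSurrogates() is not accessible (it is package-private), and the released JDK 7 API spells the BMP test Character.isBmpCodePoint(). The following is a self-contained sketch of the same two-pass algorithm, assuming Java 7 or later and using the public Character.highSurrogate()/lowSurrogate() in place of toSurrogates(). The class name CodePointsForward is hypothetical.

```java
/** Hypothetical standalone sketch of the two-pass int[] -> char[] conversion. */
final class CodePointsForward {

    static char[] toChars(int[] codePoints, int offset, int count) {
        int end = offset + count;

        // Pass 1: compute the precise size of the char[]
        int n = 0;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                n += 1;
            else if (Character.isSupplementaryCodePoint(c))
                n += 2;
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: allocate and fill the char[] front to back
        char[] v = new char[n];
        for (int i = offset, j = 0; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) {
                v[j++] = (char) c;
            } else {
                v[j++] = Character.highSurrogate(c);
                v[j++] = Character.lowSurrogate(c);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'x', 0x1F600, 'y' };
        System.out.println(toChars(cps, 0, cps.length).length); // 4
    }
}
```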

I suggest:

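         // Pass 2: Allocate and fill in char[] back to front;
         // n counts down to 0 and doubles as the write index
         char[] v = new char[n];
         for (int i = end; n > 0; ) {
             int c = codePoints[--i];
             if (Character.isBMPCodePoint(c))
                 v[--n] = (char) c;
             else  // writes v[n] and v[n + 1] after n -= 2
                 Character.toSurrogates(c, v, n -= 2);
         }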

- saves one variable (reducing register pressure)
- testing for the loop end against 0 is faster than against "end";
see 6932855 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6932855>
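A self-contained sketch of the suggested backward fill, again assuming Java 7 or later, with the public Character.highSurrogate()/lowSurrogate() standing in for the package-private toSurrogates(); the class name CodePointsBackward is hypothetical. Pass 1 is unchanged. Pass 2 walks codePoints from the end and counts n down to 0, so no separate write index j is needed and the loop condition compares against 0. The loop terminates exactly when i reaches offset, because pass 1 computed n as the total number of chars to write.

```java
/** Hypothetical sketch: two-pass conversion with a back-to-front fill. */
final class CodePointsBackward {

    static char[] toChars(int[] codePoints, int offset, int count) {
        int end = offset + count;

        // Pass 1: compute the precise size of the char[] (as before)
        int n = 0;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                n += 1;
            else if (Character.isSupplementaryCodePoint(c))
                n += 2;
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: fill back to front; n doubles as the write index
        char[] v = new char[n];
        for (int i = end; n > 0; ) {
            int c = codePoints[--i];
            if (Character.isBmpCodePoint(c)) {
                v[--n] = (char) c;
            } else {
                n -= 2;                              // make room for the pair
                v[n] = Character.highSurrogate(c);
                v[n + 1] = Character.lowSurrogate(c);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'x', 0x1F600, 'y' };
        System.out.println(toChars(cps, 0, cps.length).length); // 4
    }
}
```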
BTW, the local
     int end = offset + count;
could be dropped, as the VM would perform that optimization itself; the
HotSpot C2 compiler certainly does.

-Ulf



More information about the core-libs-dev mailing list