review request for 6798511/6860431: Include functionality of Surrogate in Character

Wed Mar 17 00:46:54 UTC 2010

On 17.03.2010 00:41, Martin Buchholz wrote:
> On Tue, Mar 16, 2010 at 16:14, Ulf Zibis<Ulf.Zibis at gmx.de>  wrote:
>    
>> On 16.03.2010 22:36, Martin Buchholz wrote:
>>
>> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis<Ulf.Zibis at gmx.de>  wrote:
>>
>>
>>
>> Additionally, toUpperCaseCharArray(), codePointCountImpl() and String(int[],
>> int, int) would profit from consecutive use of isBMPCodePoint() +
>> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate().
>>
>>
>> For codePointCountImpl(), I do not agree.
>>
>>
>> 1-byte comparisons have a smaller footprint, arguably load faster from
>> memory, need less L1 CPU cache, would be faster on small/RISC/etc. CPUs,
>> and should therefore improve overall performance.
>> The shift could additionally be omitted on CPUs that can benefit from
>> 6933327.
>>      

1) I agree, this is academic.
2) This would better be optimized by the VM, but currently isn't; see:
Just filed, no ID yet: - Transform comparisons against odd border to 
even border
(Review ID: 1735166) - Use as less bits as necessary
3) Didn't you say we should write code without relying on VM-vendor-specific 
optimizations?

4) Regardless of the 8-bit/32-bit arguments, if we subtract 0xd800/0xdc00, 
I guess we could benefit from 6932837 
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6932837> - Better 
use unsigned jump if one of the range limits is 0:
         for (int i = offset; i < endIndex; ) {
             n++;
             byte highByte = (byte)((a[i++] >>> 8) - 0xd8);
             if (highByte >= 0 && highByte < 0x4) {
                 if (i < endIndex
                         && (highByte = (byte)((a[i] >>> 8) - 0xdc)) >= 0
                         && highByte < 0x4) {
                     i++;
                 }
             }
         }
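
For comparison, here is a minimal self-contained sketch that puts the loop above next to a version using the public Character.isHighSurrogate()/isLowSurrogate() predicates. The class and method names (CodePointCountSketch, countWithPredicates, countWithHighByte) are made up for illustration and are not part of the JDK or of the proposed patch. It only shows that the two loops count the same thing; it makes no claim about which one is faster.

```java
public final class CodePointCountSketch {

    // Reference version: relies on the public Character predicates.
    public static int countWithPredicates(char[] a, int offset, int endIndex) {
        int n = 0;
        for (int i = offset; i < endIndex; ) {
            n++;
            if (Character.isHighSurrogate(a[i++])
                    && i < endIndex
                    && Character.isLowSurrogate(a[i])) {
                i++; // consume the low half of a surrogate pair
            }
        }
        return n;
    }

    // Variant from the message: compare only the high byte, rebased so
    // that the surrogate ranges start at 0 (0xd8..0xdb and 0xdc..0xdf).
    public static int countWithHighByte(char[] a, int offset, int endIndex) {
        int n = 0;
        for (int i = offset; i < endIndex; ) {
            n++;
            byte highByte = (byte) ((a[i++] >>> 8) - 0xd8);
            if (highByte >= 0 && highByte < 0x4) {
                if (i < endIndex
                        && (highByte = (byte) ((a[i] >>> 8) - 0xdc)) >= 0
                        && highByte < 0x4) {
                    i++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', one supplementary code point, 'b', one unpaired high surrogate
        char[] text = "a\uD83D\uDE00b\uD800".toCharArray();
        System.out.println(countWithPredicates(text, 0, text.length));
        System.out.println(countWithHighByte(text, 0, text.length));
    }
}
```

Whether the byte-sized comparison actually pays off would have to be measured on the target VM and CPU.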

> I am not convinced.  Using byte for local variables is unlikely to
> give any performance benefit.  The only way use of byte can be
> a win is if you read/write a bunch of them at once from memory.
> I think of byte as a compression scheme for int.
>
>    
>> For String(int[], int, int), I do agree.
>>
>> Here is my latest more readable and more performant implementation:
>>
>>          int end = offset + count;
>>
>>          // Pass 1: Compute precise size of char[]
>>          int n = 0;
>>          for (int i = offset; i < end; i++) {
>>              int c = codePoints[i];
>>              if (Character.isBMPCodePoint(c))
>>                  n += 1;
>>              else if (Character.isSupplementaryCodePoint(c))
>>                  n += 2;
>>              else throw new IllegalArgumentException(Integer.toString(c));
>>          }
>>
>>          // Pass 2: Allocate and fill in char[]
>>          char[] v = new char[n];
>>          for (int i = offset, j = 0; i < end; i++) {
>>              int c = codePoints[i];
>>              if (Character.isBMPCodePoint(c)) {
>>                  v[j++] = (char) c;
>>              } else {
>>                  Character.toSurrogates(c, v, j);
>>                  j += 2;
>>              }
>>          }
>>
>>
>> I suggest:
>>
>>          // Pass 2: Allocate and fill in char[]
>>          char[] v = new char[n];
>>          for (int i = end; n > 0; ) {
>>              int c = codePoints[--i];
>>              if (Character.isBMPCodePoint(c))
>>                  v[--n] = (char)c;
>>              else
>>                  Character.toSurrogates(c, v, n -= 2);
>>          }
>>
>> - saves 1 variable (= reduces register pressure)
>> - testing the loop end against 0 is faster than testing against "end", see:
>> 6932855
>>      
> Perhaps, but this exceeds my micro-optimization threshold.
>    

:-(

-Ulf
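
For readers who want to try the backward-filling second pass outside of java.lang, here is a self-contained sketch. Two things are assumptions on my side: the thread's Character.toSurrogates(c, v, n) is not something I can rely on as public API, so the sketch writes the pair with the public Character.highSurrogate()/lowSurrogate() methods instead; and the BMP test is spelled Character.isBmpCodePoint(), the name under which it is public in Java 7 and later. The class and method names (CodePointsToCharsSketch, toChars) are hypothetical.

```java
public final class CodePointsToCharsSketch {

    // Converts codePoints[offset .. offset+count) to UTF-16 chars.
    public static char[] toChars(int[] codePoints, int offset, int count) {
        int end = offset + count;

        // Pass 1: compute the precise size of the char[]
        int n = 0;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                n += 1;
            else if (Character.isSupplementaryCodePoint(c))
                n += 2;
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: allocate and fill the char[] from the back, so that n
        // serves as both the write index and the loop counter.
        char[] v = new char[n];
        for (int i = end; n > 0; ) {
            int c = codePoints[--i];
            if (Character.isBmpCodePoint(c)) {
                v[--n] = (char) c;
            } else {
                v[--n] = Character.lowSurrogate(c);  // low half goes last
                v[--n] = Character.highSurrogate(c); // high half before it
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 'a', 0x1F600, 'b' };
        System.out.println(new String(toChars(cps, 0, cps.length)));
    }
}
```

The loop ends exactly when i reaches offset, because pass 1 has already verified that every element contributes one or two chars.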
