review request for 6798511/6860431: Include functionality of Surrogate in Character
Ulf Zibis
Ulf.Zibis at gmx.de
Wed Mar 17 00:46:54 UTC 2010
On 17.03.2010 00:41, Martin Buchholz wrote:
> On Tue, Mar 16, 2010 at 16:14, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
>
>> On 16.03.2010 22:36, Martin Buchholz wrote:
>>
>> On Tue, Mar 16, 2010 at 13:58, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
>>
>>
>>
>> Additionally, toUpperCaseCharArray(), codePointCountImpl(), String(int[],
>> int, int) would profit from consecutive use of isBMPCodePoint() +
>> isSupplementaryCodePoint() or isHighSurrogate() + isLowSurrogate().
>>
>>
>> For codePointCountImpl(), I do not agree.
>>
>>
>> 1-byte comparisons have a smaller footprint, if anything load faster from
>> memory, need less L1 CPU cache, would be faster on small/RISC/etc. CPUs, and
>> therefore should improve overall performance.
>> The shift could additionally be omitted on CPUs that can benefit from
>> 6933327.
>>
1) I agree, this is academic.
2) This would better be optimized by the VM, but it isn't at this time; see:
Just filed, no ID yet: - Transform comparisons against odd border to
even border
(Review ID: 1735166) - Use as less bits as necessary
3) Didn't you say we should write code without relying on VM-vendor-specific
optimizations?
4) Regardless of the 8-bit/32-bit arguments, if we subtract 0xd800/0xdc00,
I guess we could benefit from 6932837
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6932837> - Better
use unsigned jump if one of the range limits is 0
for (int i = offset; i < endIndex; ) {
    n++;
    byte highByte = (byte)((a[i++] >>> 8) - 0xd8);
    if (highByte >= 0 && highByte < 0x4) {
        if (i < endIndex && (highByte = (byte)((a[i] >>> 8) - 0xdc)) >= 0
                && highByte < 0x4) {
            i++;
        }
    }
}
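
For readers who want to try the subtract-then-range-check idea from point 4, here is a minimal, self-contained sketch. It is not the JDK implementation: the class and method names are hypothetical, and it uses int locals instead of byte, so it only illustrates the "one range limit is 0" shape of the comparison, not any byte-width effect.

```java
// Hypothetical sketch, not JDK code: counts code points in a char range by
// subtracting the surrogate base so that each range check has 0 as its lower limit.
class SurrogateCountSketch {

    static int countCodePoints(char[] a, int offset, int endIndex) {
        int n = 0;
        for (int i = offset; i < endIndex; ) {
            n++;
            // High surrogates are 0xD800..0xDBFF, i.e. 0..0x3FF after subtraction.
            int hi = a[i++] - 0xD800;
            if (hi >= 0 && hi < 0x400 && i < endIndex) {
                // Low surrogates are 0xDC00..0xDFFF, i.e. 0..0x3FF after subtraction.
                int lo = a[i] - 0xDC00;
                if (lo >= 0 && lo < 0x400) {
                    i++; // a valid pair counts as a single code point
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', one surrogate pair, 'b', and a lone high surrogate.
        char[] text = "a\uD835\uDD0Ab\uD800".toCharArray();
        System.out.println(countCodePoints(text, 0, text.length));
        // Reference result from the public API, for comparison.
        System.out.println(Character.codePointCount(text, 0, text.length));
    }
}
```

Both lines printed by main should agree; whether the subtraction form is actually faster depends on the VM and would need to be measured.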
> I am not convinced. Using byte for local variables is unlikely to
> give any performance benefit. The only way use of byte can be
> a win is if you read/write a bunch of them at once from memory.
> I think of byte as a compression scheme for int.
>
>
>> For String(int[], int, int), I do agree.
>>
>> Here is my latest more readable and more performant implementation:
>>
>> int end = offset + count;
>>
>> // Pass 1: Compute precise size of char[]
>> int n = 0;
>> for (int i = offset; i < end; i++) {
>>     int c = codePoints[i];
>>     if (Character.isBMPCodePoint(c))
>>         n += 1;
>>     else if (Character.isSupplementaryCodePoint(c))
>>         n += 2;
>>     else throw new IllegalArgumentException(Integer.toString(c));
>> }
>>
>> // Pass 2: Allocate and fill in char[]
>> char[] v = new char[n];
>> for (int i = offset, j = 0; i < end; i++) {
>>     int c = codePoints[i];
>>     if (Character.isBMPCodePoint(c)) {
>>         v[j++] = (char) c;
>>     } else {
>>         Character.toSurrogates(c, v, j);
>>         j += 2;
>>     }
>> }
>>
>>
>> I suggest:
>>
>> // Pass 2: Allocate and fill in char[]
>> char[] v = new char[n];
>> for (int i = end; n > 0; ) {
>>     int c = codePoints[--i];
>>     if (Character.isBMPCodePoint(c))
>>         v[--n] = (char) c;
>>     else
>>         Character.toSurrogates(c, v, n -= 2);
>> }
>>
>> - saves 1 variable (= reduces register pressure)
>> - testing the loop end against 0 is faster than against "end"; see
>> 6932855
>>
> Perhaps, but this exceeds my micro-optimization threshold.
>
:-(
-Ulf
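
The two-pass conversion discussed in this thread, with the backward-filling second pass, can be tried outside the JDK with the sketch below. It rests on some assumptions: the class and method names are hypothetical, the public Character.highSurrogate()/lowSurrogate() stand in for the package-private toSurrogates(), and the BMP test uses the spelling isBmpCodePoint() from the released JDK (Java 7 and later) rather than the isBMPCodePoint() name used in the proposal.

```java
// Hypothetical sketch, not JDK code: converts code points to UTF-16 chars
// in two passes, filling the result array from the end toward index 0.
class CodePointsToCharsSketch {

    static char[] toChars(int[] codePoints, int offset, int count) {
        int end = offset + count;

        // Pass 1: compute the precise size of the char[].
        int n = 0;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) {
                n += 1;
            } else if (Character.isSupplementaryCodePoint(c)) {
                n += 2;
            } else {
                throw new IllegalArgumentException(Integer.toString(c));
            }
        }

        // Pass 2: allocate and fill backward; n doubles as the write index,
        // so the loop terminates on a comparison against 0.
        char[] v = new char[n];
        for (int i = end; n > 0; ) {
            int c = codePoints[--i];
            if (Character.isBmpCodePoint(c)) {
                v[--n] = (char) c;
            } else {
                v[--n] = Character.lowSurrogate(c);  // low half is written first
                v[--n] = Character.highSurrogate(c); // because we move backward
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = { 0x41, 0x1D50A, 0x42 };
        String expected = new String(cps, 0, cps.length);
        String actual = new String(toChars(cps, 0, cps.length));
        System.out.println(expected.equals(actual));
    }
}
```

Any speed difference between the forward and backward second pass is a claim from the thread, not something this sketch demonstrates.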