Unsafe.{get,put}-X-Unaligned performance
Peter Levart
peter.levart at gmail.com
Thu Mar 12 21:04:50 UTC 2015
On 03/12/2015 08:29 PM, Peter Levart wrote:
>
>
> On 03/12/2015 07:37 PM, Andrew Haley wrote:
>> On 03/12/2015 05:15 PM, Peter Levart wrote:
>>> ...or are JIT+CPU smart enough and there would be no difference?
>> C2 always orders things based on profile counts, so there is no
>> difference. Your suggestion would be better for interpreted code
>> and I guess C1 also, so I agree it is worthwhile.
>>
>> Thanks,
>> Andrew.
>>
>
> What about the following variant (or similar with ifs in case switch
> is sub-optimal):
>
> public final long getLongUnaligned(Object o, long offset) {
>     switch ((int) offset & 7) {
>         case 1:
>         case 5: return
>             (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
>             (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8)) |
>             (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24)) |
>             (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
>         case 2:
>         case 6: return
>             (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0)) |
>             (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16)) |
>             (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
>         case 3:
>         case 7: return
>             (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
>             (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8)) |
>             (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40)) |
>             (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
>         case 4: return
>             (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0)) |
>             (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
>         case 0:
>         default: return
>             getLong(o, offset);
>     }
> }
>
>
> ...it may have more branches, but fewer instructions on average per call.
>
>
>
> Peter
>
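To try the decomposition above outside the JDK, here is a minimal sketch that emulates it on a byte[] through a ByteBuffer. The class name UnalignedReadSketch, the u8/u16/u32 helpers, and the pickPos definition (shift by pos on little-endian, by top - pos on big-endian) are assumptions made for illustration: the message does not show pickPos, and the real code would go through Unsafe's getByte/getShort/getInt on (o, offset) rather than a ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative emulation only; not the JDK implementation.
final class UnalignedReadSketch {
    private final ByteBuffer buf;
    private final boolean bigEndian;

    UnalignedReadSketch(byte[] bytes, ByteOrder order) {
        this.buf = ByteBuffer.wrap(bytes).order(order);
        this.bigEndian = (order == ByteOrder.BIG_ENDIAN);
    }

    // Assumed helper: pos is the little-endian bit position of a part,
    // top is (64 - part width), so top - pos is its big-endian position.
    private int pickPos(int top, int pos) { return bigEndian ? top - pos : pos; }

    // Zero-extending reads of 1, 2 and 4 bytes in the buffer's byte order.
    private long u8(int i)  { return Byte.toUnsignedLong(buf.get(i)); }
    private long u16(int i) { return Short.toUnsignedLong(buf.getShort(i)); }
    private long u32(int i) { return Integer.toUnsignedLong(buf.getInt(i)); }

    // Same case structure as the quoted getLongUnaligned: each part is
    // read at an offset that is naturally aligned for its width.
    long getLongUnaligned(int offset) {
        switch (offset & 7) {
            case 1:
            case 5:
                return (u8(offset)      << pickPos(56, 0))
                     | (u16(offset + 1) << pickPos(48, 8))
                     | (u32(offset + 3) << pickPos(32, 24))
                     | (u8(offset + 7)  << pickPos(56, 56));
            case 2:
            case 6:
                return (u16(offset)     << pickPos(48, 0))
                     | (u32(offset + 2) << pickPos(32, 16))
                     | (u16(offset + 6) << pickPos(48, 48));
            case 3:
            case 7:
                return (u8(offset)      << pickPos(56, 0))
                     | (u32(offset + 1) << pickPos(32, 8))
                     | (u16(offset + 5) << pickPos(48, 40))
                     | (u8(offset + 7)  << pickPos(56, 56));
            case 4:
                return (u32(offset)     << pickPos(32, 0))
                     | (u32(offset + 4) << pickPos(32, 32));
            default:
                return buf.getLong(offset);
        }
    }

    // Reference result: ByteBuffer's own 8-byte read at the same offset.
    long reference(int offset) { return buf.getLong(offset); }

    public static void main(String[] args) {
        byte[] data = new byte[32];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 37 + 11);
        boolean ok = true;
        for (ByteOrder order : new ByteOrder[] { ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN }) {
            UnalignedReadSketch s = new UnalignedReadSketch(data, order);
            for (int off = 0; off < 16; off++) {
                ok &= (s.getLongUnaligned(off) == s.reference(off));
            }
        }
        System.out.println("all offsets match: " + ok);
    }
}
```

Comparing getLongUnaligned(i) with reference(i) for offsets 0 through 15 in both byte orders is a quick way to check the shift constants in each case.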
... putLongUnaligned in the style of the getLongUnaligned above is
trickier with the current code structure. But there may be a middle
ground (or a sweet spot):
public final void putLongUnaligned(Object o, long offset, long x) {
    if (((int) offset & 1) == 1) {
        putLongParts(o, offset,
                     (byte) (x >>> 0),
                     (short) (x >>> 8),
                     (short) (x >>> 24),
                     (short) (x >>> 40),
                     (byte) (x >>> 56));
    } else if (((int) offset & 2) == 2) {
        putLongParts(o, offset,
                     (short) (x >>> 0),
                     (int) (x >>> 16),
                     (short) (x >>> 48));
    } else if (((int) offset & 4) == 4) {
        putLongParts(o, offset,
                     (int) (x >>> 0),
                     (int) (x >>> 32));
    } else {
        putLong(o, offset, x);
    }
}
...this has the same number of branches, but fewer instructions. You also
need the following two:
private void putLongParts(Object o, long offset,
                          byte i0, short i12, short i34, short i56, byte i7) {
    putByte(o, offset + 0, pick(i0, i7));
    putShort(o, offset + 1, pick(i12, i56));
    putShort(o, offset + 3, i34);
    putShort(o, offset + 5, pick(i56, i12));
    putByte(o, offset + 7, pick(i7, i0));
}

private void putLongParts(Object o, long offset,
                          short i0, int i12, short i3) {
    putShort(o, offset + 0, pick(i0, i3));
    putInt(o, offset + 2, i12);
    putShort(o, offset + 6, pick(i3, i0));
}
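
For anyone who wants to check the store side in isolation, the following is a rough sketch that emulates putLongUnaligned and the putLongParts overloads on a byte[] through a ByteBuffer. The class name UnalignedWriteSketch, the pick definitions (first argument on little-endian, second on big-endian), and the two-int putLongParts overload are assumptions made for illustration; none of them is shown in this message, and the real code would use Unsafe's putByte/putShort/putInt on (o, offset).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative emulation only; not the JDK implementation.
final class UnalignedWriteSketch {
    private final ByteBuffer buf;
    private final boolean bigEndian;

    UnalignedWriteSketch(byte[] bytes, ByteOrder order) {
        this.buf = ByteBuffer.wrap(bytes).order(order);
        this.bigEndian = (order == ByteOrder.BIG_ENDIAN);
    }

    // Assumed helpers: choose the little-endian or big-endian candidate.
    private byte  pick(byte le, byte be)   { return bigEndian ? be : le; }
    private short pick(short le, short be) { return bigEndian ? be : le; }
    private int   pick(int le, int be)     { return bigEndian ? be : le; }

    void putLongUnaligned(int offset, long x) {
        if ((offset & 1) == 1) {            // odd offset: byte, 3 shorts, byte
            putLongParts(offset, (byte) x, (short) (x >>> 8),
                         (short) (x >>> 24), (short) (x >>> 40), (byte) (x >>> 56));
        } else if ((offset & 2) == 2) {     // 2 mod 4: short, int, short
            putLongParts(offset, (short) x, (int) (x >>> 16), (short) (x >>> 48));
        } else if ((offset & 4) == 4) {     // 4 mod 8: two ints
            putLongParts(offset, (int) x, (int) (x >>> 32));
        } else {                            // 8-byte aligned: one store
            buf.putLong(offset, x);
        }
    }

    private void putLongParts(int offset, byte i0, short i12, short i34, short i56, byte i7) {
        buf.put(offset, pick(i0, i7));
        buf.putShort(offset + 1, pick(i12, i56));
        buf.putShort(offset + 3, i34);      // middle part keeps its place in both orders
        buf.putShort(offset + 5, pick(i56, i12));
        buf.put(offset + 7, pick(i7, i0));
    }

    private void putLongParts(int offset, short i0, int i12, short i3) {
        buf.putShort(offset, pick(i0, i3));
        buf.putInt(offset + 2, i12);
        buf.putShort(offset + 6, pick(i3, i0));
    }

    // Assumed overload; the message refers to it but does not show it.
    private void putLongParts(int offset, int i0, int i1) {
        buf.putInt(offset, pick(i0, i1));
        buf.putInt(offset + 4, pick(i1, i0));
    }

    // Reads the value back with ByteBuffer's own 8-byte load.
    long readBack(int offset) { return buf.getLong(offset); }
}
```

Writing a known pattern such as 0x0123456789ABCDEFL at each offset from 0 to 15 and reading it back with readBack, in both byte orders, exercises every branch.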
Regards, Peter