Unsafe.{get,put}-X-Unaligned performance
Peter Levart
peter.levart at gmail.com
Thu Mar 12 19:29:19 UTC 2015
On 03/12/2015 07:37 PM, Andrew Haley wrote:
> On 03/12/2015 05:15 PM, Peter Levart wrote:
>> ...or are JIT+CPU smart enough and there would be no difference?
> C2 always orders things based on profile counts, so there is no
> difference. Your suggestion would be better for interpreted code
> and I guess C1 also, so I agree it is worthwhile.
>
> Thanks,
> Andrew.
>
What about the following variant (or similar with ifs in case switch is
sub-optimal):
    public final long getLongUnaligned(Object o, long offset) {
        switch ((int) offset & 7) {
            case 1:
            case 5: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8)) |
                (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 2:
            case 6: return
                (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0)) |
                (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16)) |
                (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
            case 3:
            case 7: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8)) |
                (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 4: return
                (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0)) |
                (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
            case 0:
            default: return
                getLong(o, offset);
        }
    }
...it may have more branches, but fewer instructions on average per call.
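
The shift arithmetic in the variant above can be checked outside the JDK with a small stand-alone sketch. Everything around the switch is an assumption made for illustration: a ByteBuffer stands in for the Unsafe getByte/getShort/getInt/getLong accessors, the byte order is taken from the buffer instead of a platform constant, the class name UnalignedSketch is hypothetical, and pickPos and toUnsignedLong are local helpers inferred from how the snippet uses them. A ByteBuffer does not enforce alignment, so this only exercises how the pieces are combined, not the performance question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnalignedSketch {
    // Assumed helper: `pos` is the shift on a little-endian layout and
    // `top` is 64 minus the piece width in bits, so big-endian uses top - pos.
    static int pickPos(boolean bigEndian, int top, int pos) {
        return bigEndian ? top - pos : pos;
    }

    // Assumed helpers: zero-extend each piece before shifting it into place.
    static long toUnsignedLong(byte n)  { return n & 0xffL; }
    static long toUnsignedLong(short n) { return n & 0xffffL; }
    static long toUnsignedLong(int n)   { return n & 0xffffffffL; }

    // Same case split as the variant above; ByteBuffer absolute reads
    // replace the Unsafe accessors (an assumption of this sketch).
    static long getLongUnaligned(ByteBuffer b, int offset) {
        boolean be = b.order() == ByteOrder.BIG_ENDIAN;
        switch (offset & 7) {
            case 1:
            case 5: // pieces of 1 + 2 + 4 + 1 bytes
                return (toUnsignedLong(b.get(offset))          << pickPos(be, 56, 0))
                     | (toUnsignedLong(b.getShort(offset + 1)) << pickPos(be, 48, 8))
                     | (toUnsignedLong(b.getInt(offset + 3))   << pickPos(be, 32, 24))
                     | (toUnsignedLong(b.get(offset + 7))      << pickPos(be, 56, 56));
            case 2:
            case 6: // pieces of 2 + 4 + 2 bytes
                return (toUnsignedLong(b.getShort(offset))     << pickPos(be, 48, 0))
                     | (toUnsignedLong(b.getInt(offset + 2))   << pickPos(be, 32, 16))
                     | (toUnsignedLong(b.getShort(offset + 6)) << pickPos(be, 48, 48));
            case 3:
            case 7: // pieces of 1 + 4 + 2 + 1 bytes
                return (toUnsignedLong(b.get(offset))          << pickPos(be, 56, 0))
                     | (toUnsignedLong(b.getInt(offset + 1))   << pickPos(be, 32, 8))
                     | (toUnsignedLong(b.getShort(offset + 5)) << pickPos(be, 48, 40))
                     | (toUnsignedLong(b.get(offset + 7))      << pickPos(be, 56, 56));
            case 4: // pieces of 4 + 4 bytes
                return (toUnsignedLong(b.getInt(offset))       << pickPos(be, 32, 0))
                     | (toUnsignedLong(b.getInt(offset + 4))   << pickPos(be, 32, 32));
            case 0:
            default: // already 8-byte aligned
                return b.getLong(offset);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[24];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (0x81 + 7 * i); // arbitrary distinct test bytes
        }
        ByteOrder[] orders = { ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN };
        for (ByteOrder order : orders) {
            ByteBuffer b = ByteBuffer.wrap(data).order(order);
            for (int off = 0; off <= 16; off++) {
                // Compare the piecewise read with the buffer's own 8-byte read.
                boolean same = getLongUnaligned(b, off) == b.getLong(off);
                System.out.println(order + " offset " + off + " matches: " + same);
            }
        }
    }
}
```

Running main compares every offset from 0 to 16 in both byte orders against the buffer's own getLong, which covers each case label at least twice.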
Peter