Unsafe.{get,put}-X-Unaligned performance

Peter Levart peter.levart at gmail.com
Thu Mar 12 21:04:50 UTC 2015



On 03/12/2015 08:29 PM, Peter Levart wrote:
>
>
> On 03/12/2015 07:37 PM, Andrew Haley wrote:
>> On 03/12/2015 05:15 PM, Peter Levart wrote:
>>> ...or are JIT+CPU smart enough and there would be no difference?
>> C2 always orders things based on profile counts, so there is no
>> difference.  Your suggestion would be better for interpreted code
>> and I guess C1 also, so I agree it is worthwhile.
>>
>> Thanks,
>> Andrew.
>>
>
> What about the following variant (or a similar one using ifs, in case
> the switch is sub-optimal):
>
>     public final long getLongUnaligned(Object o, long offset) {
>         switch ((int) offset & 7) {
>             case 1:
>             case 5: return
>                 (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
>                 (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8)) |
>                 (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24)) |
>                 (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
>             case 2:
>             case 6: return
>                 (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0)) |
>                 (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16)) |
>                 (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
>             case 3:
>             case 7: return
>                 (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
>                 (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8)) |
>                 (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40)) |
>                 (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
>             case 4: return
>                 (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0)) |
>                 (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
>             case 0:
>             default: return
>                 getLong(o, offset);
>         }
>     }
>
>
> ...it may have more branches, but fewer instructions on average per call.
>
>
>
> Peter
>
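
To sanity-check the switch-based decomposition quoted above, here is a minimal
sketch that simulates memory with a java.nio.ByteBuffer instead of Unsafe. The
class name, the ByteBuffer stand-in, and the body of pickPos (assumed to be
"big-endian ? top - pos : pos") are assumptions for illustration and are not
taken from the patch under discussion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: a ByteBuffer stands in for Unsafe-addressed memory.
final class UnalignedGetSketch {
    private final ByteBuffer mem;
    private final boolean bigEndian;

    UnalignedGetSketch(ByteBuffer mem) {
        this.mem = mem;
        this.bigEndian = mem.order() == ByteOrder.BIG_ENDIAN;
    }

    // Assumed helper: left shift for a piece whose lowest bit sits at 'pos'
    // on little-endian; 'top' is 64 minus the piece width in bits.
    private int pickPos(int top, int pos) {
        return bigEndian ? top - pos : pos;
    }

    long getLongUnaligned(int offset) {
        switch (offset & 7) {
            case 1:
            case 5: return
                (Byte.toUnsignedLong(mem.get(offset)) << pickPos(56, 0)) |
                (Short.toUnsignedLong(mem.getShort(offset + 1)) << pickPos(48, 8)) |
                (Integer.toUnsignedLong(mem.getInt(offset + 3)) << pickPos(32, 24)) |
                (Byte.toUnsignedLong(mem.get(offset + 7)) << pickPos(56, 56));
            case 2:
            case 6: return
                (Short.toUnsignedLong(mem.getShort(offset)) << pickPos(48, 0)) |
                (Integer.toUnsignedLong(mem.getInt(offset + 2)) << pickPos(32, 16)) |
                (Short.toUnsignedLong(mem.getShort(offset + 6)) << pickPos(48, 48));
            case 3:
            case 7: return
                (Byte.toUnsignedLong(mem.get(offset)) << pickPos(56, 0)) |
                (Integer.toUnsignedLong(mem.getInt(offset + 1)) << pickPos(32, 8)) |
                (Short.toUnsignedLong(mem.getShort(offset + 5)) << pickPos(48, 40)) |
                (Byte.toUnsignedLong(mem.get(offset + 7)) << pickPos(56, 56));
            case 4: return
                (Integer.toUnsignedLong(mem.getInt(offset)) << pickPos(32, 0)) |
                (Integer.toUnsignedLong(mem.getInt(offset + 4)) << pickPos(32, 32));
            case 0:
            default: return mem.getLong(offset);
        }
    }

    public static void main(String[] args) {
        for (ByteOrder order : new ByteOrder[] {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN}) {
            ByteBuffer mem = ByteBuffer.allocate(24).order(order);
            for (int i = 0; i < 24; i++) mem.put(i, (byte) (i * 17 + 1));
            UnalignedGetSketch s = new UnalignedGetSketch(mem);
            for (int off = 0; off < 16; off++) {
                // ByteBuffer.getLong(int) serves as the reference result here.
                if (s.getLongUnaligned(off) != mem.getLong(off)) {
                    throw new AssertionError(order + " mismatch at offset " + off);
                }
            }
        }
        System.out.println("all offsets agree for both byte orders");
    }
}
```

In each case every partial access lands on an address aligned to its own
width, which is the property the switch relies on. The sketch checks only
that the pieces reassemble correctly; it says nothing about performance.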

... putLongUnaligned in the style of the getLongUnaligned above is
trickier with the current code structure. But there may be a middle ground
(or a sweet spot):


     public final void putLongUnaligned(Object o, long offset, long x) {
         if (((int) offset & 1) == 1) {
             putLongParts(o, offset,
                 (byte) (x >>> 0),
                 (short) (x >>> 8),
                 (short) (x >>> 24),
                 (short) (x >>> 40),
                 (byte) (x >>> 56));
         } else if (((int) offset & 2) == 2) {
             putLongParts(o, offset,
                 (short) (x >>> 0),
                 (int) (x >>> 16),
                 (short) (x >>> 48));
         } else if (((int) offset & 4) == 4) {
             putLongParts(o, offset,
                 (int) (x >>> 0),
                 (int) (x >>> 32));
         } else {
             putLong(o, offset, x);
         }
     }


...this has the same number of branches, but fewer instructions. You also
need the following two putLongParts overloads:


     private void putLongParts(Object o, long offset,
                               byte i0, short i12, short i34, short i56, byte i7) {
         putByte(o, offset + 0, pick(i0, i7));
         putShort(o, offset + 1, pick(i12, i56));
         putShort(o, offset + 3, i34);
         putShort(o, offset + 5, pick(i56, i12));
         putByte(o, offset + 7, pick(i7, i0));
     }

     private void putLongParts(Object o, long offset,
                               short i0, int i12, short i3) {
         putShort(o, offset + 0, pick(i0, i3));
         putInt(o, offset + 2, i12);
         putShort(o, offset + 6, pick(i3, i0));
     }
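
A round-trip check of the store side can be sketched the same way. This is a
minimal illustration, not the patch itself: a java.nio.ByteBuffer replaces
Unsafe, the class name is hypothetical, the pick helpers are assumed to return
their second argument on big-endian and their first otherwise, and the
two-int putLongParts overload is assumed to already exist in that form.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: a ByteBuffer stands in for Unsafe-addressed memory.
final class UnalignedPutSketch {
    private final ByteBuffer mem;
    private final boolean bigEndian;

    UnalignedPutSketch(ByteBuffer mem) {
        this.mem = mem;
        this.bigEndian = mem.order() == ByteOrder.BIG_ENDIAN;
    }

    // Assumed helpers: choose the piece that belongs at this address.
    private byte pick(byte le, byte be) { return bigEndian ? be : le; }
    private short pick(short le, short be) { return bigEndian ? be : le; }
    private int pick(int le, int be) { return bigEndian ? be : le; }

    void putLongUnaligned(int offset, long x) {
        if ((offset & 1) == 1) {          // odd: byte, 3 shorts, byte
            putLongParts(offset, (byte) (x >>> 0), (short) (x >>> 8),
                (short) (x >>> 24), (short) (x >>> 40), (byte) (x >>> 56));
        } else if ((offset & 2) == 2) {   // 2 mod 4: short, int, short
            putLongParts(offset, (short) (x >>> 0), (int) (x >>> 16), (short) (x >>> 48));
        } else if ((offset & 4) == 4) {   // 4 mod 8: int, int
            putLongParts(offset, (int) (x >>> 0), (int) (x >>> 32));
        } else {                          // 8-byte aligned
            mem.putLong(offset, x);
        }
    }

    private void putLongParts(int offset, byte i0, short i12, short i34, short i56, byte i7) {
        mem.put(offset + 0, pick(i0, i7));
        mem.putShort(offset + 1, pick(i12, i56));
        mem.putShort(offset + 3, i34);
        mem.putShort(offset + 5, pick(i56, i12));
        mem.put(offset + 7, pick(i7, i0));
    }

    private void putLongParts(int offset, short i0, int i12, short i3) {
        mem.putShort(offset + 0, pick(i0, i3));
        mem.putInt(offset + 2, i12);
        mem.putShort(offset + 6, pick(i3, i0));
    }

    // Assumed shape of the two-int overload, which the post does not show.
    private void putLongParts(int offset, int i0, int i1) {
        mem.putInt(offset + 0, pick(i0, i1));
        mem.putInt(offset + 4, pick(i1, i0));
    }

    public static void main(String[] args) {
        long x = 0x0123456789ABCDEFL;
        for (ByteOrder order : new ByteOrder[] {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN}) {
            for (int off = 0; off < 16; off++) {
                ByteBuffer mem = ByteBuffer.allocate(24).order(order);
                new UnalignedPutSketch(mem).putLongUnaligned(off, x);
                // Reading back with ByteBuffer.getLong(int) should return x.
                if (mem.getLong(off) != x) {
                    throw new AssertionError(order + " mismatch at offset " + off);
                }
            }
        }
        System.out.println("round trip ok for all offsets and both byte orders");
    }
}
```

The middle pieces (i34 in the five-part overload, i12 in the three-part one)
need no pick call because they occupy the same address under either byte
order; only the outer pieces swap places.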



Regards, Peter



More information about the hotspot-compiler-dev mailing list