Unsafe.{get,put}-X-Unaligned performance

Peter Levart peter.levart at gmail.com
Thu Mar 12 18:07:03 UTC 2015



On 03/12/2015 06:30 PM, Vitaly Davidovich wrote:
> Isn't the C2 intrinsic just reading the value starting at the 
> specified offset directly (when unaligned access is supported) and not 
> doing the branching?

It is. This code is only for platforms that do not support unaligned accesses.
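
To illustrate what the fallback has to do on such platforms, here is a minimal sketch of the byte-wise case: the value is assembled from eight single-byte reads. This is not the JDK code. The class and method names (ComposeLong, makeLongLE, referenceLE) are made up for illustration, a plain byte[] stands in for raw memory, and little-endian byte order is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not the Unsafe implementation.
final class ComposeLong {

    // Assemble a long from eight byte reads, least significant byte first
    // (little-endian order is an assumption of this sketch).
    static long makeLongLE(byte[] a, int offset) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            // Mask with 0xFFL so that negative bytes are not sign-extended.
            result |= (a[offset + i] & 0xFFL) << (8 * i);
        }
        return result;
    }

    // Reference value obtained through the standard ByteBuffer API.
    static long referenceLE(byte[] a, int offset) {
        return ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 37 + 1);
        }
        // Compare both ways of reading at every offset, aligned or not.
        for (int off = 0; off <= 8; off++) {
            if (makeLongLE(data, off) != referenceLE(data, off)) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        System.out.println("all offsets match");
    }
}
```

The short and int cases work the same way with fewer, wider pieces.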

Peter

>
> On Thu, Mar 12, 2015 at 1:15 PM, Peter Levart <peter.levart at gmail.com 
> <mailto:peter.levart at gmail.com>> wrote:
>
>
>
>     On 03/10/2015 08:02 PM, Andrew Haley wrote:
>>     The new algorithm does an N-way branch, always loading and storing
>>     subwords according to their natural alignment.  So, if the address is
>>     random and the size is long it will access 8 bytes 50% of the time, 4
>>     shorts 25% of the time, 2 ints 12.5% of the time, and 1 long 12.5% of
>>     the time.  So, for every random load/store we have a 4-way branch.
>
>
>     ...so do you think it would be better if the order of checks in the
>     if/else chain:
>
>     public final long getLongUnaligned(Object o, long offset) {
>         if ((offset & 7) == 0) {
>             return getLong(o, offset);
>         } else if ((offset & 3) == 0) {
>             return makeLong(getInt(o, offset),
>                             getInt(o, offset + 4));
>         } else if ((offset & 1) == 0) {
>             return makeLong(getShort(o, offset),
>                             getShort(o, offset + 2),
>                             getShort(o, offset + 4),
>                             getShort(o, offset + 6));
>         } else {
>             return makeLong(getByte(o, offset),
>                             getByte(o, offset + 1),
>                             getByte(o, offset + 2),
>                             getByte(o, offset + 3),
>                             getByte(o, offset + 4),
>                             getByte(o, offset + 5),
>                             getByte(o, offset + 6),
>                             getByte(o, offset + 7));
>         }
>     }
>
>
>     ...was reversed:
>
>     if ((offset & 1) == 1) {
>         // bytes
>     } else if ((offset & 2) == 2) {
>         // shorts
>     } else if ((offset & 4) == 4) {
>         // ints
>     } else {
>         // longs
>     }
>
>
>     ...or are JIT+CPU smart enough and there would be no difference?
>
>
>     Peter
>
>
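
For reference, the two orderings of the if/else chain quoted above can be compared with a small standalone sketch. It only classifies offsets by the access size each chain would pick. It does not touch Unsafe and says nothing about what the JIT or the CPU's branch predictor does with either form. The class and method names (AlignmentBranches, widestFirst, narrowestFirst, histogram) are made up for illustration.

```java
// Hypothetical illustration of the two check orderings; sizes are in bytes.
final class AlignmentBranches {

    // Original order: test for the widest aligned access first.
    static int widestFirst(long offset) {
        if ((offset & 7) == 0) {
            return 8; // one long
        } else if ((offset & 3) == 0) {
            return 4; // two ints
        } else if ((offset & 1) == 0) {
            return 2; // four shorts
        } else {
            return 1; // eight bytes
        }
    }

    // Reversed order: test for the most frequent (byte) case first.
    static int narrowestFirst(long offset) {
        if ((offset & 1) == 1) {
            return 1;
        } else if ((offset & 2) == 2) {
            return 2;
        } else if ((offset & 4) == 4) {
            return 4;
        } else {
            return 8;
        }
    }

    // Count how often each access size is chosen for offsets 0..n-1.
    // Index 0 = bytes, 1 = shorts, 2 = ints, 3 = longs.
    static long[] histogram(long n) {
        long[] counts = new long[4];
        for (long off = 0; off < n; off++) {
            int size = widestFirst(off);
            if (size != narrowestFirst(off)) {
                throw new AssertionError("orderings disagree at offset " + off);
            }
            counts[Integer.numberOfTrailingZeros(size)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] c = histogram(1024);
        // prints: bytes=512 shorts=256 ints=128 longs=128
        System.out.println("bytes=" + c[0] + " shorts=" + c[1]
                + " ints=" + c[2] + " longs=" + c[3]);
    }
}
```

Both chains select the same access size for every offset, and over uniformly distributed offsets the split is the 50% / 25% / 12.5% / 12.5% that Andrew describes. With the reversed order the first test already resolves half of the cases, but whether that is measurable after JIT compilation would have to be checked with a benchmark.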



