Unsafe.{get,put}-X-Unaligned performance

Peter Levart peter.levart at gmail.com
Thu Mar 12 19:15:30 UTC 2015



On 03/12/2015 07:16 PM, Vitaly Davidovich wrote:
> Right, ok -- just wanted to make sure I wasn't missing something.  For 
> platforms that don't support unaligned access, is it expected that 
> callers will be reading/writing addresses that are unaligned to the 
> size of the type they're reading? My hunch is that on such platforms 
> folks would tend to align their data layouts so as to avoid unaligned 
> operations, in which case checking for "natural" alignment first makes 
> sense. But I don't know if that's actually true or not.

It depends on usage, yes. But Java is a platform-independent "platform", 
and these Unsafe methods are meant to abstract away the platform dependency.
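
For reference, the two orderings from the quoted text below can be compared 
with a minimal sketch. It is not the actual Unsafe code: a little-endian 
ByteBuffer stands in for Unsafe.getXxx(o, offset), and the class and method 
names (UnalignedOrderSketch, widthForward, widthReversed, fromBytes) are 
hypothetical. It checks that both chains select the same access width for 
every offset and tallies the branches for offsets uniform modulo 8:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the Unsafe code under discussion.
class UnalignedOrderSketch {

    // Little-endian composition helpers (byte order is an assumption here).
    static long makeLong(int i0, int i1) {
        return (i0 & 0xFFFFFFFFL) | ((long) i1 << 32);
    }

    static long makeLong(short s0, short s1, short s2, short s3) {
        return (s0 & 0xFFFFL) | ((s1 & 0xFFFFL) << 16)
             | ((s2 & 0xFFFFL) << 32) | ((s3 & 0xFFFFL) << 48);
    }

    // Assemble a long from 8 single-byte reads, lowest address = lowest byte.
    static long fromBytes(ByteBuffer b, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b.get(off + i) & 0xFFL);
        }
        return v;
    }

    // Access width chosen by the current chain: widest alignment tested first.
    static int widthForward(long off) {
        if ((off & 7) == 0) return 8;
        else if ((off & 3) == 0) return 4;
        else if ((off & 1) == 0) return 2;
        else return 1;
    }

    // Access width chosen by the reversed chain: odd offsets tested first.
    static int widthReversed(long off) {
        if ((off & 1) == 1) return 1;
        else if ((off & 2) == 2) return 2;
        else if ((off & 4) == 4) return 4;
        else return 8;
    }

    // Read a long using only naturally aligned subword accesses of 'width' bytes.
    static long getLongUnaligned(ByteBuffer b, int off, int width) {
        switch (width) {
            case 8:  return b.getLong(off);
            case 4:  return makeLong(b.getInt(off), b.getInt(off + 4));
            case 2:  return makeLong(b.getShort(off), b.getShort(off + 2),
                                     b.getShort(off + 4), b.getShort(off + 6));
            default: return fromBytes(b, off);
        }
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 32; i++) b.put(i, (byte) (i * 37 + 11));

        int[] hits = new int[9]; // indexed by access width: 1, 2, 4 or 8
        for (int off = 0; off < 8; off++) {
            int w = widthForward(off);
            if (w != widthReversed(off))
                throw new AssertionError("orderings differ at offset " + off);
            if (getLongUnaligned(b, off, w) != b.getLong(off))
                throw new AssertionError("wrong value at offset " + off);
            hits[w]++;
        }
        System.out.println("bytes=" + hits[1] + " shorts=" + hits[2]
                + " ints=" + hits[4] + " long=" + hits[8] + " (of 8 offsets)");
    }
}
```

Both chains pick the same branch for every offset, so any difference would 
come only from how many tests run before the most common (odd-offset) case 
is reached: one in the reversed chain versus three in the current one. 
Whether that is measurable after JIT compilation is the open question.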

Peter

>
> On Thu, Mar 12, 2015 at 2:07 PM, Peter Levart
> <peter.levart at gmail.com> wrote:
>
>
>
>     On 03/12/2015 06:30 PM, Vitaly Davidovich wrote:
>>     Isn't the C2 intrinsic just reading the value starting at the
>>     specified offset directly (when unaligned access is supported)
>>     and not doing the branching?
>
>     It is. This code is for those platforms not supporting unaligned
>     accesses.
>
>     Peter
>
>
>>
>>     On Thu, Mar 12, 2015 at 1:15 PM, Peter Levart
>>     <peter.levart at gmail.com> wrote:
>>
>>
>>
>>         On 03/10/2015 08:02 PM, Andrew Haley wrote:
>>>         The new algorithm does an N-way branch, always loading and storing
>>>         subwords according to their natural alignment.  So, if the address is
>>>         random and the size is long it will access 8 bytes 50% of the time, 4
>>>         shorts 25% of the time, 2 ints 12.5% of the time, and 1 long 12.5% of
>>>         the time.  So, for every random load/store we have a 4-way branch.
>>
>>
>>         ...so do you think it would be better if the order of checks
>>         in the if/else chain:
>>
>>         public final long getLongUnaligned(Object o, long offset) {
>>             if ((offset & 7) == 0) {
>>                 return getLong(o, offset);
>>             } else if ((offset & 3) == 0) {
>>                 return makeLong(getInt(o, offset),
>>                                 getInt(o, offset + 4));
>>             } else if ((offset & 1) == 0) {
>>                 return makeLong(getShort(o, offset),
>>                                 getShort(o, offset + 2),
>>                                 getShort(o, offset + 4),
>>                                 getShort(o, offset + 6));
>>             } else {
>>                 return makeLong(getByte(o, offset),
>>                                 getByte(o, offset + 1),
>>                                 getByte(o, offset + 2),
>>                                 getByte(o, offset + 3),
>>                                 getByte(o, offset + 4),
>>                                 getByte(o, offset + 5),
>>                                 getByte(o, offset + 6),
>>                                 getByte(o, offset + 7));
>>             }
>>         }
>>
>>
>>         ...was reversed:
>>
>>         if ((offset & 1) == 1) {
>>             // bytes
>>         } else if ((offset & 2) == 2) {
>>             // shorts
>>         } else if ((offset & 4) == 4) {
>>             // ints
>>         } else {
>>             // longs
>>         }
>>
>>
>>         ...or are the JIT and CPU smart enough that there would be no
>>         difference?
>>
>>
>>         Peter
>>
>>
>
>



