Unsafe.{get,put}-X-Unaligned; Efficient array comparison intrinsics

Tue Feb 24 11:16:55 UTC 2015

Hi Andrew,

This looks like a good start.

On Feb 23, 2015, at 7:13 PM, Andrew Haley <aph at redhat.com> wrote:

> I've been kicking around a few ideas for Unsafe access methods for
> unaligned access to byte arrays and buffers in order to provide
> "whatever second-best mechanism the platform offers".  These would
> provide the base for fast lexicographic array comparisons, etc.
> 
> https://bugs.openjdk.java.net/browse/JDK-8044082
> 
> If the platform supports unaligned memory accesses, the implementation
> of {get,put}-X-Unaligned is obvious and trivial for both C1 and C2.
> It gets interesting when we want to provide efficient unaligned
> methods on machines with no hardware support.
> 
> We could provide compiler intrinsics which do what we need on such
> machines.  However, I think this wouldn't deliver the best results.
> From the experiments I've done, the best implementation is to write
> the access methods in Java and allow HotSpot to optimize them.  While
> this seemed a bit counter-intuitive to me, it's best because C2 has
> profile data that it can work on.  In many cases I suspect that data
> read and written from a byte array will be aligned for their type and
> C2 will take advantage of this, relegating the misaligned access to an
> out-of-line code path as appropriate.  

I am all for keeping more code in Java if we can. I don't know enough about assembler-based optimizations to determine if it might be possible to do better on certain CPU architectures.

One advantage, AFAIU, of intrinsics is that they are not subject to the vagaries of inlining thresholds. It's important that the loops operating over the arrays be compiled efficiently; otherwise performance can fall off a cliff if inlining thresholds are reached within the loop. Perhaps these methods are small enough that this is not an issue? And perhaps that is not a sufficient argument to justify the cost of an intrinsic (and we should really be tweaking the inlining mechanism)?

With that in mind, is there any need to intrinsify the new methods at all, given that those new Java methods can defer to the older ones based on a constant check? Also, should that be done anyway for the interpreter?

     private static final boolean IS_UNALIGNED = theUnsafe.unalignedAccess();

     public void putIntUnaligned(Object o, long offset, int x) {
         if (IS_UNALIGNED || (offset & 3) == 0) {
             putInt(o, offset, x);
         } else if (byteOrder == BIG_ENDIAN) {
             putIntB(o, offset, x);
         } else {
             putIntL(o, offset, x);
         }
     }
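
For illustration only, here is a minimal sketch of what the byte-wise fallback paths behind putIntB and putIntL might look like. It is an assumption on my part, not the code in the webrev: it writes into a plain byte[] instead of going through Unsafe, and the class name UnalignedPutSketch and the ByteOrder parameter are hypothetical.

```java
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical stand-in for the Unsafe fallback paths. It works on a plain
// byte[] rather than raw memory, so it needs no sun.misc.Unsafe.
final class UnalignedPutSketch {

    // Assumed equivalent of putIntB: most significant byte first.
    static void putIntB(byte[] a, int offset, int x) {
        a[offset]     = (byte) (x >>> 24);
        a[offset + 1] = (byte) (x >>> 16);
        a[offset + 2] = (byte) (x >>> 8);
        a[offset + 3] = (byte) x;
    }

    // Assumed equivalent of putIntL: least significant byte first.
    static void putIntL(byte[] a, int offset, int x) {
        a[offset]     = (byte) x;
        a[offset + 1] = (byte) (x >>> 8);
        a[offset + 2] = (byte) (x >>> 16);
        a[offset + 3] = (byte) (x >>> 24);
    }

    // Dispatch on byte order, as the else-branches of the snippet above do.
    static void putIntUnaligned(byte[] a, int offset, int x, ByteOrder order) {
        if (order == ByteOrder.BIG_ENDIAN) {
            putIntB(a, offset, x);
        } else {
            putIntL(a, offset, x);
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        // Offset 1 is not a multiple of 4, so this is a misaligned int store.
        putIntUnaligned(buf, 1, 0x11223344, ByteOrder.BIG_ENDIAN);
        System.out.println(Arrays.toString(buf));
    }
}
```

Four single-byte stores are always legal regardless of alignment, which is why this shape works as a last resort on hardware without unaligned support.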

I see you optimized the unaligned getLong by reading two aligned longs and then bit twiddling. It seems harder to optimize the putLong by straddling an aligned putInt with the one to three required putByte calls.
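
To make the getLong trick concrete, here is a small sketch of the general idea as I understand it, not a copy of the webrev code. It assumes a little-endian layout and models memory as a long[] of aligned words; the class name UnalignedGetLongSketch and the byte-offset convention are hypothetical.

```java
// Hypothetical model: "memory" is a long[] of aligned 8-byte words, and the
// byte at byte offset k lives in words[k >>> 3] at bit position (k & 7) * 8,
// i.e. a little-endian layout.
final class UnalignedGetLongSketch {

    static long getLongUnaligned(long[] words, int byteOffset) {
        int index = byteOffset >>> 3;        // aligned word containing the first byte
        int shift = (byteOffset & 7) << 3;   // misalignment in bits (0, 8, ..., 56)
        if (shift == 0) {
            // Naturally aligned: a single load, and no read past the word.
            return words[index];
        }
        long lo = words[index];              // holds the low-order bytes of the result
        long hi = words[index + 1];          // holds the high-order bytes of the result
        // Drop the bytes below the offset, then splice in the bytes from the next word.
        return (lo >>> shift) | (hi << (64 - shift));
    }

    public static void main(String[] args) {
        // Byte k of this memory image has the value k + 1.
        long[] mem = { 0x0807060504030201L, 0x100F0E0D0C0B0A09L };
        System.out.println(Long.toHexString(getLongUnaligned(mem, 3)));
    }
}
```

The aligned case is handled separately both because it is expected to be the common path and because a shift by 64 would not zero the second word in Java (shift counts are taken modulo 64).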

> Also, these methods have the
> additional benefit that they are always atomic as long as the data are
> naturally aligned.
> 

We should probably document that, in general, access is not guaranteed to be atomic, and that it is an implementation detail that access is currently atomic when the data are naturally aligned.

> This does result in rather a lot of code for the methods for all sizes
> and endiannesses, but none of it will be used on machines with
> unaligned hardware support except in the interpreter.  (Perhaps the
> interpreter too could have intrinsics?)
> 
> I have changed HeapByteBuffer to use these methods, with a major
> performance improvement.  I've also provided Unsafe methods to query
> endianness and alignment support.
> 

If we expose the endianness query via a new method in Unsafe, we should reuse that in java.nio.Bits and get rid of the associated static code block.

Paul.

> Webrevs at http://cr.openjdk.java.net/~aph/unaligned.hotspot.1/
> http://cr.openjdk.java.net/~aph/unaligned.jdk.1/
> 
> Andrew.