Unsafe: efficiently comparing two byte arrays

Wed Feb 26 17:21:40 UTC 2014

Every once in a while I see an attempt to introduce a "general purpose"
unsafe array class, but efforts stall because:
- it's not obvious whether unaligned access to e.g. 8 bytes via a long is
even possible, or more efficient than reading the 8 bytes individually
- endianness is an issue
- for best performance, you also want to elide those pesky array bounds
checks (but HotSpot can do a better job of that)

I think it's worth doing, but it's harder than it looks, and probably needs
help from the VM via intrinsics.
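
To make the endianness point concrete, here is a minimal sketch of the
8-bytes-at-a-time idea that uses ByteBuffer instead of Unsafe. The class and
method names are hypothetical, not an existing JDK or Guava API. The words are
read as big-endian so that an unsigned long comparison agrees with
byte-by-byte lexicographic order; reading in little-endian order would
require reversing the bytes first. It assumes Java 8 for
Long.compareUnsigned, and makes no claim about how it performs relative to an
Unsafe-based version.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not a JDK or Guava class.
final class WordCompare {
    // Compares two arrays lexicographically, treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        // Big-endian: the first byte of each word is the most significant,
        // so unsigned long order matches lexicographic byte order.
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);
        int i = 0;
        // Compare whole 8-byte words while both arrays have one left.
        for (; i + 8 <= min; i += 8) {
            long x = ba.getLong(i);
            long y = bb.getLong(i);
            if (x != y) {
                return Long.compareUnsigned(x, y);
            }
        }
        // Compare the remaining (fewer than 8) bytes one at a time.
        for (; i < min; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        // Common prefix is equal: the shorter array sorts first.
        return a.length - b.length;
    }
}
```

The bounds checks inside getLong are still there, so this only addresses the
endianness and word-width points, not the bounds-check one.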

---

Guava's UnsignedBytes.lexicographicalComparator() saves a lot of cycles.

On Wed, Feb 26, 2014 at 7:42 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

> Hi,
>
> A common reason why Unsafe is used is to more efficiently compare two
> unsigned byte arrays, viewing those byte arrays as long arrays. See Guava
> [1] and a number of Apache frameworks for similar code.
>
> One solution is to provide such functionality in Arrays for all primitives
> and probably refs [2]:
>
>   int Arrays.compare(byte[], byte[]);
>
> Then it is easy to create a comparator using a method reference:
>
>   Comparator<byte[]> c = Arrays::compare;
>
> There could, initially, be Java implementations for those methods,
> including using Unsafe for byte[]. I gather those methods could be
> intrinsified to implementations using SIMD instructions on supported
> platforms. I don't know if that is possible today with HotSpot, but
> regardless I think a good start would be to have Java-based implementations
> in place.
>
> Thoughts?
>
> Paul.
>
> [1]
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/primitives/UnsignedBytes.html#lexicographicalComparator()
>
> [2] https://bugs.openjdk.java.net/browse/JDK-8033148
>
>
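
For reference, the plain Java baseline that such a method could start from
might look like the sketch below. ByteArrays is a hypothetical stand-in for
the proposed Arrays.compare(byte[], byte[]), and treating the bytes as
unsigned is an assumption based on the use case described in the quoted
message; the final API could define the ordering differently.

```java
import java.util.Comparator;

// Hypothetical stand-in for the proposed Arrays.compare(byte[], byte[]).
final class ByteArrays {
    // Lexicographic comparison, treating each byte as unsigned (assumption).
    static int compare(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        for (int i = 0; i < min; i++) {
            // Mask to 0..255 so that e.g. (byte) 0x80 sorts after 0x7f.
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        // Common prefix is equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // The comparator falls out of a method reference, as in the proposal.
    static final Comparator<byte[]> COMPARATOR = ByteArrays::compare;
}
```

A loop of this shape is also the natural fallback for any tail shorter than
8 bytes in a long-at-a-time version.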