Unsafe: efficiently comparing two byte arrays
Paul Sandoz
paul.sandoz at oracle.com
Thu Mar 13 14:19:34 UTC 2014
On Mar 13, 2014, at 1:57 PM, Andrew Haley <aph at redhat.com> wrote:
> On 03/13/2014 12:49 PM, Paul Sandoz wrote:
>> On Mar 13, 2014, at 1:19 PM, Andrew Haley <aph at redhat.com> wrote:
>>> On 03/13/2014 11:57 AM, Paul Sandoz wrote:
>>>> Now i am in two minds to whether to add ByteBuffer.compareUnsigned or add Arrays.compareUnsigned.
>>>>
>>>> An explosion of methods on Arrays for all types (plus to/from versions) would be annoying.
>>>
>>> Surely it's a no brainer? All we have to do is add the intrinsics for
>>> ByteBuffers, then we get fast ByteBuffers and fast array comparisons too.
>>> I don't see the problem.
>>
>> For byte[] comparison, I don't think optimal long access is
>> sufficient on it's own due to alignment issues. It would be nice if
>> we can avoid burdening users with alignment issues to ensure the
>> main comparison loop is efficient.
>
> But alignment issues are a feature of the hardware, not of software.
> The problem right now is that ByteBuffers use a byte-by-byte
> comparison of longs, even when it isn't needed, because there aren't
> the intrinsics.
>
I was considering the following case:
byte[] b1 = ...
byte[] b2 = ...
// compare from offset
ByteBuffer left = ByteBuffer.wrap(b1, 1, b1.length - 1);
ByteBuffer right = ByteBuffer.wrap(b2, 1, b2.length - 1);
int r = compare(left, right);
static int compare(ByteBuffer left, ByteBuffer right) {
int minLength = Math.min(left.remaining(), right.remaining());
int minWords = minLength / 8;
for (int i = 0; i < minWords * 8; i += 8) {
long lw = left.getLong(); // Inefficient when unaligned access is not supported by hardware?
long rw = right.getLong(); // Are bounds checks hoisted out? i presume so
if (lw != rw) {
return Long.compareUnsigned(lw, rw);
}
}
for (int i = minWords * 8; i < minLength; i++) {
int result = Byte.toUnsignedInt(left.get()) - Byte.toUnsignedInt(right.get());
if (result != 0) {
return result;
}
}
return left.remaining() - right.remaining();
}
Certain Apache code (e.g. see Cassandra) supports comparing byte[] arrays with offsets and i am not sure that code is portable [1].
I was wondering in the case where the hardware does not support unaligned access it might be possible to sill optimize by reading longs at aligned positions and doing some some extra bit twiddling.
>> A quick solution is to leverage Unsafe within a
>> ByteBuffer.compareUnsigned method. In fact i believe Unsafe could be
>> used (as Aleksey says in [1]) for put/get as well, which i presume
>> is likely to be much easier/quicker to implement. Might be a good
>> first step until a more superior intrinsics solution is implemented?
>
> I still don't get it, sorry. What can Unsafe do that ByteBuffer
> intrinsics can't do?
>
We can arguably implement it faster [2] and i presume it will be simpler too.
Paul.
[1] https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/utils/FastByteComparisons.java;h=4be6cd42b3035055c2b6102718d2cf49b73669f1;hb=HEAD#l184
[2] Well, i argue it is since i know next to nothing about how to implement an intrinsic :-)
More information about the core-libs-dev
mailing list