Lexicographic array comparators
Hi,

One common use of Unsafe is to boost the performance of comparing unsigned byte[] by viewing it as long[] (as done by Guava and Cassandra, for example). To reduce the dependency on Unsafe we need a public and safe equivalent that works on all platforms while meeting the performance expectations.

In the past we have discussed exposing something on ByteBuffer, but I think the right thing to explore is comparators for all basic types on java.util.Arrays:

https://bugs.openjdk.java.net/browse/JDK-8033148

This should be explored with the following issue in mind, so as to expose an intrinsic that could be leveraged when implemented:

https://bugs.openjdk.java.net/browse/JDK-8044082

Some preliminary work can be found here:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8033148-Arrays-lexico-compare/w...

The "explosion" in methods is annoying. Of the 8 primitive types, 4 can be compared signed or unsigned. Objects can be Comparable, or an explicit Comparator can be used. There are implicit and explicit ranges. That makes a maximum of 28 methods, but it could be reduced to a minimum of 10 covering all basic types.

Comparator instances, for implicit ranges, can easily be created from method refs or lambdas.

Thoughts?
Paul.
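For readers unfamiliar with the trick, here is a minimal sketch of the long-word comparison technique described above. It is illustrative only, not the Guava or Cassandra implementation (those use Unsafe rather than ByteBuffer); all names are made up. The key observation is that a big-endian 8-byte load preserves unsigned lexicographic byte order, so most of the work can be done one word at a time:

```java
import java.nio.ByteBuffer;

public class LexCompare {
    // Compare two byte arrays lexicographically as unsigned values,
    // 8 bytes at a time where possible.
    static int compareUnsigned(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        int stride = minLen & ~7;            // largest multiple of 8 <= minLen
        ByteBuffer ba = ByteBuffer.wrap(a);  // big-endian by default, which
        ByteBuffer bb = ByteBuffer.wrap(b);  // matches lexicographic order
        for (int i = 0; i < stride; i += 8) {
            long la = ba.getLong(i), lb = bb.getLong(i);
            if (la != lb) return Long.compareUnsigned(la, lb);
        }
        for (int i = stride; i < minLen; i++) {  // tail, byte at a time
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        assert compareUnsigned(new byte[]{1, 2, 3}, new byte[]{1, 2, 4}) < 0;
        // 0xFF compares as 255, not -1: unsigned semantics.
        assert compareUnsigned(new byte[]{(byte) 0xFF}, new byte[]{1}) > 0;
        // Exercise the word-at-a-time path with 8-byte inputs.
        assert compareUnsigned(new byte[]{0, 0, 0, 0, 0, 0, 0, 1},
                               new byte[]{0, 0, 0, 0, 0, 0, 0, 2}) < 0;
        System.out.println("ok");
    }
}
```

The Unsafe versions avoid the ByteBuffer bounds checks and, on little-endian hardware, byte-swap each word instead of relying on a big-endian view.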
People will continue to want to access byte arrays (and direct byte buffers) with C-like performance, and are currently using Unsafe to do so. Hard to fix for real. Endianness and unaligned access are both non-portable aspects. People don't want to pay for bounds checking and especially not for alignment checking of indexes known to be aligned.

Lexicographic comparison is only one use case (that happened to be important performance-wise for Guava). I have long thought that arrays should acquire more list-esque methods (e.g. contains, indexOf), even though arrays are unpopular.

On Tue, Feb 10, 2015 at 6:36 AM, Paul Sandoz <paul.sandoz@oracle.com> wrote:
Hi,
One common use of Unsafe is to boost the performance of comparing unsigned byte[] by viewing it as long[] (as done by Guava and Cassandra, for example). To reduce the dependency on Unsafe we need a public and safe equivalent that works on all platforms while meeting the performance expectations.
In the past we have discussed exposing something on ByteBuffer, but I think the right thing to explore is comparators for all basic types on java.util.Arrays:
https://bugs.openjdk.java.net/browse/JDK-8033148
This should be explored with the following issue in mind to expose an intrinsic that could be leveraged when implemented:
https://bugs.openjdk.java.net/browse/JDK-8044082
Some preliminary work can be found here:
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8033148-Arrays-lexico-compare/w...
The "explosion" in methods is annoying. Of the 8 primitive types, 4 can be compared signed or unsigned. Objects can be Comparable, or an explicit Comparator can be used. There are implicit and explicit ranges. That makes a maximum of 28 methods, but it could be reduced to a minimum of 10 covering all basic types.
Comparator instances, for implicit ranges, can easily be created from method refs or lambdas.
Thoughts?
Paul.
Hi Martin,

In this case I am trying to pick off one particularly common case, within the JDK 9 time frame, that is used in a number of popular libraries. In that context (including that of the intrinsic referenced in the related issue) do you think this is reasonable?

On Feb 10, 2015, at 8:48 PM, Martin Buchholz <martinrb@google.com> wrote:
People will continue to want to access byte arrays (and direct byte buffers) with C-like performance, and are currently using Unsafe to do so. Hard to fix for real.
Yes, that is a much larger problem, one that I hope both value types and Panama will address more fully.
Endianness and unaligned access are both non-portable aspects. People don't want to pay for bounds checking and especially not for alignment checking of indexes known to be aligned.
Note that as part of the VarHandles work we will try to improve the strength reduction of bounds checks for array access. If we expose an intrinsic for unsigned integer comparison, it can be reused within the NIO buffers.
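(As an aside, the unsigned integer comparison in question already has library methods in Java 8, Integer.compareUnsigned and Long.compareUnsigned; the intrinsic would be about compiling such calls down to a single unsigned-compare instruction. A small sketch of the semantics involved:)

```java
public class UnsignedCompare {
    public static void main(String[] args) {
        // Signed comparison sees 0xFFFF_FFFF_FFFF_FFFF (-1L) as the smallest
        // value; unsigned comparison sees it as the largest.
        assert Long.compare(-1L, 1L) < 0;
        assert Long.compareUnsigned(-1L, 1L) > 0;
        // Likewise for int: 0x80000000 is Integer.MIN_VALUE when signed,
        // but 2^31 when unsigned.
        assert Integer.compare(0x80000000, 1) < 0;
        assert Integer.compareUnsigned(0x80000000, 1) > 0;
        System.out.println("ok");
    }
}
```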
Lexicographic comparison is only one use case (that happened to be important performance-wise for Guava). I have long thought that arrays should acquire more list-esque methods (e.g. contains, indexOf), even though arrays are unpopular.
Yes, I was wondering the same; there should be an interface. But I think that is an Arrays 2.0 and value types thing.

Paul.
On 02/10/2015 07:48 PM, Martin Buchholz wrote:
People will continue to want to access byte arrays (and direct byte buffers) with C-like performance, and are currently using Unsafe to do so. Hard to fix for real. Endianness and unaligned access are both non-portable aspects. People don't want to pay for bounds checking and especially not for alignment checking of indexes known to be aligned. Lexicographic comparison is only one use case (that happened to be important performance-wise for Guava).
I've been looking at the code HotSpot generates for DirectByteBuffer.getXXX() and it's actually pretty good. In many cases C2 optimizes away the endianness, alignment, and bounds checks, and nicely inlines and unrolls everything. Sure, HotSpot can't always remove bounds checks, but that's a fairly small price to pay, IMO.

getXXX() methods in HeapByteBuffers, however, aren't intrinsified, so access to data items larger than bytes is byte-at-a-time and pretty horrible. It wouldn't be a huge job to fix this by adding a few Unsafe.getUnalignedX intrinsics, and then I think we'd have C-like performance. Whether it makes sense to do this rather than add a new API with a bunch of array methods to do the accesses directly is hard to say. Doing it in HeapByteBuffer has the advantage of requiring no API changes. We could get it done in the JDK 9 timeframe.

Andrew.
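(For illustration, a sketch of what "byte-at-a-time" means here: assembling a big-endian long from a byte array with eight separate loads, shifts, and ors, which is roughly the work a single unaligned-load intrinsic would collapse into one 8-byte load. The method name below is made up for the sketch:)

```java
public class ByteAtATime {
    // Roughly the work a heap buffer's getLong does today: eight separate
    // byte loads combined with shifts and ors (big-endian order).
    static long getLongBE(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] b = {0, 0, 0, 0, 0, 0, 1, 2};
        assert getLongBE(b, 0) == 0x0102L;
        System.out.println("ok");
    }
}
```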
On Feb 13, 2015, at 2:16 PM, Andrew Haley <aph@redhat.com> wrote:
On 02/10/2015 07:48 PM, Martin Buchholz wrote:
People will continue to want to access byte arrays (and direct byte buffers) with C-like performance, and are currently using Unsafe to do so. Hard to fix for real. Endianness and unaligned access are both non-portable aspects. People don't want to pay for bounds checking and especially not for alignment checking of indexes known to be aligned. Lexicographic comparison is only one use case (that happened to be important performance-wise for Guava).
I've been looking at the code HotSpot generates for DirectByteBuffer.getXXX() and it's actually pretty good. In many cases C2 optimizes away the endianness, alignment, and bounds checks, and nicely inlines and unrolls everything. Sure, HotSpot can't always remove bounds checks, but that's a fairly small price to pay, IMO.
getXXX() methods in HeapByteBuffers, however, aren't intrinsified, so access to data items larger than bytes is byte-at-a-time and pretty horrible. It wouldn't be a huge job to fix this by adding a few Unsafe.getUnalignedX intrinsics, and then I think we'd have C-like performance. Whether it makes sense to do this rather than add a new API with a bunch of array methods to do the accesses directly is hard to say. Doing it in HeapByteBuffer has the advantage of requiring no API changes. We could get it done in the JDK 9 timeframe.
John suggested that a general array mismatch method could be achieved with carefully written Java code [1], and slightly hidden within that is the comment "FIXME: Add Unsafe.getLongMisaligned to avoid this cutout".

I am inclined to keep pushing on this array mismatch functionality and its usages for existing and new stuff in Arrays, as well as trying to leverage component pieces in other areas like you suggest.

Paul.

[1] https://bugs.openjdk.java.net/browse/JDK-8044082?focusedCommentId=13608657&p...
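(To make the shape of the primitive concrete, here is a minimal portable sketch of a mismatch method, plus the comparison that falls out of it. This is illustrative only, not John's code; the proposal reads long words via a misaligned-load intrinsic rather than this byte loop, locating the differing byte within a word via XOR and a leading-zero count:)

```java
public class Mismatch {
    // Return the index of the first difference between two byte arrays,
    // or -1 if they are equal. A longer array differs from its prefix at
    // index minLen.
    static int mismatch(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        for (int i = 0; i < minLen; i++) {
            if (a[i] != b[i]) return i;
        }
        // Common prefix: the arrays differ only if their lengths differ.
        return a.length == b.length ? -1 : minLen;
        // Word-at-a-time variant: on a differing big-endian word at index i,
        // the first differing byte is i + (Long.numberOfLeadingZeros(wa ^ wb) >>> 3).
    }

    // Lexicographic unsigned comparison built directly on mismatch.
    static int compareUnsigned(byte[] a, byte[] b) {
        int i = mismatch(a, b);
        if (i >= 0 && i < Math.min(a.length, b.length)) {
            return (a[i] & 0xFF) - (b[i] & 0xFF);
        }
        return a.length - b.length;  // equal, or one is a prefix of the other
    }

    public static void main(String[] args) {
        assert mismatch(new byte[]{1, 2, 3}, new byte[]{1, 2, 4}) == 2;
        assert mismatch(new byte[]{1, 2}, new byte[]{1, 2, 3}) == 2;
        assert mismatch(new byte[]{1, 2}, new byte[]{1, 2}) == -1;
        assert compareUnsigned(new byte[]{1, 2}, new byte[]{1, 2, 3}) < 0;
        System.out.println("ok");
    }
}
```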
On 02/13/2015 01:56 PM, Paul Sandoz wrote:
John suggested that a general array mismatch method could be achieved with carefully written Java code [1], and slightly hidden within that is the comment "FIXME: Add Unsafe.getLongMisaligned to avoid this cutout".
Right, so we're definitely thinking along the same lines. Maybe I could do the HotSpot work for Unsafe.getUnalignedX. Andrew.
On Feb 13, 2015, at 3:31 PM, Andrew Haley <aph@redhat.com> wrote:
On 02/13/2015 01:56 PM, Paul Sandoz wrote:
John suggested that a general array mismatch method could be achieved with carefully written Java code [1], and slightly hidden within that is the comment "FIXME: Add Unsafe.getLongMisaligned to avoid this cutout".
Right, so we're definitely thinking along the same lines. Maybe I could do the HotSpot work for Unsafe.getUnalignedX.
Go for it! I would be happy to give it a test drive.

Paul.
On 02/13/2015 02:58 PM, Paul Sandoz wrote:
On Feb 13, 2015, at 3:31 PM, Andrew Haley <aph@redhat.com> wrote:
On 02/13/2015 01:56 PM, Paul Sandoz wrote:
John suggested that a general array mismatch method could be achieved with carefully written Java code [1], and slightly hidden within that is the comment "FIXME: Add Unsafe.getLongMisaligned to avoid this cutout".
Right, so we're definitely thinking along the same lines. Maybe I could do the HotSpot work for Unsafe.getUnalignedX.
Go for it!
I would be happy to give it a test drive.
I have a patch that seems to work, but it's still rather raw. It'll need a lot of testing, I guess. Do you want to try it now? Andrew.
The addition of array comparison intrinsics makes sense - I've missed them myself on occasion.

I was surprised that System.arraycopy hasn't been specialized for every array type. I guess the JIT is good at specializing all callers.

I don't know whether we will someday regret the explosion of array comparison methods. The case for the intrinsic seems more compelling to me.

(Hopefully someone else will do the real review)

On Tue, Feb 10, 2015 at 6:36 AM, Paul Sandoz <paul.sandoz@oracle.com> wrote:
Hi,
One common use of Unsafe is to boost the performance of comparing unsigned byte[] by viewing it as long[] (as done by Guava and Cassandra, for example). To reduce the dependency on Unsafe we need a public and safe equivalent that works on all platforms while meeting the performance expectations.
In the past we have discussed exposing something on ByteBuffer, but I think the right thing to explore is comparators for all basic types on java.util.Arrays:
https://bugs.openjdk.java.net/browse/JDK-8033148
This should be explored with the following issue in mind to expose an intrinsic that could be leveraged when implemented:
https://bugs.openjdk.java.net/browse/JDK-8044082
Some preliminary work can be found here:
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8033148-Arrays-lexico-compare/w...
The "explosion" in methods is annoying. Of the 8 primitive types, 4 can be compared signed or unsigned. Objects can be Comparable, or an explicit Comparator can be used. There are implicit and explicit ranges. That makes a maximum of 28 methods, but it could be reduced to a minimum of 10 covering all basic types.
Comparator instances, for implicit ranges, can easily be created from method refs or lambdas.
Thoughts?
Paul.
On Feb 10, 2015, at 3:15 PM, Martin Buchholz <martinrb@google.com> wrote:
I was surprised that System.arraycopy hasn't been specialized for every array type. I guess the JIT is good at specializing all callers.
Yes. Usually the arguments have specific static types the JIT can see. The Object formal type doesn't obscure this.
I don't know whether we will someday regret the explosion of array comparison methods.
It's technical debt. When we go to value types there will be an infinite number of array types. That's why in Project Valhalla we are thinking hard, before that, about parametric polymorphism. We need to be able to have true polymorphism over APIs in T[], where T can be ref, prim, or value. At that point, hopefully, what we have won't be impossible to retrofit. — John
On Feb 11, 2015, at 12:26 AM, John Rose <john.r.rose@oracle.com> wrote:
On Feb 10, 2015, at 3:15 PM, Martin Buchholz <martinrb@google.com> wrote:
I was surprised that System.arraycopy hasn't been specialized for every array type. I guess the JIT is good at specializing all callers.
Yes. Usually the arguments have specific static types the JIT can see. The Object formal type doesn't obscure this.
I don't know whether we will someday regret the explosion of array comparison methods.
It's technical debt. When we go to value types there will be an infinite number of array types. That's why in Project Valhalla we are thinking hard, before that, about parametric polymorphism. We need to be able to have true polymorphism over APIs in T[], where T can be ref, prim, or value. At that point, hopefully, what we have won't be impossible to retrofit.
Yes, I am hoping that many of the overloaded methods on Arrays with the same code shape can be "anyfied" and reduced in a source- and binary-compatible way. In that respect I feel a little less guilty about proposing so many methods :-)

In this case we have the interesting interaction with Comparable:

  public static <any T extends Comparable<? super T>> int compare(T[] left, T[] right)

We certainly don't want to box for T == int when making comparisons between two ints. I presume, in this case, the intrinsic array mismatch method will deal with that aspect.

--

An alternative solution is to leave Arrays alone and just go with the lower-level System.arrayMismatch method proposed in JDK-8044082 and then wait for value types. However, it requires significantly more work to write an intrinsic, which may put it at risk for the 9 time frame.

Paul.
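(To make the boxing concern concrete, a sketch contrasting the two shapes under discussion: a specialized primitive overload, which never boxes, versus the generic Comparable form, which for Integer[] compares boxed values element by element. This is not the proposed API, just an illustration; the "any T" form would unify the two without boxing:)

```java
public class CompareShapes {
    // Specialized overload: works on int directly, no boxing.
    static int compare(int[] a, int[] b) {
        int minLen = Math.min(a.length, b.length);
        for (int i = 0; i < minLen; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    // Generic Comparable form: for Integer[] every element comparison
    // goes through a boxed compareTo call.
    static <T extends Comparable<? super T>> int compare(T[] a, T[] b) {
        int minLen = Math.min(a.length, b.length);
        for (int i = 0; i < minLen; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        assert compare(new int[]{1, 2}, new int[]{1, 3}) < 0;       // primitive path
        assert compare(new Integer[]{1, 2}, new Integer[]{1, 2}) == 0;  // boxed path
        System.out.println("ok");
    }
}
```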
On Feb 11, 2015, at 12:15 AM, Martin Buchholz <martinrb@google.com> wrote:
The addition of array comparison intrinsics makes sense - I've missed them myself on occasion.
Thanks.
I was surprised that System.arraycopy hasn't been specialized for every array type. I guess the JIT is good at specializing all callers.
I don't know whether we will someday regret the explosion of array comparison methods. The case for the intrinsic seems more compelling to me.
(Hopefully someone else will do the real review)
No problem. It's too early for a proper webrev review. For the moment I am seeking comments on the general direction.

Thanks,
Paul.
participants (4)
- Andrew Haley
- John Rose
- Martin Buchholz
- Paul Sandoz