[vector-api] interoperability loading from non-byte[] heap MemorySegments

Mon Oct 23 17:17:32 UTC 2023

Logged issue:

  https://bugs.openjdk.org/browse/JDK-8318678

Paul.

> On Oct 23, 2023, at 9:46 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
> Some additional context. These methods were added in JDK 19 when the FFM alignment restrictions were different, and it was not possible to generally scribble on a heap array. With the current FFM alignment restrictions we can update these methods to be less restrictive, and we should do so for JDK 22 (I wanted to do it earlier but have been too busy).
> 
> I don’t think we require a layout, the species + byte order can serve as a proxy, where the species element corresponds to the unaligned primitive layout with the appropriate byte order.
> 
> Paul.
> 
> 
>> On Oct 23, 2023, at 4:52 AM, Chris Hegarty <chegar999 at gmail.com> wrote:
>> 
>> Hi Maurizio,
>> Thanks for the quick reply.
>> On 23/10/2023 12:34, Maurizio Cimadamore wrote:
>>> Hi Chris,
>>> this is a good question. The main reason as to why the check is there is this: the code this new segment-based method replaces is code that used to accept a ByteBuffer. The check you see there was added just to make sure we didn't run into new and unexpected situations.
>> Ah yes. That's kinda what I thought. 
>>> Moving forward I could see this relaxed, either by tweaking the API to accept an explicit layout from the user (so that the API can check if the user really wants unaligned floats). Or, by doing a different check which validates the maximum supported alignment against the vector species. E.g. I suppose loading an int[] into a DoubleVector is not ok - but float[]/int[]/short[]/char[]/byte[] in FloatVector should be ok. But, going down that path is messy: while heap segments keep track of their underlying Java array, which can then be used to perform validation, an off-heap segment doesn't have any particular alignment constraint. So (I think) we're back to the user providing a layout parameter to the load method.
>> Yeah, I think an explicit layout would be the way to go here. And maybe a default of 4-byte aligned, JAVA_FLOAT, when not explicitly passed for heap backed segments. The default should be relatively straightforward to enable, behaving as if JAVA_FLOAT with the passed byte order.
>> -Chris.
>>> Maurizio
>>> 
>>> 
>>> 
>>> 
>>> On 23/10/2023 12:10, Chris Hegarty wrote:
>>>> Hi,
>>>> I'm curious about the restriction when loading from heap backed memory segments. E.g.
>>>> * @throws IllegalArgumentException if the memory segment is a heap segment that is
>>>> * not backed by a {@code byte[]} array.
>>>> * ...
>>>> FloatVector fromMemorySegment(VectorSpecies<Float> species, MemorySegment ms, ...)
>>>> Which results in:
>>>> jshell> float[] arr = new float[] { 1, 2, 3, 4, 5, 6, 7, 8 }
>>>> arr ==> float[8] { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }
>>>> 
>>>> jshell> var vec = FloatVector.fromMemorySegment(FloatVector.SPECIES_PREFERRED, MemorySegment.ofArray(arr), 0, ByteOrder.nativeOrder())
>>>> |  Exception java.lang.IllegalArgumentException
>>>> |        at ScopedMemoryAccess.loadFromMemorySegment (ScopedMemoryAccess.java:334)
>>>> |        at FloatVector.fromMemorySegment0Template (FloatVector.java:3353)
>>>> |        at Float128Vector.fromMemorySegment0 (Float128Vector.java:864)
>>>> |        at FloatVector.fromMemorySegment (FloatVector.java:2986)
>>>> |        at do_it$Aux (#43:1)
>>>> |        at (#43:1)
>>>> 
>>>> I can see that this is deliberate ...
>>>> V loadFromMemorySegmentMasked(...) {
>>>> // @@@ Smarter alignment checking if accessing heap segment backing non-byte[] array
>>>> if (msp.maxAlignMask() > 1) {
>>>> throw new IllegalArgumentException();
>>>> }
>>>> Is this just temporary? I would expect the alignment to be as if `ValueLayout.JAVA_FLOAT.withByteAlignment(1)`, no? Is the intent to eventually support other non-byte primitive array types, and to require and check alignment? It should just work, right?
>>>> --
>>>> The reason I ask about this is that over in Luceneland we're considering a switch from loading vector data from float[] to loading it from MemorySegment - which allows us to load search vectors directly from the mmapped index file. But we still have some code paths which have float[], which may or may not be coming from the mmapped file. So we end up with something like this:
>>>> dotProduct(float[] a, float[] b)
>>>> dotProduct(float[] a, MemorySegment b)
>>>> dotProduct(MemorySegment a, MemorySegment b)
>>>> ... and we have cosine and Euclidean distance too. We can of course write the three variants of the code (and we've done this), but it would be desirable to have the float[] accepting methods just wrap and delegate to the MemorySegment variant. In our use case, we don't slice or offset into the heap segment. I do see how things could get misaligned quite quickly, but again I expect this to behave as if with a byte alignment of 1.
>>>> Thanks,
>>>> -Chris.
>
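
A minimal sketch of the "wrap and delegate" shape described in the original message, where the float[] overloads of dotProduct wrap their arrays with MemorySegment.ofArray and hand off to the MemorySegment variant. It is scalar on purpose: it uses only the FFM API (final in JDK 22) rather than the incubating Vector API, so it says nothing about how fromMemorySegment behaves. The class name DotProductSketch and the LAYOUT constant are hypothetical, and the unaligned float layout in native byte order is an assumption standing in for the "as if byte alignment 1" behaviour discussed in the thread.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Lucene's or the JDK's code.
final class DotProductSketch {

    // Assumption: read floats with 1-byte alignment in native byte order,
    // which is valid for float[]-backed heap segments and off-heap segments.
    static final ValueLayout.OfFloat LAYOUT =
            ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.nativeOrder());

    // The single "real" implementation, written against MemorySegment.
    static float dotProduct(MemorySegment a, MemorySegment b) {
        if (a.byteSize() != b.byteSize()) {
            throw new IllegalArgumentException("segments differ in size");
        }
        long count = a.byteSize() / Float.BYTES;
        float sum = 0f;
        for (long i = 0; i < count; i++) {
            long offset = i * Float.BYTES;
            sum += a.get(LAYOUT, offset) * b.get(LAYOUT, offset);
        }
        return sum;
    }

    // The float[] overloads only wrap the array and delegate.
    static float dotProduct(float[] a, MemorySegment b) {
        return dotProduct(MemorySegment.ofArray(a), b);
    }

    static float dotProduct(float[] a, float[] b) {
        return dotProduct(MemorySegment.ofArray(a), MemorySegment.ofArray(b));
    }
}
```

With this shape, only the MemorySegment variant would need a vectorized body once fromMemorySegment accepts heap segments backed by float[].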