Additional mismatch overloads for memory segments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Jun 14 09:32:30 UTC 2022
Hi,
Yes, this is the correcty way to provide feedback - thank you for
asking. This reminds me of the discussion on copyFrom from some time ago.
https://mail.openjdk.org/pipermail/panama-dev/2021-April/012872.html
In that case, we felt that the burden of (a) creating an heap view of an
array, then (b) slice source/dest segment accordingly and (c) translate
array "logical" offsets into segment "physical" offsets was just too
painful, so we provided static alternatives. In terms of performance it
was not super easy to find cases where the static method was better
(e.g. microbenchmarks would often reveal no difference). I believe most
of the perceived "speedup" is transient, as eventually the slices _will_
be scalarized; but in the case of long running methods like bulk copy
and mismatch, it takes _longer_ for the code to become hot enough so
that scalarization would be applied, which created some tension. I
recall a discussion along these lines with Uwe, who was trying to use
memory segments in Lucene/Solr.
Back to your question, I sympathize with your use case: a static
MemorySegment.mismatch is no different than the static
MemorySegment::copy flavor we currently provide (the one that accepts
two segments). And, even for mismatch it might make sense to add
overloads for arrays in dest/src position (so now there's three methods).
That said, looking more broadly at the API, there are several instance
methods which might be subject to the same treatment, basically nearly
every instance method, except for the dereference methods (which already
take an offset, more on that later) and toString/equals/hashcode.
Consider the following examples that you might also have included in
your email:
```
val seg = mem.asSlice(offset, size);
seg.fill((byte)42);
seg.isLoaded();
seg.force();
seg.spliterator();
seg.elements(JAVA_INT);
seg.toArray(JAVA_INT);
... // the list goes on
```
I think that copy (and now mismatch) were "picked on" the most because
they are binary methods - they accept two segments. So the amount of
slicing is doubled there, compared to the ones I showed above. But the
issue remains: how friendly should the API be to clients that don't want
(or can't) use slicing, but want to stick to one big segment and then
passing an offset as a side-channel instead?
DIfferent APIs deal with this problem in different ways. The ByteBuffer
API, somewhat surprisingly, provides no static methods at all, although
the bulk copy get/put do provide Java array overloads, so that you don't
have to wrap/slice unnecessarily. Also, many methods in the Buffer API
take an int offset which, again, helps reducing the amount of slicing
going on (although some of these methods were added later[1]). Then
there's arrays, which have relatively little info (just a length), and
where all capabilities are exposed through a companion "side-car" class,
namely j.u.Arrays.
While I'm fine, in principle, with both approaches, I'm afraid to steer
the API into a middle ground where it's a bit like ByteBuffer, a bit
like Arrays. Or, if we go there, we should have a clear/defensible basis
on how the bucketization works (which I realize we don't have right
now). So, I encourage us not to have a discussion about "mismatch", but,
rather, to have a broader discussion on how methods in the MemorySegment
class should be "bucketized" (or we will be doomed to have this very
same conversation again).
Some addional points/observations: first, in your example, you seem to
have a `mem` abstraction that stores a segment and then an offset. In
other words, it seems like you have implemented your own slicing _on
top_. In an alternate world where your API relied on slicing, your
snippet would become just:
```
return mem.mismatch(other.mem) == -1L
```
As there would be no offset around (the offset would be embedded into
the segment). So, I feel that in order to answer some of the questions
re. bucketization, we (or at least I) need to understand why memory
segment slices are not good enough a carrier for use cases like yours.
Secondly, before we get too static-happy, let's also remember that
sometimes dealing with instance methods is generally nice - e.g. I think
```
segment.elements(JAVA_INT)
```
looks/reads better than:
```
MemorySegment.element(segment, JAVA_INT)
```
And the more static calls you have, the more it shows. Ability of using
method chaining in these cases is nice.
Last, as hinted above, I note that our API has not two but *three*
different ways to do things:
1. instance methods: like MemorySegment::fill
2. static methods: like MemorySegment::copy
3. instance methods that are offset-friendly: like MemorySegment::get
The presence of (3) is, I think, notable, because it reflects a move
that was also made by the ByteBuffer API. That give us some hints about
*one* possible way to bucketize the memory segment functionalities:
* for unary methods (e.g. methods that work on a single segment), use
(3) - that is use an _instance_ method which takes an offset (so that it
makes the method more applicable w/o the need to slice)
* for binary methods, since there's two offsets, and the order of
arguments is always hard to get right, just use static methods; this
would apply to both copy and mismatch
While the above might not be perfect, I think there's an appealing
simplicity to it - and the explanation as to "why things are the way
they are" seems straightforward enough. There would be the minor
inconvenience of having to pass a zero offset in some cases (which is
already there for dereference methods), but overall feels perhaps like a
small price to pay.
Thoughts?
Maurizio
[1] - https://bugs.openjdk.org/browse/JDK-8212619
On 14/06/2022 00:42, Alexander Biryukov wrote:
> There is a method MemorySegment#mismatch, but it's kind of inefficient
> in the following situation:
>
> val seg1 = mem.asSlice(offset, size)
> val seg2 = other.mem.asSlice(other.offset, other.size)
> return seg1.mismatch(seg2) == -1L
>
> There are additional mismatch methods in java.util.Arrays for arrays, like
>
> public static int mismatch(byte[] a, int aFromIndex, int aToIndex,
> byte[] b, int bFromIndex, int bToIndex)
>
> It would be nice to have something like that for MemorySegment as well.
> Or even better -- MemorySegment#compare :)
>
> NB: is this the correct way to provide feedback?
>
> Best regards,
> Alexander Biryukov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20220614/d5187c90/attachment-0001.htm>
More information about the panama-dev
mailing list