Additional mismatch overloads for memory segments

Tue Jun 14 10:53:27 UTC 2022

> Back to your question, I sympathize with your use case: a static
MemorySegment.mismatch is no different than the static MemorySegment::copy
flavor we currently provide (the one that accepts two segments). And, even
for mismatch it might make sense to add overloads for arrays in dest/src
position (so now there's three methods).

Perhaps I emphasised it wrongly. It doesn't really matter whether it is
static or not, it's about additional overloads, currently there is no
overload that accepts offsets. An example of Arrays class is simply for
demonstration purposes (there is something similar in API already).

> As there would be no offset around (the offset would be embedded into the
segment). So, I feel that in order to answer some of the questions re.
bucketization, we (or at least I) need to understand why memory segment
slices are not good enough a carrier for use cases like yours.

That's true, but I have lots of objects (let's say millions per node), and
all of them are stored in multiple binary files (MemorySegment). That's why
I decided to go with my own slicing mechanism.
I'd be happy to make slices though, but I feel it'll create additional
pressure on GC for no reason. And the only reason I need an object in the
first place is equals/hashCode, otherwise I'd just pass slices.

>  While the above might not be perfect, I think there's an appealing
simplicity to it - and the explanation as to "why things are the way they
are" seems straightforward enough. There would be the minor inconvenience
of having to pass  a zero offset in some cases (which is already there for
dereference methods), but overall feels perhaps like a small price to pay.

>From my perspective the Java ecosystem has a "duality" in static and
instance methods. But probably this observation comes from Arrays and many
Apache *Utils libraries.
I think it's not really necessary to pick one, why shouldn't there be 2
ways, at least for the functions which make sense to do so (binary
operators).

вт, 14 июн. 2022 г. в 12:32, Maurizio Cimadamore <
maurizio.cimadamore at oracle.com>:

> Hi,
> Yes, this is the correcty way to provide feedback - thank you for asking.
> This reminds me of the discussion on copyFrom from some time ago.
>
> https://mail.openjdk.org/pipermail/panama-dev/2021-April/012872.html
>
> In that case, we felt that the burden of (a) creating an heap view of an
> array, then (b) slice source/dest segment accordingly and (c) translate
> array "logical" offsets into segment "physical" offsets was just too
> painful, so we provided static alternatives. In terms of performance it was
> not super easy to find cases where the static method was better (e.g.
> microbenchmarks would often reveal no difference). I believe most of the
> perceived "speedup" is transient, as eventually the slices _will_ be
> scalarized; but in the case of long running methods like bulk copy and
> mismatch, it takes _longer_ for the code to become hot enough so that
> scalarization would be applied, which created some tension. I recall a
> discussion along these lines with Uwe, who was trying to use memory
> segments in Lucene/Solr.
>
> Back to your question, I sympathize with your use case: a static
> MemorySegment.mismatch is no different than the static MemorySegment::copy
> flavor we currently provide (the one that accepts two segments). And, even
> for mismatch it might make sense to add overloads for arrays in dest/src
> position (so now there's three methods).
>
> That said, looking more broadly at the API, there are several instance
> methods which might be subject to the same treatment, basically nearly
> every instance method, except for the dereference methods (which already
> take an offset, more on that later) and toString/equals/hashcode. Consider
> the following examples that you might also have included in your email:
>
> ```
> val seg = mem.asSlice(offset, size);
> seg.fill((byte)42);
> seg.isLoaded();
> seg.force();
> seg.spliterator();
> seg.elements(JAVA_INT);
> seg.toArray(JAVA_INT);
> ... // the list goes on
> ```
>
> I think that copy (and now mismatch) were "picked on" the most because
> they are binary methods - they accept two segments. So the amount of
> slicing is doubled there, compared to the ones I showed above. But the
> issue remains: how friendly should the API be to clients that don't want
> (or can't) use slicing, but want to stick to one big segment and then
> passing an offset as a side-channel instead?
>
> DIfferent APIs deal with this problem in different ways. The ByteBuffer
> API, somewhat surprisingly, provides no static methods at all, although the
> bulk copy get/put do provide Java array overloads, so that you don't have
> to wrap/slice unnecessarily. Also, many methods in the Buffer API take an
> int offset which, again, helps reducing the amount of slicing going on
> (although some of these methods were added later[1]). Then there's arrays,
> which have relatively little info (just a length), and where all
> capabilities are exposed through a companion "side-car" class, namely
> j.u.Arrays.
>
> While I'm fine, in principle, with both approaches, I'm afraid to steer
> the API into a middle ground where it's a bit like ByteBuffer, a bit like
> Arrays. Or, if we go there, we should have a clear/defensible basis on how
> the bucketization works (which I realize we don't have right now). So, I
> encourage us not to have a discussion about "mismatch", but, rather, to
> have a broader discussion on how methods in the MemorySegment class should
> be "bucketized" (or we will be doomed to have this very same conversation
> again).
>
> Some addional points/observations: first, in your example, you seem to
> have a `mem` abstraction that stores a segment and then an offset. In other
> words, it seems like you have implemented your own slicing _on top_. In an
> alternate world where your API relied on slicing, your snippet would become
> just:
>
> ```
> return mem.mismatch(other.mem) == -1L
> ```
>
> As there would be no offset around (the offset would be embedded into the
> segment). So, I feel that in order to answer some of the questions re.
> bucketization, we (or at least I) need to understand why memory segment
> slices are not good enough a carrier for use cases like yours.
>
> Secondly, before we get too static-happy, let's also remember that
> sometimes dealing with instance methods is generally nice - e.g. I think
>
> ```
> segment.elements(JAVA_INT)
> ```
>
> looks/reads better than:
>
> ```
> MemorySegment.element(segment, JAVA_INT)
> ```
>
> And the more static calls you have, the more it shows. Ability of using
> method chaining in these cases is nice.
>
> Last, as hinted above, I note that our API has not two but *three*
> different ways to do things:
>
> 1. instance methods: like MemorySegment::fill
> 2. static methods: like MemorySegment::copy
> 3. instance methods that are offset-friendly: like MemorySegment::get
>
> The presence of (3) is, I think, notable, because it reflects a move that
> was also made by the ByteBuffer API. That give us some hints about *one*
> possible way to bucketize the memory segment functionalities:
>
> * for unary methods (e.g. methods that work on a single segment), use (3)
> - that is use an _instance_ method which takes an offset (so that it makes
> the method more applicable w/o the need to slice)
> * for binary methods, since there's two offsets, and the order of
> arguments is always hard to get right, just use static methods; this would
> apply to both copy and mismatch
>
> While the above might not be perfect, I think there's an appealing
> simplicity to it - and the explanation as to "why things are the way they
> are" seems straightforward enough. There would be the minor inconvenience
> of having to pass  a zero offset in some cases (which is already there for
> dereference methods), but overall feels perhaps like a small price to pay.
>
> Thoughts?
>
> Maurizio
>
> [1] - https://bugs.openjdk.org/browse/JDK-8212619
>
>
> On 14/06/2022 00:42, Alexander Biryukov wrote:
>
> There is a method MemorySegment#mismatch, but it's kind of inefficient in
> the following situation:
>
> val seg1 = mem.asSlice(offset, size)
> val seg2 = other.mem.asSlice(other.offset, other.size)
> return seg1.mismatch(seg2) == -1L
>
> There are additional mismatch methods in java.util.Arrays for arrays, like
>
> public static int mismatch(byte[] a, int aFromIndex, int aToIndex,
>                            byte[] b, int bFromIndex, int bToIndex)
>
> It would be nice to have something like that for MemorySegment as well.
> Or even better -- MemorySegment#compare :)
>
> NB: is this the correct way to provide feedback?
>
> Best regards,
> Alexander Biryukov
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20220614/10410d9a/attachment.htm>