Additional mismatch overloads for memory segments
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jun 17 13:02:56 UTC 2022
On 17/06/2022 13:49, Alexander Biryukov wrote:
>
> 1. Add a static mismatch method(s) (similar to what we did for copy)
> 2. add the following overloads:
> * fill(byte, long, long) // D
> 3. Tweak the indexed accessors - e.g. rename them to just "get"
> and get
> both an offset (in bytes) parameter as well as an index (in elements)
> parameter; then decide what to do with the offset-only dereference
> variant:
> 3a) keep as is
> 3b) drop
> 3c) tweak it so that parameter is an index, not an offset
>
>
> 1. Agree, basic functionality like copy/mismatch must expose an offset
> 2. Can't see harm here, so I'd add an overload
> 3. I think the current get/set methods with offsets are fine as is (3a).
> I'm not sure if merging them together with setAtIndex is a good idea,
> something like set(JAVA_INT, 0, 0, 123) looks confusing.
> I expect that "index" means that it starts with 0.
> Otherwise I'd just use set/get and do offset + i * 2/4/8, it's much
> simpler.
Point taken. Note that option 3c though would give you the same e.g.
set(JAVA_INT, 0, 123) // sets 123 at index 0
and, also have an advanced option:
set(JAVA_INT, offset, 0, 123) // sets 123 at index 0, starting at offset
Doing I *2/4/8 is, of course possible, but looks redundant, given that
you are also passing a layout.
That said, I don't feel particularly strongly about any of the options
here - and I agree that the API points provided now are expressive
enough (assuming one wants to do offset computation by hand).
> By the way, about getUtf8String(long, long).There are multiple ways to
> put UTF-8 strings in byte[],
> like null-termination or write length as the first
> byte/int/varint/stored somewhere else (schema).
> Current implementation is a null-terminated string, which is fine for
> C interop.
> Perhaps naming could be better to reflect it.
True. Best we could do is probably dropping a C somewhere in there.
(e.g. getUtf8CString), as I don't think "getUtf8NullTerminatedString"
would fly :-)
Maurizio
>
>
>
> чт, 16 июн. 2022 г. в 17:01, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com>:
>
> Hi,
> I've spent some time classifying the various instance methods in
> MemorySegment, to see if some pattern emerged:
>
> (A) streaming ops:
>
> * elements(MemoryLayout)
> * spliterator(MemoryLayout)
>
> (B) slicing ops
>
> * asSlice(long)
> * asSlice(long, long)
>
> (C) containment ops
>
> * asOverlappingSlice(MemorySegment)
> * segmentOffset(MemorySegment)
>
> (D) bulk ops
>
> * fill(byte)
> * copyFrom(MemorySegment)
> * mismatch(MemorySegment)
>
> (E) mapped segment ops
>
> * isLoaded()
> * load()
> * unload()
> * force()
>
> (F) transform ops
>
> * asByteBuffer()
> * toArray(ValueLayout.OfT)
>
> (G) simple dereference ops
>
> * get(ValueLayout.OfT, long)
> * set(ValueLayout.OfT, long, T)
>
> (H) indexed dereference ops
>
> getAtIndex(ValueLayout.OfT, long)
> setAtIndex(ValueLayout.OfT, long, T)
>
> (I) string dereferene ops
>
> getUtf8String(long)
> setUtf8String(long, String)
>
> --------------------------------------
>
> Now, let's see if, case by case, we can see whether some additional
> overload would be desirable or not. As Jorn pointed out, it's not
> just
> about having an offset overload, in some cases offet+size might be
> better.
>
> First, I think (B), (C) and (G) are ok as is. (B) and (G) already
> provides offset/size accepting variants, and (C) is more about
> querying
> whether two segments are related, so adding offsets here seems odd.
>
> For (A), (D), (E), (F) you could argue that having an overload that
> accepts a size and an offset (or two telescopic overloads, one
> with only
> offset, another with offset _and_ size), might make sense. Zooming
> in a
> bit more, I think some further distinctions are in order.
>
> First, copy already provides many static methods in the style of
> System.arrayCopy. So I think it's fair to say that anybody who wants
> more control, should use the static method instead. If we add a
> static
> mismatch, I think the same argument might apply there.
>
> Similarly, toArray is a nice convenience method, but it's really
> just a
> shortcut for a bulk copy - again, the static methods are enough for
> users who want more control.
>
> Which leaves us with:
>
> * elements(MemoryLayout) // A
> * spliterator(MemoryLayout) // A
> * fill(byte) // D
> * isLoaded() // E
> * load() // E
> * unload() // E
> * force() // E
> * asByteBuffer // F
>
> Adding one or two overloads for both is doable, but feels like
> overkill
> API-wise. I also note that MappedByteBuffer does not provide
> overloads
> for isLoaded/load/unload. Only force() has an overload, that was
> added
> in 13 [1], and, I believe, because ByteBuffer::force relies on the
> buffer mutable position (see last section in [2]), thus making it
> impossible to flush independent mapped regions in parallel using the
> same buffer. MemorySegments do not have that issue, given that
> they are
> immutable - so one can create 20 disjoint slices of a mapped
> segment and
> force() them in parallel. And, some of these methods allocate
> several
> objects anyway (esp. elements and spliterator), so I don't see a
> lot of
> reasons as to why slicing shouldn't just work there.
>
> So I believe we could probably just add the following (although I
> realize this is an arbitrary cut):
>
> * fill(byte, long, long) // D
>
> As for (H), I think the main problem here is that dereference
> methods in
> (H) are not built on top of those in (G). I think the following form:
>
> getAtIndex(ValueLayout.OfT, long, long)
> setAtIndex(ValueLayout.OfT, long, long, T)
>
> Would be more useful, where the first parameter is an offset into the
> segment, and the second is a logical index. Seems strictly more
> general.
> Actually note that these new methods are general enough that, in
> principle, we could only have these, and be done (e.g. just one
> set of
> accessors instead of two). Or, instead of having two sets, for offset
> and offset+index, we could have two sets for index and offset+index -
> which seems more consistent with the rest of the instance methods
> (where
> the offset variant is the "advanced" one).
>
> Finally, I think (I) the the most complex of all. Because, if we add
> something like:
>
> getUtf8String(long, long)
>
> At least to me it's not entirely clear what the "length" parameter
> does.
> Does it means (like in all the other methods discussed so far)
> that it's
> like doing asSlice(offset, length) and then getUtf8String? Or does it
> mean (as Jorn seemed to imply) that the length parameter refers to
> the
> string length? Also, what if user specifies a string length, but the
> memory region contains a terminator char _before_ the user-specified
> length? Honestly, it is not clear to me that what we do in (I) is
> clearly lacking, or could be improved in a straightforward way. If
> user
> just wants to view a slice of memory as a string, and ignore
> terminators, they can just use bulk copy, to copy the contents of the
> segment into a byte array, and then create the string from there
> (which
> is what our impl does anyway) - it's all bytes at the end of the
> day -
> IMHO the interesting thing about the string dereference method
> *is* the
> terminator handling. W/o that it's just a copy.
>
> So, summarizing, to make the API more offset/length friendly we could:
>
> 1. Add a static mismatch method(s) (similar to what we did for copy)
> 2. add the following overloads:
> * fill(byte, long, long) // D
> 3. Tweak the indexed accessors - e.g. rename them to just "get"
> and get
> both an offset (in bytes) parameter as well as an index (in elements)
> parameter; then decide what to do with the offset-only dereference
> variant:
> 3a) keep as is
> 3b) drop
> 3c) tweak it so that parameter is an index, not an offset
>
> In this plan, the most controversial part is (2) - in the sense
> that it
> involves an arbitrary choice based on the uses we've seen on the
> ByteBuffer API, as well as some other considerations (e.g. how
> much do
> the methods end up allocating anyway). As for (3), I think the more
> general offset+index dereference methods, give us an opportunity to
> think if the set of dereference we provide is the one that makes the
> most sense.
>
> Thoughts?
>
> Maurizio
>
> [1] - https://bugs.openjdk.org/browse/JDK-8221696
> [2] - https://openjdk.org/jeps/352
>
> On 15/06/2022 14:46, Jorn Vernee wrote:
> > Just wanted to mention that it's not just offsets, but also
> lengths in
> > some cases.
> >
> > For instance, fill might need a length and offset overload. And
> I've
> > also felt in the past that getUtf8String, which currently only
> takes
> > an offset, should be split into 2 overloads, one that takes an
> offset
> > and length, and one that takes no arguments (which would internally
> > look for a null terminator).
> >
> > Jorn
> >
> > On 14/06/2022 14:40, Maurizio Cimadamore wrote:
> >>
> >> On 14/06/2022 13:03, Alexander Biryukov wrote:
> >>> That makes sense, and I think your description pretty much
> sums it up.
> >>> Maybe it can be transformed in a checklist:
> >>>
> >>> 1. Instance methods are preferred
> >>> 2. Binary operators are implemented as static and instance
> methods,
> >>> both with offsets
> >>> 3. Some instance methods have offset-related overloads, where it
> >>> makes sense
> >>> 4. Pure static methods are not welcome, except for some
> widespread
> >>> patterns, like factory methods (*)
> >>
> >> I would split (3) as follows:
> >>
> >> 3a. Instance methods have an additional offset parameter
> >> 3b. Where it makes sense, offset-free overloads might be provided
> >> (e.g. call method in (3a) with offset=0)
> >>
> >> This feels more principled, as (3a) is a primitive and (3b) is
> sugare
> >> than can easily be derived from that.
> >>
> >> Re. your point of using analytics, again, this is a relatively new
> >> API, there's not much code using it out there, so the risk of
> >> "overfitting" is real.
> >>
> >> I'm slightly on the fence w.r.t. having both static and instance
> >> variants for binary methods - that said, as long as we can clearly
> >> explain what's the primitive (the static method), I think the
> API can
> >> still be explain in a relatively straightforward fashion.
> >>
> >> Maurizio
> >>
> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20220617/65250eb7/attachment-0001.htm>
More information about the panama-dev
mailing list