<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

1. Add a static mismatch method(s) (similar to what we did for copy)<br>

2. add the following overloads:<br>

* fill(byte, long, long) // D<br>

3. Tweak the indexed accessors - e.g. rename them to just "get" and get <br>

both an offset (in bytes) parameter as well as an index (in elements) <br>

parameter; then decide what to do with the offset-only dereference variant:<br>

3a) keep as is<br>

3b) drop<br>

3c) tweak it so that parameter is an index, not an offset


</blockquote><div><br></div><div>1. Agree, basic functionality like copy/mismatch must expose an offset</div><div>2. Can't see harm here, so I'd add an overload<br></div><div></div><div>3. I think the current get/set methods with offsets are fine as is (3a).</div><div>I'm not sure if merging them together with setAtIndex is a good idea, <br></div><div>something like set(JAVA_INT, 0, 0, 123) looks confusing.</div><div>I expect that "index" means that it starts with 0.</div><div>Otherwise I'd just use set/get and do <span style="font-family:monospace">offset + i * 2/4/8, </span>it's much simpler.<br></div><div></div><div></div><div> </div><div></div><div>By the way, about <span style="font-family:monospace">getUtf8String(long, long)</span>.There are multiple ways to put UTF-8 strings in byte[], <br></div><div>like null-termination or write length as the first byte/int/varint/stored somewhere else (schema).</div><div>Current implementation is a null-terminated string, which is fine for C interop.<br></div><div>Perhaps naming could be better to reflect it.<br></div><div><br></div><div><br></div><div> </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">чт, 16 июн. 2022 г. в 17:01, Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank">maurizio.cimadamore@oracle.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

I've spent some time classifying the various instance methods in <br>

MemorySegment, to see if some pattern emerged:<br>

<br>

(A) streaming ops:<br>

<br>

* elements(MemoryLayout)<br>

* spliterator(MemoryLayout)<br>

<br>

(B) slicing ops<br>

<br>

* asSlice(long)<br>

* asSlice(long, long)<br>

<br>

(C) containment ops<br>

<br>

* asOverlappingSlice(MemorySegment)<br>

* segmentOffset(MemorySegment)<br>

<br>

(D) bulk ops<br>

<br>

* fill(byte)<br>

* copyFrom(MemorySegment)<br>

* mismatch(MemorySegment)<br>

<br>

(E) mapped segment ops<br>

<br>

* isLoaded()<br>

* load()<br>

* unload()<br>

* force()<br>

<br>

(F) transform ops<br>

<br>

* asByteBuffer()<br>

* toArray(ValueLayout.OfT)<br>

<br>

(G) simple dereference ops<br>

<br>

* get(ValueLayout.OfT, long)<br>

* set(ValueLayout.OfT, long, T)<br>

<br>

(H) indexed dereference ops<br>

<br>

getAtIndex(ValueLayout.OfT, long)<br>

setAtIndex(ValueLayout.OfT, long, T)<br>

<br>

(I) string dereferene ops<br>

<br>

getUtf8String(long)<br>

setUtf8String(long, String)<br>

<br>

--------------------------------------<br>

<br>

Now, let's see if, case by case, we can see whether some additional <br>

overload would be desirable or not. As Jorn pointed out, it's not just <br>

about having an offset overload, in some cases offet+size might be better.<br>

<br>

First, I think (B), (C) and (G) are ok as is. (B) and (G) already <br>

provides offset/size accepting variants, and (C) is more about querying <br>

whether two segments are related, so adding offsets here seems odd.<br>

<br>

For (A), (D), (E), (F) you could argue that having an overload that <br>

accepts a size and an offset (or two telescopic overloads, one with only <br>

offset, another with offset _and_ size), might make sense. Zooming in a <br>

bit more, I think some further distinctions are in order.<br>

<br>

First, copy already provides many static methods in the style of <br>

System.arrayCopy. So I think it's fair to say that anybody who wants <br>

more control, should use the static method instead. If we add a static <br>

mismatch, I think the same argument might apply there.<br>

<br>

Similarly, toArray is a nice convenience method, but it's really just a <br>

shortcut for a bulk copy - again, the static methods are enough for <br>

users who want more control.<br>

<br>

Which leaves us with:<br>

<br>

* elements(MemoryLayout) // A<br>

* spliterator(MemoryLayout) // A<br>

* fill(byte) // D<br>

* isLoaded() // E<br>

* load() // E<br>

* unload() // E<br>

* force() // E<br>

* asByteBuffer // F<br>

<br>

Adding one or two overloads for both is doable, but feels like overkill <br>

API-wise. I also note that MappedByteBuffer does not provide overloads <br>

for isLoaded/load/unload. Only force() has an overload, that was added <br>

in 13 [1], and, I believe, because ByteBuffer::force relies on the <br>

buffer mutable position (see last section in [2]), thus making it <br>

impossible to flush independent mapped regions in parallel using the <br>

same buffer. MemorySegments do not have that issue, given that they are <br>

immutable - so one can create 20 disjoint slices of a mapped segment and <br>

force() them in parallel.  And, some of these methods allocate several <br>

objects anyway (esp. elements and spliterator), so I don't see a lot of <br>

reasons as to why slicing shouldn't just work there.<br>

<br>

So I believe we could probably just add the following (although I <br>

realize this is an arbitrary cut):<br>

<br>

* fill(byte, long, long) // D<br>

<br>

As for (H), I think the main problem here is that dereference methods in <br>

(H) are not built on top of those in (G). I think the following form:<br>

<br>

getAtIndex(ValueLayout.OfT, long, long)<br>

setAtIndex(ValueLayout.OfT, long, long, T)<br>

<br>

Would be more useful, where the first parameter is an offset into the <br>

segment, and the second is a logical index. Seems strictly more general. <br>

Actually note that these new methods are general enough that, in <br>

principle, we could only have these, and be done (e.g. just one set of <br>

accessors instead of two). Or, instead of having two sets, for offset <br>

and offset+index, we could have two sets for index and offset+index - <br>

which seems more consistent with the rest of the instance methods (where <br>

the offset variant is the "advanced" one).<br>

<br>

Finally, I think (I) the the most complex of all. Because, if we add <br>

something like:<br>

<br>

getUtf8String(long, long)<br>

<br>

At least to me it's not entirely clear what the "length" parameter does. <br>

Does it means (like in all the other methods discussed so far) that it's <br>

like doing asSlice(offset, length) and then getUtf8String? Or does it <br>

mean (as Jorn seemed to imply) that the length parameter refers to the <br>

string length? Also, what if user specifies a string length, but the <br>

memory region contains a terminator char _before_ the user-specified <br>

length? Honestly, it is not clear to me that what we do in (I) is <br>

clearly lacking, or could be improved in a straightforward way. If user <br>

just wants to view a slice of memory as a string, and ignore <br>

terminators, they can just use bulk copy, to copy the contents of the <br>

segment into a byte array, and then create the string from there (which <br>

is what our impl does anyway) - it's all bytes at the end of the day - <br>

IMHO the interesting thing about the string dereference method *is* the <br>

terminator handling. W/o that it's just a copy.<br>

<br>

So, summarizing, to make the API more offset/length friendly we could:<br>

<br>

1. Add a static mismatch method(s) (similar to what we did for copy)<br>

2. add the following overloads:<br>

* fill(byte, long, long) // D<br>

3. Tweak the indexed accessors - e.g. rename them to just "get" and get <br>

both an offset (in bytes) parameter as well as an index (in elements) <br>

parameter; then decide what to do with the offset-only dereference variant:<br>

3a) keep as is<br>

3b) drop<br>

3c) tweak it so that parameter is an index, not an offset<br>

<br>

In this plan, the most controversial part is (2) - in the sense that it <br>

involves an arbitrary choice based on the uses we've seen on the <br>

ByteBuffer API, as well as some other considerations (e.g. how much do <br>

the methods end up allocating anyway). As for (3), I think the more <br>

general offset+index dereference methods, give us an opportunity to <br>

think if the set of dereference we provide is the one that makes the <br>

most sense.<br>

<br>

Thoughts?<br>

<br>

Maurizio<br>

<br>

[1] - <a href="https://bugs.openjdk.org/browse/JDK-8221696" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8221696</a><br>

[2] - <a href="https://openjdk.org/jeps/352" rel="noreferrer" target="_blank">https://openjdk.org/jeps/352</a><br>

<br>

On 15/06/2022 14:46, Jorn Vernee wrote:<br>

> Just wanted to mention that it's not just offsets, but also lengths in <br>

> some cases.<br>

><br>

> For instance, fill might need a length and offset overload. And I've <br>

> also felt in the past that getUtf8String, which currently only takes <br>

> an offset, should be split into 2 overloads, one that takes an offset <br>

> and length, and one that takes no arguments (which would internally <br>

> look for a null terminator).<br>

><br>

> Jorn<br>

><br>

> On 14/06/2022 14:40, Maurizio Cimadamore wrote:<br>

>><br>

>> On 14/06/2022 13:03, Alexander Biryukov wrote:<br>

>>> That makes sense, and I think your description pretty much sums it up.<br>

>>> Maybe it can be transformed in a checklist:<br>

>>><br>

>>> 1. Instance methods are preferred<br>

>>> 2. Binary operators are implemented as static and instance methods, <br>

>>> both with offsets<br>

>>> 3. Some instance methods have offset-related overloads, where it <br>

>>> makes sense<br>

>>> 4. Pure static methods are not welcome, except for some widespread <br>

>>> patterns, like factory methods (*)<br>

>><br>

>> I would split (3) as follows:<br>

>><br>

>> 3a. Instance methods have an additional offset parameter<br>

>> 3b. Where it makes sense, offset-free overloads might be provided <br>

>> (e.g. call method in (3a) with offset=0)<br>

>><br>

>> This feels more principled, as (3a) is a primitive and (3b) is sugare <br>

>> than can easily be derived from that.<br>

>><br>

>> Re. your point of using analytics, again, this is a relatively new <br>

>> API, there's not much code using it out there, so the risk of <br>

>> "overfitting" is real.<br>

>><br>

>> I'm slightly on the fence w.r.t. having both static and instance <br>

>> variants for binary methods - that said, as long as we can clearly <br>

>> explain what's the primitive (the static method), I think the API can <br>

>> still be explain in a relatively straightforward fashion.<br>

>><br>

>> Maurizio<br>

>><br>

>><br>

>><br>

</blockquote></div></div>