<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 17/06/2022 13:49, Alexander Biryukov
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">
<div dir="ltr">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
1. Add a static mismatch method(s) (similar to what we did for
copy)<br>
2. add the following overloads:<br>
* fill(byte, long, long) // D<br>
3. Tweak the indexed accessors - e.g. rename them to just
"get" and get <br>
both an offset (in bytes) parameter as well as an index (in
elements) <br>
parameter; then decide what to do with the offset-only
dereference variant:<br>
3a) keep as is<br>
3b) drop<br>
3c) tweak it so that parameter is an index, not an offset
</blockquote>
<div><br>
</div>
<div>1. Agree, basic functionality like copy/mismatch must
expose an offset</div>
<div>2. Can't see harm here, so I'd add an overload<br>
</div>
<div>3. I think the current get/set methods with offsets are
fine as is (3a).</div>
<div>I'm not sure if merging them together with setAtIndex is a
good idea, <br>
</div>
<div>something like set(JAVA_INT, 0, 0, 123) looks confusing.</div>
<div>I expect that "index" means that it starts with 0.</div>
<div>Otherwise I'd just use set/get and do <span style="font-family:monospace">offset + i * 2/4/8, </span>it's
much simpler.<br>
</div>
</div>
</blockquote>
Point taken. Note that option 3c though would give you the same e.g.<br>
<br>
set(JAVA_INT, 0, 123) // sets 123 at index 0<br>
<br>
and, also have an advanced option:<br>
<br>
<p>set(JAVA_INT, offset, 0, 123) // sets 123 at index 0, starting at
offset</p>
<p>Doing I *2/4/8 is, of course possible, but looks redundant, given
that you are also passing a layout. <br>
</p>
<p>That said, I don't feel particularly strongly about any of the
options here - and I agree that the API points provided now are
expressive enough (assuming one wants to do offset computation by
hand).<br>
</p>
<blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">
<div dir="ltr">
<div> </div>
<div>By the way, about <span style="font-family:monospace">getUtf8String(long,
long)</span>.There are multiple ways to put UTF-8 strings in
byte[], <br>
</div>
<div>like null-termination or write length as the first
byte/int/varint/stored somewhere else (schema).</div>
<div>Current implementation is a null-terminated string, which
is fine for C interop.<br>
</div>
<div>Perhaps naming could be better to reflect it.<br>
</div>
</div>
</blockquote>
<p>True. Best we could do is probably dropping a C somewhere in
there. (e.g. getUtf8CString), as I don't think
"getUtf8NullTerminatedString" would fly :-)</p>
<p>Maurizio<br>
</p>
<blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
<div> </div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">чт, 16 июн. 2022 г. в 17:01,
Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hi,<br>
I've spent some time classifying the various instance
methods in <br>
MemorySegment, to see if some pattern emerged:<br>
<br>
(A) streaming ops:<br>
<br>
* elements(MemoryLayout)<br>
* spliterator(MemoryLayout)<br>
<br>
(B) slicing ops<br>
<br>
* asSlice(long)<br>
* asSlice(long, long)<br>
<br>
(C) containment ops<br>
<br>
* asOverlappingSlice(MemorySegment)<br>
* segmentOffset(MemorySegment)<br>
<br>
(D) bulk ops<br>
<br>
* fill(byte)<br>
* copyFrom(MemorySegment)<br>
* mismatch(MemorySegment)<br>
<br>
(E) mapped segment ops<br>
<br>
* isLoaded()<br>
* load()<br>
* unload()<br>
* force()<br>
<br>
(F) transform ops<br>
<br>
* asByteBuffer()<br>
* toArray(ValueLayout.OfT)<br>
<br>
(G) simple dereference ops<br>
<br>
* get(ValueLayout.OfT, long)<br>
* set(ValueLayout.OfT, long, T)<br>
<br>
(H) indexed dereference ops<br>
<br>
getAtIndex(ValueLayout.OfT, long)<br>
setAtIndex(ValueLayout.OfT, long, T)<br>
<br>
(I) string dereferene ops<br>
<br>
getUtf8String(long)<br>
setUtf8String(long, String)<br>
<br>
--------------------------------------<br>
<br>
Now, let's see if, case by case, we can see whether some
additional <br>
overload would be desirable or not. As Jorn pointed out,
it's not just <br>
about having an offset overload, in some cases offet+size
might be better.<br>
<br>
First, I think (B), (C) and (G) are ok as is. (B) and (G)
already <br>
provides offset/size accepting variants, and (C) is more
about querying <br>
whether two segments are related, so adding offsets here
seems odd.<br>
<br>
For (A), (D), (E), (F) you could argue that having an
overload that <br>
accepts a size and an offset (or two telescopic overloads,
one with only <br>
offset, another with offset _and_ size), might make sense.
Zooming in a <br>
bit more, I think some further distinctions are in order.<br>
<br>
First, copy already provides many static methods in the
style of <br>
System.arrayCopy. So I think it's fair to say that anybody
who wants <br>
more control, should use the static method instead. If we
add a static <br>
mismatch, I think the same argument might apply there.<br>
<br>
Similarly, toArray is a nice convenience method, but it's
really just a <br>
shortcut for a bulk copy - again, the static methods are
enough for <br>
users who want more control.<br>
<br>
Which leaves us with:<br>
<br>
* elements(MemoryLayout) // A<br>
* spliterator(MemoryLayout) // A<br>
* fill(byte) // D<br>
* isLoaded() // E<br>
* load() // E<br>
* unload() // E<br>
* force() // E<br>
* asByteBuffer // F<br>
<br>
Adding one or two overloads for both is doable, but feels
like overkill <br>
API-wise. I also note that MappedByteBuffer does not provide
overloads <br>
for isLoaded/load/unload. Only force() has an overload, that
was added <br>
in 13 [1], and, I believe, because ByteBuffer::force relies
on the <br>
buffer mutable position (see last section in [2]), thus
making it <br>
impossible to flush independent mapped regions in parallel
using the <br>
same buffer. MemorySegments do not have that issue, given
that they are <br>
immutable - so one can create 20 disjoint slices of a mapped
segment and <br>
force() them in parallel. And, some of these methods
allocate several <br>
objects anyway (esp. elements and spliterator), so I don't
see a lot of <br>
reasons as to why slicing shouldn't just work there.<br>
<br>
So I believe we could probably just add the following
(although I <br>
realize this is an arbitrary cut):<br>
<br>
* fill(byte, long, long) // D<br>
<br>
As for (H), I think the main problem here is that
dereference methods in <br>
(H) are not built on top of those in (G). I think the
following form:<br>
<br>
getAtIndex(ValueLayout.OfT, long, long)<br>
setAtIndex(ValueLayout.OfT, long, long, T)<br>
<br>
Would be more useful, where the first parameter is an offset
into the <br>
segment, and the second is a logical index. Seems strictly
more general. <br>
Actually note that these new methods are general enough
that, in <br>
principle, we could only have these, and be done (e.g. just
one set of <br>
accessors instead of two). Or, instead of having two sets,
for offset <br>
and offset+index, we could have two sets for index and
offset+index - <br>
which seems more consistent with the rest of the instance
methods (where <br>
the offset variant is the "advanced" one).<br>
<br>
Finally, I think (I) the the most complex of all. Because,
if we add <br>
something like:<br>
<br>
getUtf8String(long, long)<br>
<br>
At least to me it's not entirely clear what the "length"
parameter does. <br>
Does it means (like in all the other methods discussed so
far) that it's <br>
like doing asSlice(offset, length) and then getUtf8String?
Or does it <br>
mean (as Jorn seemed to imply) that the length parameter
refers to the <br>
string length? Also, what if user specifies a string length,
but the <br>
memory region contains a terminator char _before_ the
user-specified <br>
length? Honestly, it is not clear to me that what we do in
(I) is <br>
clearly lacking, or could be improved in a straightforward
way. If user <br>
just wants to view a slice of memory as a string, and ignore
<br>
terminators, they can just use bulk copy, to copy the
contents of the <br>
segment into a byte array, and then create the string from
there (which <br>
is what our impl does anyway) - it's all bytes at the end of
the day - <br>
IMHO the interesting thing about the string dereference
method *is* the <br>
terminator handling. W/o that it's just a copy.<br>
<br>
So, summarizing, to make the API more offset/length friendly
we could:<br>
<br>
1. Add a static mismatch method(s) (similar to what we did
for copy)<br>
2. add the following overloads:<br>
* fill(byte, long, long) // D<br>
3. Tweak the indexed accessors - e.g. rename them to just
"get" and get <br>
both an offset (in bytes) parameter as well as an index (in
elements) <br>
parameter; then decide what to do with the offset-only
dereference variant:<br>
3a) keep as is<br>
3b) drop<br>
3c) tweak it so that parameter is an index, not an offset<br>
<br>
In this plan, the most controversial part is (2) - in the
sense that it <br>
involves an arbitrary choice based on the uses we've seen on
the <br>
ByteBuffer API, as well as some other considerations (e.g.
how much do <br>
the methods end up allocating anyway). As for (3), I think
the more <br>
general offset+index dereference methods, give us an
opportunity to <br>
think if the set of dereference we provide is the one that
makes the <br>
most sense.<br>
<br>
Thoughts?<br>
<br>
Maurizio<br>
<br>
[1] - <a href="https://bugs.openjdk.org/browse/JDK-8221696" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.openjdk.org/browse/JDK-8221696</a><br>
[2] - <a href="https://openjdk.org/jeps/352" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://openjdk.org/jeps/352</a><br>
<br>
On 15/06/2022 14:46, Jorn Vernee wrote:<br>
> Just wanted to mention that it's not just offsets, but
also lengths in <br>
> some cases.<br>
><br>
> For instance, fill might need a length and offset
overload. And I've <br>
> also felt in the past that getUtf8String, which
currently only takes <br>
> an offset, should be split into 2 overloads, one that
takes an offset <br>
> and length, and one that takes no arguments (which
would internally <br>
> look for a null terminator).<br>
><br>
> Jorn<br>
><br>
> On 14/06/2022 14:40, Maurizio Cimadamore wrote:<br>
>><br>
>> On 14/06/2022 13:03, Alexander Biryukov wrote:<br>
>>> That makes sense, and I think your description
pretty much sums it up.<br>
>>> Maybe it can be transformed in a checklist:<br>
>>><br>
>>> 1. Instance methods are preferred<br>
>>> 2. Binary operators are implemented as static
and instance methods, <br>
>>> both with offsets<br>
>>> 3. Some instance methods have offset-related
overloads, where it <br>
>>> makes sense<br>
>>> 4. Pure static methods are not welcome, except
for some widespread <br>
>>> patterns, like factory methods (*)<br>
>><br>
>> I would split (3) as follows:<br>
>><br>
>> 3a. Instance methods have an additional offset
parameter<br>
>> 3b. Where it makes sense, offset-free overloads
might be provided <br>
>> (e.g. call method in (3a) with offset=0)<br>
>><br>
>> This feels more principled, as (3a) is a primitive
and (3b) is sugare <br>
>> than can easily be derived from that.<br>
>><br>
>> Re. your point of using analytics, again, this is a
relatively new <br>
>> API, there's not much code using it out there, so
the risk of <br>
>> "overfitting" is real.<br>
>><br>
>> I'm slightly on the fence w.r.t. having both static
and instance <br>
>> variants for binary methods - that said, as long as
we can clearly <br>
>> explain what's the primitive (the static method), I
think the API can <br>
>> still be explain in a relatively straightforward
fashion.<br>
>><br>
>> Maurizio<br>
>><br>
>><br>
>><br>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>