<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 17/06/2022 13:49, Alexander Biryukov

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">

      <div dir="ltr">

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          1. Add a static mismatch method(s) (similar to what we did for

          copy)<br>

          2. add the following overloads:<br>

          * fill(byte, long, long) // D<br>

          3. Tweak the indexed accessors - e.g. rename them to just

          "get" and get <br>

          both an offset (in bytes) parameter as well as an index (in

          elements) <br>

          parameter; then decide what to do with the offset-only

          dereference variant:<br>

          3a) keep as is<br>

          3b) drop<br>

          3c) tweak it so that parameter is an index, not an offset

        </blockquote>

        <div><br>

        </div>

        <div>1. Agree, basic functionality like copy/mismatch must

          expose an offset</div>

        <div>2. Can't see harm here, so I'd add an overload<br>

        </div>

        <div>3. I think the current get/set methods with offsets are

          fine as is (3a).</div>

        <div>I'm not sure if merging them together with setAtIndex is a

          good idea, <br>

        </div>

        <div>something like set(JAVA_INT, 0, 0, 123) looks confusing.</div>

        <div>I expect that "index" means that it starts with 0.</div>

        <div>Otherwise I'd just use set/get and do <span style="font-family:monospace">offset + i * 2/4/8, </span>it's

          much simpler.<br>

        </div>

      </div>

    </blockquote>

    Point taken. Note that option 3c though would give you the same e.g.<br>

    <br>

    set(JAVA_INT, 0, 123) // sets 123 at index 0<br>

    <br>

    and, also have an advanced option:<br>

    <br>

    <p>set(JAVA_INT, offset, 0, 123) // sets 123 at index 0, starting at

      offset</p>

    <p>Doing I *2/4/8 is, of course possible, but looks redundant, given

      that you are also passing a layout. <br>

    </p>

    <p>That said, I don't feel particularly strongly about any of the

      options here - and I agree that the API points provided now are

      expressive enough (assuming one wants to do offset computation by

      hand).<br>

    </p>

    <blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">

      <div dir="ltr">

        <div> </div>

        <div>By the way, about <span style="font-family:monospace">getUtf8String(long,

            long)</span>.There are multiple ways to put UTF-8 strings in

          byte[], <br>

        </div>

        <div>like null-termination or write length as the first

          byte/int/varint/stored somewhere else (schema).</div>

        <div>Current implementation is a null-terminated string, which

          is fine for C interop.<br>

        </div>

        <div>Perhaps naming could be better to reflect it.<br>

        </div>

      </div>

    </blockquote>

    <p>True. Best we could do is probably dropping a C somewhere in

      there. (e.g. getUtf8CString), as I don't think

      "getUtf8NullTerminatedString" would fly :-)</p>

    <p>Maurizio<br>

    </p>

    <blockquote type="cite" cite="mid:CACRQ6qsJZoFbix2ESsa6EHsn8CLx3p8A2DeLFZnsN6B3zwCsfQ@mail.gmail.com">

      <div dir="ltr">

        <div><br>

        </div>

        <div><br>

        </div>

        <div> </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">чт, 16 июн. 2022 г. в 17:01,

            Maurizio Cimadamore <<a href="mailto:maurizio.cimadamore@oracle.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>>:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">Hi,<br>

            I've spent some time classifying the various instance

            methods in <br>

            MemorySegment, to see if some pattern emerged:<br>

            <br>

            (A) streaming ops:<br>

            <br>

            * elements(MemoryLayout)<br>

            * spliterator(MemoryLayout)<br>

            <br>

            (B) slicing ops<br>

            <br>

            * asSlice(long)<br>

            * asSlice(long, long)<br>

            <br>

            (C) containment ops<br>

            <br>

            * asOverlappingSlice(MemorySegment)<br>

            * segmentOffset(MemorySegment)<br>

            <br>

            (D) bulk ops<br>

            <br>

            * fill(byte)<br>

            * copyFrom(MemorySegment)<br>

            * mismatch(MemorySegment)<br>

            <br>

            (E) mapped segment ops<br>

            <br>

            * isLoaded()<br>

            * load()<br>

            * unload()<br>

            * force()<br>

            <br>

            (F) transform ops<br>

            <br>

            * asByteBuffer()<br>

            * toArray(ValueLayout.OfT)<br>

            <br>

            (G) simple dereference ops<br>

            <br>

            * get(ValueLayout.OfT, long)<br>

            * set(ValueLayout.OfT, long, T)<br>

            <br>

            (H) indexed dereference ops<br>

            <br>

            getAtIndex(ValueLayout.OfT, long)<br>

            setAtIndex(ValueLayout.OfT, long, T)<br>

            <br>

            (I) string dereferene ops<br>

            <br>

            getUtf8String(long)<br>

            setUtf8String(long, String)<br>

            <br>

            --------------------------------------<br>

            <br>

            Now, let's see if, case by case, we can see whether some

            additional <br>

            overload would be desirable or not. As Jorn pointed out,

            it's not just <br>

            about having an offset overload, in some cases offet+size

            might be better.<br>

            <br>

            First, I think (B), (C) and (G) are ok as is. (B) and (G)

            already <br>

            provides offset/size accepting variants, and (C) is more

            about querying <br>

            whether two segments are related, so adding offsets here

            seems odd.<br>

            <br>

            For (A), (D), (E), (F) you could argue that having an

            overload that <br>

            accepts a size and an offset (or two telescopic overloads,

            one with only <br>

            offset, another with offset _and_ size), might make sense.

            Zooming in a <br>

            bit more, I think some further distinctions are in order.<br>

            <br>

            First, copy already provides many static methods in the

            style of <br>

            System.arrayCopy. So I think it's fair to say that anybody

            who wants <br>

            more control, should use the static method instead. If we

            add a static <br>

            mismatch, I think the same argument might apply there.<br>

            <br>

            Similarly, toArray is a nice convenience method, but it's

            really just a <br>

            shortcut for a bulk copy - again, the static methods are

            enough for <br>

            users who want more control.<br>

            <br>

            Which leaves us with:<br>

            <br>

            * elements(MemoryLayout) // A<br>

            * spliterator(MemoryLayout) // A<br>

            * fill(byte) // D<br>

            * isLoaded() // E<br>

            * load() // E<br>

            * unload() // E<br>

            * force() // E<br>

            * asByteBuffer // F<br>

            <br>

            Adding one or two overloads for both is doable, but feels

            like overkill <br>

            API-wise. I also note that MappedByteBuffer does not provide

            overloads <br>

            for isLoaded/load/unload. Only force() has an overload, that

            was added <br>

            in 13 [1], and, I believe, because ByteBuffer::force relies

            on the <br>

            buffer mutable position (see last section in [2]), thus

            making it <br>

            impossible to flush independent mapped regions in parallel

            using the <br>

            same buffer. MemorySegments do not have that issue, given

            that they are <br>

            immutable - so one can create 20 disjoint slices of a mapped

            segment and <br>

            force() them in parallel.  And, some of these methods

            allocate several <br>

            objects anyway (esp. elements and spliterator), so I don't

            see a lot of <br>

            reasons as to why slicing shouldn't just work there.<br>

            <br>

            So I believe we could probably just add the following

            (although I <br>

            realize this is an arbitrary cut):<br>

            <br>

            * fill(byte, long, long) // D<br>

            <br>

            As for (H), I think the main problem here is that

            dereference methods in <br>

            (H) are not built on top of those in (G). I think the

            following form:<br>

            <br>

            getAtIndex(ValueLayout.OfT, long, long)<br>

            setAtIndex(ValueLayout.OfT, long, long, T)<br>

            <br>

            Would be more useful, where the first parameter is an offset

            into the <br>

            segment, and the second is a logical index. Seems strictly

            more general. <br>

            Actually note that these new methods are general enough

            that, in <br>

            principle, we could only have these, and be done (e.g. just

            one set of <br>

            accessors instead of two). Or, instead of having two sets,

            for offset <br>

            and offset+index, we could have two sets for index and

            offset+index - <br>

            which seems more consistent with the rest of the instance

            methods (where <br>

            the offset variant is the "advanced" one).<br>

            <br>

            Finally, I think (I) the the most complex of all. Because,

            if we add <br>

            something like:<br>

            <br>

            getUtf8String(long, long)<br>

            <br>

            At least to me it's not entirely clear what the "length"

            parameter does. <br>

            Does it means (like in all the other methods discussed so

            far) that it's <br>

            like doing asSlice(offset, length) and then getUtf8String?

            Or does it <br>

            mean (as Jorn seemed to imply) that the length parameter

            refers to the <br>

            string length? Also, what if user specifies a string length,

            but the <br>

            memory region contains a terminator char _before_ the

            user-specified <br>

            length? Honestly, it is not clear to me that what we do in

            (I) is <br>

            clearly lacking, or could be improved in a straightforward

            way. If user <br>

            just wants to view a slice of memory as a string, and ignore

            <br>

            terminators, they can just use bulk copy, to copy the

            contents of the <br>

            segment into a byte array, and then create the string from

            there (which <br>

            is what our impl does anyway) - it's all bytes at the end of

            the day - <br>

            IMHO the interesting thing about the string dereference

            method *is* the <br>

            terminator handling. W/o that it's just a copy.<br>

            <br>

            So, summarizing, to make the API more offset/length friendly

            we could:<br>

            <br>

            1. Add a static mismatch method(s) (similar to what we did

            for copy)<br>

            2. add the following overloads:<br>

            * fill(byte, long, long) // D<br>

            3. Tweak the indexed accessors - e.g. rename them to just

            "get" and get <br>

            both an offset (in bytes) parameter as well as an index (in

            elements) <br>

            parameter; then decide what to do with the offset-only

            dereference variant:<br>

            3a) keep as is<br>

            3b) drop<br>

            3c) tweak it so that parameter is an index, not an offset<br>

            <br>

            In this plan, the most controversial part is (2) - in the

            sense that it <br>

            involves an arbitrary choice based on the uses we've seen on

            the <br>

            ByteBuffer API, as well as some other considerations (e.g.

            how much do <br>

            the methods end up allocating anyway). As for (3), I think

            the more <br>

            general offset+index dereference methods, give us an

            opportunity to <br>

            think if the set of dereference we provide is the one that

            makes the <br>

            most sense.<br>

            <br>

            Thoughts?<br>

            <br>

            Maurizio<br>

            <br>

            [1] - <a href="https://bugs.openjdk.org/browse/JDK-8221696" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.openjdk.org/browse/JDK-8221696</a><br>

            [2] - <a href="https://openjdk.org/jeps/352" rel="noreferrer" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://openjdk.org/jeps/352</a><br>

            <br>

            On 15/06/2022 14:46, Jorn Vernee wrote:<br>

            > Just wanted to mention that it's not just offsets, but

            also lengths in <br>

            > some cases.<br>

            ><br>

            > For instance, fill might need a length and offset

            overload. And I've <br>

            > also felt in the past that getUtf8String, which

            currently only takes <br>

            > an offset, should be split into 2 overloads, one that

            takes an offset <br>

            > and length, and one that takes no arguments (which

            would internally <br>

            > look for a null terminator).<br>

            ><br>

            > Jorn<br>

            ><br>

            > On 14/06/2022 14:40, Maurizio Cimadamore wrote:<br>

            >><br>

            >> On 14/06/2022 13:03, Alexander Biryukov wrote:<br>

            >>> That makes sense, and I think your description

            pretty much sums it up.<br>

            >>> Maybe it can be transformed in a checklist:<br>

            >>><br>

            >>> 1. Instance methods are preferred<br>

            >>> 2. Binary operators are implemented as static

            and instance methods, <br>

            >>> both with offsets<br>

            >>> 3. Some instance methods have offset-related

            overloads, where it <br>

            >>> makes sense<br>

            >>> 4. Pure static methods are not welcome, except

            for some widespread <br>

            >>> patterns, like factory methods (*)<br>

            >><br>

            >> I would split (3) as follows:<br>

            >><br>

            >> 3a. Instance methods have an additional offset

            parameter<br>

            >> 3b. Where it makes sense, offset-free overloads

            might be provided <br>

            >> (e.g. call method in (3a) with offset=0)<br>

            >><br>

            >> This feels more principled, as (3a) is a primitive

            and (3b) is sugare <br>

            >> than can easily be derived from that.<br>

            >><br>

            >> Re. your point of using analytics, again, this is a

            relatively new <br>

            >> API, there's not much code using it out there, so

            the risk of <br>

            >> "overfitting" is real.<br>

            >><br>

            >> I'm slightly on the fence w.r.t. having both static

            and instance <br>

            >> variants for binary methods - that said, as long as

            we can clearly <br>

            >> explain what's the primitive (the static method), I

            think the API can <br>

            >> still be explain in a relatively straightforward

            fashion.<br>

            >><br>

            >> Maurizio<br>

            >><br>

            >><br>

            >><br>

          </blockquote>

        </div>

      </div>

    </blockquote>

  </body>

</html>