<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>> I wouldn't have expected looping with charAt to be

      competitive with the fast paths in StringSupport where the bytes

      are compatible, or is that not right?<br>

      <br>

      The fast paths in StringSupport call an out-of-line stub that does

      a vectorized copy. At least in theory C2's auto-vectorizer should

      be able to do the exact same thing for a manual loop using charAt,

      but inline, so it might even be faster, especially for small

      strings. That's why it would be good to try that approach and see

      how it compares.<br>

      <br>

      > Do you have a feeling for how that approach might best be

      exposed in the API? Do you think it might look like more

      variations of getString/setString in MemorySegment? Or that there

      might be a missing API primitive that could encapsulate those

      String sources and destinations? Or something else?<br>

      <br>

      I was thinking primarily along the lines of adding a

      MemorySegment::copy overload that accepts Strings as a source (as

      opposed to e.g. an array), for copying from a string to a memory

      segment only. We should probably also add an overload to

      SegmentAllocator::allocateFrom that accepts an offset and a length

      (we already have two for full strings). These two overloads could

      fully support the substring use case without looking too out of

      place.<br>

      <br>

      For reading a String, I think your proposal to augment

      MemorySegment::getString looks good, but I think we should leave

      setString alone in favor of adding a MS::copy overload (there's

      the asymmetry I was talking about before).<br>

      <br>

      For completeness, I think we should also just add the

      MemorySegment::ofString(String, Charset) overload which tries to

      return a read-only view of the string, to match the existing

      ofArray methods. This seems like a generally good primitive to

      have.</p>
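    <p>To make the comparison concrete, here is a minimal sketch of the
      two write paths discussed above: a manual charAt loop, and the
      getBytes-then-copy baseline. It is only an illustration, not the
      proposed API. The class and method names are hypothetical, it
      assumes the substring is pure ASCII (one byte per char, no
      validation), and it needs a JDK where java.lang.foreign is final
      (22 or later). Whether C2 actually vectorizes the loop would have
      to be checked in a benchmark.</p>

    <pre>
```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringCopySketch {

    // Hypothetical helper (not an existing JDK method): copies the chars
    // src[srcIndex, srcIndex + length) into dst starting at dstOffset,
    // one byte per char. Assumption: every char is ASCII; nothing is
    // validated here, so non-ASCII chars would be silently truncated.
    static void copyAsciiWithCharAt(String src, int srcIndex,
                                    MemorySegment dst, long dstOffset, int length) {
        for (int i = 0; i < length; i++) {
            dst.set(ValueLayout.JAVA_BYTE, dstOffset + i, (byte) src.charAt(srcIndex + i));
        }
    }

    // Baseline using only existing APIs: encode the substring into a
    // temporary byte[] and bulk-copy that array into the segment.
    static void copyWithGetBytes(String src, int srcIndex,
                                 MemorySegment dst, long dstOffset, int length) {
        byte[] bytes = src.substring(srcIndex, srcIndex + length)
                          .getBytes(StandardCharsets.US_ASCII);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
    }

    public static void main(String[] args) {
        String s = "hello world";

        // Heap segments backed by byte[] keep the example free of arenas.
        byte[] viaLoop = new byte[5];
        byte[] viaGetBytes = new byte[5];

        copyAsciiWithCharAt(s, 6, MemorySegment.ofArray(viaLoop), 0, 5);
        copyWithGetBytes(s, 6, MemorySegment.ofArray(viaGetBytes), 0, 5);

        // Both paths should write the same bytes for ASCII input.
        System.out.println(Arrays.equals(viaLoop, viaGetBytes));
    }
}
```
    </pre>

    <p>The loop avoids the intermediate byte[] and the substring
      allocation entirely, which is the part that matters for small
      strings. The baseline is what callers can already write today.</p>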

    <p>Jorn</p>

    <div class="moz-cite-prefix">On 4-11-2025 14:13, Liam Miller-Cushon

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CAL4QsgvdeCsCVHoq90uneSbtJctxG74h09cUjKxGa7dqEoB0Hw@mail.gmail.com">

      <div dir="ltr">

        <div>Hi Jorn,</div>

        <div><br>

        </div>

        <div>Thanks for the discussion and input!</div>

        <br>

        <div class="gmail_quote gmail_quote_container">

          <div dir="ltr" class="gmail_attr">On Mon, Nov 3, 2025 at

            7:47 PM Jorn Vernee &lt;<a href="mailto:jorn.vernee@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">jorn.vernee@oracle.com</a>&gt;

            wrote:</div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            About the copy elision, this is something that we have seen

            being <br>

            visible in benchmarks [1]. Were there any benchmarks in

            which you've <br>

            seen this difference show up as well, or is this more of a

            theoretical <br>

            benefit? It would be good to understand how important

            performance is in <br>

            all of this, or if it's more about API usability. Also, I

            should note <br>

            that in the case of setString, we only avoid the extra

            byte[] when the <br>

            source and target encodings are compatible.<br>

          </blockquote>

          <div><br>

          </div>

          We had done some earlier benchmarking; I think that was part

          of the discussion that led to JDK-8362893: <a href="https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149189.html" moz-do-not-send="true" class="moz-txt-link-freetext">https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149189.html</a>.

          I also made some draft changes to <a href="https://github.com/openjdk/jdk/pull/28043" moz-do-not-send="true">https://github.com/openjdk/jdk/pull/28043</a>

          to add a prototype of setStringWithoutNullTerminator and did

          some more microbenchmarking. I updated the PR description with

          some results.<br>

          <br>

          For use-cases like the protobuf one, the interest is more in

          getting the best possible performance, rather than API

          usability.</div>

        <div class="gmail_quote gmail_quote_container"> 

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It's

            possible to call `getBytes` on a string and copy the

            resulting byte[] </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">into

            the memory segment as well, but I suppose you want to avoid

            that </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">because

            of the extra copy of the byte[] (although I think perhaps C2

            can </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            elide the extra object in that case)? Have you tried looping

            over the string </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            and manually copying each character using charAt?<br>

          </blockquote>

          <div><br>

          </div>

          <div>I'm not seeing competitive performance with the explicit

            call to getBytes in the microbenchmarks, so I wonder if it

            is perhaps not eliding the copy, although I haven't verified

            in the assembly.</div>

          <br>

          I wouldn't have expected looping with charAt to be competitive

          with the fast paths in StringSupport where the bytes are

          compatible, or is that not right?</div>

        <div class="gmail_quote gmail_quote_container"><br>

        </div>

        <div class="gmail_quote gmail_quote_container">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            I think I'm slowly coming to the conclusion that we should

            just treat <br>

            Strings as another source and destination format for data,

            with the <br>

            caveat that we cannot modify a String in place, so any read

            operations <br>

            will have to create a new String instance instead. This

            creates some <br>

            asymmetry with the existing MemorySegment::copy methods. I

            think because <br>

            of that restriction, we might have to accept some asymmetry

            between the <br>

            read and write APIs for strings as well.</blockquote>

          <div><br>

          </div>

          Do you have a feeling for how that approach might best be

          exposed in the API? Do you think it might look like more

          variations of getString/setString in MemorySegment? Or that

          there might be a missing API primitive that could encapsulate

          those String sources and destinations? Or something else?<br>

        </div>

      </div>

    </blockquote>

  </body>

</html>